Efficient clustering of structured documents using graph self-organizing maps

Chapter


Abstract


  • Graph Self-Organizing Maps (GraphSOMs) are a new concept in the processing of structured objects using machine learning methods. The GraphSOM is a generalization of the Self-Organizing Maps for Structured Domain (SOM-SD), which had been shown to be a capable unsupervised machine learning method for some types of graph-structured information. The SOM-SD was applied to the clustering of XML-formatted documents, a document mining task in an international competition, the Initiative for the Evaluation of XML Retrieval (INEX), and the method won the competition in both 2005 and 2006. This paper applies the GraphSOM to the clustering of a larger dataset in the INEX 2007 competition. The results are compared with those obtained using the more traditional SOM-SD approach. Experimental results show that (1) the GraphSOM is computationally more efficient than the SOM-SD, (2) the performance of both approaches on the larger INEX 2007 dataset is not competitive with that obtained by other participants of the competition using other approaches, and (3) different structural representations of the same dataset can influence the performance of the proposed GraphSOM technique.

Publication Date


  • 2008

Citation


  • Hagenbuchner, M., Tsoi, A., Sperduti, A. & Kc, M. (2008). Efficient clustering of structured documents using graph self-organizing maps. In N. Fuhr (Eds.), Comparative Evaluation of XML Information Retrieval Systems (pp. 207-221). Berlin: Springer.

Scopus EID


  • 2-s2.0-51849165077

RO Metadata URL


  • http://ro.uow.edu.au/infopapers/3161

Book Title


  • Comparative Evaluation of XML Information Retrieval Systems

Start Page


  • 207

End Page


  • 221
