
XML Document Mining using Contextual Self-Organizing Maps for Structures

Conference Paper


Abstract


  • XML is becoming increasingly popular as a language for representing many types of electronic documents. Because XML imposes a strict structural description on documents, a relatively new task has emerged: mining documents based on structural and/or content information. In this paper we investigate (1) the suitability of new unsupervised machine learning methods for clustering XML documents, and (2) the importance of contextual information for the same task. These tasks are part of an international competition on XML clustering and categorization (INEX 2006). We show that the proposed approaches are a suitable tool for clustering structured data, as they yield the best results in the INEX 2006 competition on clustering of XML data.

UOW Authors


  •   Kc, Milly W. (external author)
  •   Hagenbuchner, M.
  •   Tsoi, Ah Chung
  •   Scarselli, Franco (external author)
  •   Gori, Marco (external author)
  •   Sperduti, Alessandro (external author)

Publication Date


  • 2007

Citation


  • Kc, M., Hagenbuchner, M., Tsoi, A., Scarselli, F., Gori, M. & Sperduti, A. (2007). XML Document Mining using Contextual Self-Organizing Maps for Structures. INitiative for the Evaluation of XML Retrieval Lecture Notes in Computer Science: Springer-Verlag Berlin Heidelberg.
