Skip to main content
placeholder image

Efficient Clustering of Structured Documents Using Graph Self-Organizing Maps

Conference Paper


Abstract


  • Graph Self-Organizing Maps (GSOMs) are a new concept in the processing of structured objects. These structured objects are described by graphs, e.g. acyclic directed graphs, cyclic graphs, un-directed graphs, etc. Graphs are generalizations of the more common vectors, or lists. A graph can encode relationships among structural elements of objects, or provide contextual information concerning individual data points (which may be described in vectorial form).

    The GSOM itself is an extension from a number of previous attempts in extending the classic self organizing map (SOM) idea originally due to Kohonen \cite{kohonen95a}. In previous versions of such extensions, we were able to progressively study graph objects which are mainly directed acyclic graphs using what we called Self-Organizing Map for Structured Domain (SOM-SD) \cite{somsd2003}, and the Contextual Self-Organizing Map for Structured Domain (CSOM-SD) \cite{ECML2005} where the graph objects could include cyclic graphs. However, the CSOM-SD had a nonlinear computational complexity; in most cases, this is close to quadratic. As a result, in this paper we introduce a different method, which is called Graph SOM (GSOM), in which we attempt a linear computational complexity method.

    In this paper we demonstrate the efficiency and capability of the GSOM. Comparisons are made with the existing machine learning method SOM-SD \cite{somsd2003}. SOM-SDs are capable of encoding tree-structured data and were shown to be good for tasks requiring clustering. This was demonstrated at an international competition on the clustering of XML formatted documents at which the SOM-SD produced winning performances in two consecutive years \cite{Springer2006,Springer2007a} in the clustering category.

Publication Date


  • 2007

Citation


  • Hagenbuchner, M., Sperduti, A., Tsoi, A. & Kc, M. (2007). Efficient Clustering of Structured Documents Using Graph Self-Organizing Maps. International Workshop of the Initiative for the Evaluation of XML Retrieval online:

Abstract


  • Graph Self-Organizing Maps (GSOMs) are a new concept in the processing of structured objects. These structured objects are described by graphs, e.g. acyclic directed graphs, cyclic graphs, un-directed graphs, etc. Graphs are generalizations of the more common vectors, or lists. A graph can encode relationships among structural elements of objects, or provide contextual information concerning individual data points (which may be described in vectorial form).

    The GSOM itself is an extension from a number of previous attempts in extending the classic self organizing map (SOM) idea originally due to Kohonen \cite{kohonen95a}. In previous versions of such extensions, we were able to progressively study graph objects which are mainly directed acyclic graphs using what we called Self-Organizing Map for Structured Domain (SOM-SD) \cite{somsd2003}, and the Contextual Self-Organizing Map for Structured Domain (CSOM-SD) \cite{ECML2005} where the graph objects could include cyclic graphs. However, the CSOM-SD had a nonlinear computational complexity; in most cases, this is close to quadratic. As a result, in this paper we introduce a different method, which is called Graph SOM (GSOM), in which we attempt a linear computational complexity method.

    In this paper we demonstrate the efficiency and capability of the GSOM. Comparisons are made with the existing machine learning method SOM-SD \cite{somsd2003}. SOM-SDs are capable of encoding tree-structured data and were shown to be good for tasks requiring clustering. This was demonstrated at an international competition on the clustering of XML formatted documents at which the SOM-SD produced winning performances in two consecutive years \cite{Springer2006,Springer2007a} in the clustering category.

Publication Date


  • 2007

Citation


  • Hagenbuchner, M., Sperduti, A., Tsoi, A. & Kc, M. (2007). Efficient Clustering of Structured Documents Using Graph Self-Organizing Maps. International Workshop of the Initiative for the Evaluation of XML Retrieval online: