Data mining on Web documents is one of the most challenging tasks
in machine learning due to the large number of documents on the Web, their underlying
structure (one document may refer to another), and the fact that the
data is commonly unlabeled (the class to which a document belongs is not
known a priori). This paper considers the latest developments in Self-Organizing
Maps (SOM), a machine learning approach, as one way of classifying documents
on the Web. The most recent development is called a Probability Mapping Graph
Self-Organizing Map (PMGraphSOM), and is an extension of an earlier GraphSOM
approach; this encodes undirected and cyclic graphs in a scalable fashion.
This paper illustrates empirically the advantages of the PMGraphSOM over the
original GraphSOM model in a data mining application involving graph-structured
information. It will be shown that the performance achieved can exceed
that of current state-of-the-art techniques on a given benchmark problem.