DMSS2006: The International Workshop on Data-Mining and Statistical Science,
September 25-26, 2006, Century Royal Hotel, Sapporo, Japan
Graph Mining Applications to Machine Learning Problems
Koji Tsuda (Max Planck Institute for Biological Cybernetics, Germany)
Graph data are increasingly common in fields such as bioinformatics and text processing. A main difficulty in processing graph data is its intrinsic high dimensionality: when a graph is represented as a binary feature vector whose entries indicate the presence of every possible subgraph, the dimensionality becomes too large for standard statistical methods. In this talk, I report on two recent pieces of our work that apply graph mining to machine learning problems in this very high-dimensional space. The first concerns unsupervised clustering of graphs, where informative features are chosen greedily for learning a binomial mixture model. The second is a graph boosting algorithm for supervised learning, in which a large-scale linear programming problem is solved efficiently by combining column generation with weighted substructure mining.
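As a rough illustration of the column generation step mentioned above, the following Python sketch picks, from a small explicit feature matrix, the subgraph indicator with the largest weighted gain under the current example weights. It is a toy under stated assumptions, not the method presented in the talk: the function name `best_column`, the stump form `2*x - 1`, and the data are hypothetical, and the features are enumerated up front, whereas weighted substructure mining would search the space of subgraph patterns without listing them all.

```python
from typing import List, Tuple


def best_column(X: List[List[int]], y: List[int], w: List[float]) -> Tuple[int, float]:
    """Return (column index, gain) of the feature with the largest weighted gain.

    X[i][j] is 1 if (hypothetical) subgraph pattern j occurs in graph i, else 0.
    y[i] is the class label of graph i, either +1 or -1.
    w[i] is the current weight of graph i, e.g. taken from an LP dual solution.
    Assumption: each pattern acts as a stump that outputs +1 when the pattern
    is present and -1 when it is absent; the absolute value allows either sign.
    """
    best_j, best_gain = -1, -1.0
    for j in range(len(X[0])):
        # Weighted agreement between labels and the stump output for pattern j.
        gain = abs(sum(w[i] * y[i] * (2 * X[i][j] - 1) for i in range(len(X))))
        if gain > best_gain:
            best_j, best_gain = j, gain
    return best_j, best_gain


# Toy data: 4 graphs, 3 candidate patterns, uniform weights.
X = [[1, 0, 1],
     [1, 1, 0],
     [0, 1, 0],
     [0, 0, 1]]
y = [1, 1, -1, -1]
w = [0.25, 0.25, 0.25, 0.25]

print(best_column(X, y, w))
```

In a column generation loop, the selected column would be added to a restricted linear program, the program re-solved to update the weights `w`, and the search repeated until no remaining column has a gain above the current threshold.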