I returned from KDD2011 a few days ago. Here are some of the talks I found interesting, and some quick points about them.
- Learning Relationships Defined by Linear Combinations of Constrained Random Walks, William Cohen, Carnegie Melon University. Invited talk at Mining and Learning with Graphs workshop.
My interest The main piece of the talk was characterizing which are the most important types of paths in a large network of linked data. Think of taking a database and linking all the entities via there relationships. Which paths through relationships are the best? The authors address this question using a PageRank heuristic.
- Sparsification of influence networks. Michael Mathioudakis, Francesco Bonchi, Carlos Castillo, Aristides Gionis, Antti Ukkonen. My interest Most links in social networks do not correspond to the true influence paths. These authors study the problem of sparsifing an input graph with known information propagation paths such that it only has \( k \) links.
- Model Order Selection for Boolean Matrix Factorization. Pauli Miettinen, Jilles Vreeken
My interest Rather than factorizing a set of relational data — stated as a matrix — over the real numbers. These authors study the problem over the booleans. They use a minimum description length principle to decide the “rank” of the factorization.
- Large-Scale Matrix Factorization with Distributed Stochastic Gradient Descent. Rainer Gemulla*, Peter Haas, Erik Nijkamp, Yannis Sismanis
My interest These authors develop a block-cyclic stochastic gradient iteration to factor a matrix. It looks cute. I’d love to see it compared against some of the large scale distributed optimization routines developed at the national labs (e.g. Sandia/Argonne)
- Collective Graph Identification. Galileo Namata, Stanley Kok, Lise Getoor
My interest Given a graph, are the nodes and edges of the graph the real nodes and edges of the underlying problem you purport to study? Most often not, these authors argue. See their paper for a first step on how to “identify” the true graph from a set of nodes and edges. This involves: entity resolution, link prediction, and node labeling. I’m not sure if these approaches will scale, but it’s interesting work.