I just completed a survey article about uses of PageRank outside of web-ranking. The paper has been submitted to a journal, and I also posted the manuscript to arXiv.
David F. Gleich. PageRank beyond the Web. arXiv. cs.SI:1407.5107, 2014.
The goal with this paper was to enumerate the discuss how frequently PageRank is used for applications broadly throughout science and engineering. For instance, I discuss:
- PageRank in Biology – GeneRank, ProteinRank, IsoRank
- PageRank in Neuroscience
- PageRank in Chemistry
- PageRank in Engineered systems – MonitorRank
- PageRank in Mathematical systems
- PageRank in Sports
- PageRank in Literature – BookRank
- PageRank in Bibliometrics – TimedPageRank, CiteRank, AuthorRank, PopRank
- PageRank in Knowledge systems – FactRank, ObjectRank, FolkRank
- PageRank in Recommender systems – ItemRank
- PageRank in Social networks – BuddyRank, TwitterRank
- PageRank in the web & Wikipedia – TrustRank, BadRank, VisualRank
In order to make this a tractable survey, I had to strictly restrict the definition of PageRank I used (or the literature would have easily doubled!) The one I use is to define PageRank as the solution of the linear system
where is a column stochastic transition matrix between the nodes of a graph, is the teleportation parameter in PageRank, and is the teleportation distribution. This system yields the stationary distribution of a random process that,
- with probability , follows a transition according to the matrix
- with probability , jumps according to the distribution .
This system, with application specific constructions of and , suffices to reproduce the PageRank rank information in all of the above applications and makes it easy to identify similarities and differences. For instance, when is a sparse vector, then the resulting PageRank scores highlight a localized region in a large graph or system. (In the past, this has been called a personalized PageRank system, but I think localized PageRank is a better term for its wide, modern usage.) When is nearly uniform, then the resulting PageRank scores give a global measure of the importance of that feature in the graph or system.
But there’s another system that is even more common, which we call pseudo-PageRank (inspired by Boldi and Vigna’s “PseudoRank”). The only difference is that need not be column stochastic and that need not be a distribution, but there are still a few restrictions on them. This modification yields an incredibly flexible system that is mathematically equivalent to the PageRank construction above.
Anyway, I won’t spoil the entire paper — take a read and please post a comment or send an email if you spot anything that looks like it could be corrected!
Some random notes
- I spoke for a while with Sebastiano Vigna about whether to call my definition PseudoRank (as they had), but because there is a subtle, but important, difference, I decided on pseudo-PageRank.
- New papers kept appearing as I was putting this together. For instance, the recent Wikipedia paper by Eom, and the MonitorRank paper.
- It was interesting to read papers that didn’t easily let me tell if they were talking about directed or undirected applications! This makes a big difference for PageRank, although the math works out just fine either way, the solution with an undirected graph has some properties that make it worth knowing.
- I wish I had more time to talk about the amazing connections between localized PageRank and conductances — but that would lead down a large rabbit hole.