PageRank beyond the Web

I just completed a survey article about uses of PageRank outside of web-ranking. The paper has been submitted to a journal, and I also posted the manuscript to arXiv.

David F. Gleich. PageRank beyond the Web. arXiv. cs.SI:1407.5107, 2014.

The goal with this paper was to enumerate the discuss how frequently PageRank is used for applications broadly throughout science and engineering. For instance, I discuss:

  • PageRank in Biology – GeneRank, ProteinRank, IsoRank
  • PageRank in Neuroscience
  • PageRank in Chemistry
  • PageRank in Engineered systems – MonitorRank
  • PageRank in Mathematical systems
  • PageRank in Sports
  • PageRank in Literature – BookRank
  • PageRank in Bibliometrics – TimedPageRank, CiteRank, AuthorRank, PopRank
  • PageRank in Knowledge systems – FactRank, ObjectRank, FolkRank
  • PageRank in Recommender systems – ItemRank
  • PageRank in Social networks – BuddyRank, TwitterRank
  • PageRank in the web & Wikipedia – TrustRank, BadRank, VisualRank

In order to make this a tractable survey, I had to strictly restrict the definition of PageRank I used (or the literature would have easily doubled!) The one I use is to define PageRank as the solution of the linear system

(I - \alpha P) x = (1-\alpha) v

where P is a column stochastic transition matrix between the nodes of a graph, \alpha is the teleportation parameter in PageRank, and v is the teleportation distribution. This system yields the stationary distribution of a random process that, 

  • with probability \alpha, follows a transition according to the matrix P
  • with probability 1-\alpha, jumps according to the distribution v.

This system, with application specific constructions of P and v, suffices to reproduce the PageRank rank information in all of the above applications and makes it easy to identify similarities and differences. For instance, when v is a sparse vector, then the resulting PageRank scores highlight a localized region in a large graph or system. (In the past, this has been called a personalized PageRank system, but I think localized PageRank is a better term for its wide, modern usage.) When v is nearly uniform, then the resulting PageRank scores give a global measure of the importance of that feature in the graph or system. 

But there’s another system that is even more common, which we call pseudo-PageRank (inspired by Boldi and Vigna’s “PseudoRank”). The only difference is that P need not be column stochastic and that v need not be a distribution, but there are still a few restrictions on them. This modification yields an incredibly flexible system that is mathematically equivalent to the PageRank construction above. 

Anyway, I won’t spoil the entire paper — take a read and please post a comment or send an email if you spot anything that looks like it could be corrected!

Some random notes

  1. I spoke for a while with Sebastiano Vigna about whether to call my definition PseudoRank (as they had), but because there is a subtle, but important, difference, I decided on pseudo-PageRank.
  2. New papers kept appearing as I was putting this together. For instance, the recent Wikipedia paper by Eom, and the MonitorRank paper.
  3. It was interesting to read papers that didn’t easily let me tell if they were talking about directed or undirected applications! This makes a big difference for PageRank, although the math works out just fine either way, the solution with an undirected graph has some properties that make it worth knowing.
  4. I wish I had more time to talk about the amazing connections between localized PageRank and conductances — but that would lead down a large rabbit hole.
This entry was posted in Uncategorized. Bookmark the permalink.

Leave a comment