PageRank beyond the Web

I just completed a survey article about uses of PageRank outside of web-ranking. The paper has been submitted to a journal, and I also posted the manuscript to arXiv.

David F. Gleich. PageRank beyond the Web. arXiv. cs.SI:1407.5107, 2014.

The goal with this paper was to enumerate the discuss how frequently PageRank is used for applications broadly throughout science and engineering. For instance, I discuss:

PageRank in Biology – GeneRank, ProteinRank, IsoRank
PageRank in Neuroscience
PageRank in Chemistry
PageRank in Engineered systems – MonitorRank
PageRank in Mathematical systems
PageRank in Sports
PageRank in Literature – BookRank
PageRank in Bibliometrics – TimedPageRank, CiteRank, AuthorRank, PopRank
PageRank in Knowledge systems – FactRank, ObjectRank, FolkRank
PageRank in Recommender systems – ItemRank
PageRank in Social networks – BuddyRank, TwitterRank
PageRank in the web & Wikipedia – TrustRank, BadRank, VisualRank

In order to make this a tractable survey, I had to strictly restrict the definition of PageRank I used (or the literature would have easily doubled!) The one I use is to define PageRank as the solution of the linear system

$(I - \alpha P) x = (1-\alpha) v$

where $P$ is a column stochastic transition matrix between the nodes of a graph, $\alpha$ is the teleportation parameter in PageRank, and $v$ is the teleportation distribution. This system yields the stationary distribution of a random process that,

with probability $\alpha$ , follows a transition according to the matrix $P$
with probability $1-\alpha$ , jumps according to the distribution $v$ .

This system, with application specific constructions of $P$ and $v$ , suffices to reproduce the PageRank rank information in all of the above applications and makes it easy to identify similarities and differences. For instance, when $v$ is a sparse vector, then the resulting PageRank scores highlight a localized region in a large graph or system. (In the past, this has been called a personalized PageRank system, but I think localized PageRank is a better term for its wide, modern usage.) When $v$ is nearly uniform, then the resulting PageRank scores give a global measure of the importance of that feature in the graph or system.

But there’s another system that is even more common, which we call pseudo-PageRank (inspired by Boldi and Vigna’s “PseudoRank”). The only difference is that $P$ need not be column stochastic and that $v$ need not be a distribution, but there are still a few restrictions on them. This modification yields an incredibly flexible system that is mathematically equivalent to the PageRank construction above.

Anyway, I won’t spoil the entire paper — take a read and please post a comment or send an email if you spot anything that looks like it could be corrected!

Some random notes

I spoke for a while with Sebastiano Vigna about whether to call my definition PseudoRank (as they had), but because there is a subtle, but important, difference, I decided on pseudo-PageRank.
New papers kept appearing as I was putting this together. For instance, the recent Wikipedia paper by Eom, and the MonitorRank paper.
It was interesting to read papers that didn’t easily let me tell if they were talking about directed or undirected applications! This makes a big difference for PageRank, although the math works out just fine either way, the solution with an undirected graph has some properties that make it worth knowing.
I wish I had more time to talk about the amazing connections between localized PageRank and conductances — but that would lead down a large rabbit hole.

PageRank beyond the Web

Leave a comment Cancel reply

Recent Posts

Twitter

Links

Categories

github

RSS Feeds

Top Clicks

Meta

PageRank beyond the Web

Share this:

Related

Leave a comment Cancel reply

Recent Posts

Twitter

Links

Categories

github

RSS Feeds