
PageRank is eventual equilibrium

posted by Ranee Brylinski and Jean-Luc Brylinski

PageRank is the eventual equilibrium of the flow of probability on a finite, directed graph $G$. Here the flow evolves according to a specific type of random walk process on the nodes of the graph.

This random walk process is the Google random surfer model, introduced for document ranking and retrieval by Brin and Page in their seminal 1998 paper [S. Brin and L. Page, Comp. Networks ISDN Systems 30, 107, 1998]. The random surfer “surfs” the graph, moving in discrete time steps from node to node (e.g., from document to document, or from web page to web page). Let’s label the nodes of $G$ by the numbers $1,…,N$; the total number $N$ of nodes is finite.

The random surfer, call her Sue, moves between nodes according to some fixed set of probabilities, so that with probability $p(i,j)$ she moves from node $i$ to node $j$. The self-node probabilities $p(i,i)$ are all positive, and so Sue may simply stay in place in any given move. The possibility to stay put is the possibility to be lazy, and so Sue executes a lazy random walk on the nodes of the graph, as opposed to a conventional random walk. Anyway, the $p(i,j)$ are just the transition probabilities of a finite state Markov chain, with the states being the nodes of the graph. The transition matrix $$P=[p(i,j)]$$ is the Google matrix. Note that $p(i,j)$ only comes into play when Sue is already at node $i$, and so the transition probability $p(i,j)$ is the conditional probability $p(j|i)$, the probability of $j$ conditional on $i$ (one time step is implicit here, but later we will encounter $t$-step conditional probabilities).

The transition probabilities $p(i,j)$ in the Google random surfer model are constructed in the following way. To begin with, the graph $G$ is specified by its set $V$ of nodes and by its set $E$ of directed edges. Here $(i,j)$ means an edge starting from node $i$ and going to node $j$; we say $i$ points to $j$ and write $i\leadsto{j}$. The out-degree $d_{out}(i)$ of node $i$ is the number of edges of the graph which start at $i$. Ordinarily, there are no loops $(i,i)$ in the graph, but they may be present, or we may end up creating loops in order to set up a Markov chain on the graph with decent dynamics.

The plain vanilla way to take a random walk on $G$ is to move from node $i$ along an edge to node $j$ by choosing uniformly at random from the available edges. More precisely, if there is at least one edge starting at $i$, so $d_{out}(i)$ is not zero, then the probability of going to node $j$ is $$m(i,j) = 1/d_{out}(i)$$ if $(i,j)$ is an edge, and $m(i,j)=0$ otherwise. If no edge starts at $i$ ($i$ is a sink), then we change $G$ by adding edges starting at $i$, say $r$ of them, so that now $d_{out}(i)=r\gt{0}$, and then we define $m(i,j)$ in the same way. A common remedy is to put in edges $(i,j)$ from $i$ to every node $j$, including $i$, so that $r=N$. (There are other ways to deal with sink nodes by changing $G$. We can introduce an extra node into $V$, or introduce a loop at each node into $E$, or just assume there are no sinks. Maybe using the teleportation vector is most natural; for standard teleportation this is the $r=N$ case.) Then $M=[m(i,j)]$ is the transition matrix of this plain vanilla walk (which may well be lazy).
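(As a concrete aside, not from the original post: here is a minimal Python/NumPy sketch of this construction on a small hypothetical 4-node graph with one sink, patched with the $r=N$ remedy. The graph, the node labels $0,\dots,3$, and the variable names are all invented for illustration.)

```python
import numpy as np

# Hypothetical example graph: 4 nodes, directed edges (i, j); node 3 is a sink.
N = 4
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3)]

M = np.zeros((N, N))
for i, j in edges:
    M[i, j] = 1.0                        # mark the edges leaving each node
for i in range(N):
    if M[i].sum() == 0:                  # sink: patch with edges to every node (r = N)
        M[i] = 1.0
    M[i] /= M[i].sum()                   # m(i, j) = 1/d_out(i) on edges, 0 otherwise

assert np.allclose(M.sum(axis=1), 1.0)   # each row is a probability distribution
print(M)
```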

Now here is what the Google random surfer Sue does in each step: with probability $\beta$, she takes one step of the plain vanilla random walk, and with probability $(1-\beta)$, she teleports to a node $j$ chosen uniformly at random from all $N$ nodes. Here $0\lt\beta\lt 1$ is the damping factor, which is set to $0.85$ in the Brin-Page paper. After teleporting, Sue may end up at the same node (thereby being lazy). The transition probabilities for Sue are thus $$p(i,j) = {\beta}m(i,j) + (1-\beta)\frac{1}{N}$$ and so the Google matrix $P$ is $$P = {\beta}M + (1-\beta)\mathbf{e}\mathbf{u}^T$$ where $\mathbf{e}$ is the column vector of length $N$ made up of all $1$’s and $\mathbf{u}^T$ is the uniform probability row vector $$\mathbf{u}^T = \frac{1}{N}\mathbf{e}^T$$ Here the teleportation vector is $\mathbf{u}^T$, but in various applications it may be a different probability row vector of length $N$. The Google matrix satisfies the matrix inequalities $$P\succeq(1-\beta)\mathbf{e}\mathbf{u}^T\succ{0}$$ where the comparisons are true entry-wise.
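(Continuing the illustrative sketch: the Google matrix for the hypothetical example above. The matrix $M$ is written out so the snippet runs on its own; note that $\mathbf{e}\mathbf{u}^T$ with uniform teleportation is just the constant matrix with every entry $1/N$.)

```python
import numpy as np

# M from the previous sketch (row 3 is the patched sink).
M = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.5, 0.0, 0.0, 0.5],
              [0.25, 0.25, 0.25, 0.25]])
N, beta = 4, 0.85                        # beta = 0.85 as in the Brin-Page paper

u = np.full(N, 1.0 / N)                  # uniform teleportation row vector u^T
P = beta * M + (1 - beta) * np.outer(np.ones(N), u)   # P = beta*M + (1-beta) e u^T

assert np.allclose(P.sum(axis=1), 1.0)         # P is row-stochastic
assert np.all(P >= (1 - beta) / N - 1e-12)     # entry-wise: P >= (1-beta) e u^T > 0
print(P)
```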

As Sue surfs the graph $G$, she generates a flow of probability. The probability which is flowing is the probability of finding Sue somewhere on the graph, at some node. This probability flows as time goes by, in discrete steps, through $t=0,1,2, …$, with no end. Sue never sleeps, she just surfs.

Suppose Sue started initially at a node $i=i_0$. Then the probability, at time $t=0$, of finding Sue at a node is $1$ at node $i$ and $0$ at all other nodes. This is a probability distribution $\pi_0$ on the nodes, namely the Dirac measure concentrated at $i$. Now at time $t=1$, Sue has taken one step and the probability of finding her at a node $j=i_1$ is $p(i,j)$. So the Dirac measure $\pi_0$ has evolved, or flowed, into a new probability distribution $\pi_1$ on the nodes; note every node $j$ is reachable in one step due to teleportation. Sue will certainly be somewhere after one step, thus $\sum_j p(i,j) = 1$ and $\pi_1$ is indeed a probability distribution. Upon Sue’s second step, at time $t=2$, the probability of finding Sue at a node $k=i_2$ is $$p_2(i,k) = \sum_j p(i,j)p(j,k)$$ which we get by collecting all the two-step walks from $i$ to $k$ and noting that the walk $i\leadsto j\leadsto k$ has probability $p(i,j)p(j,k)$. Sue will certainly be somewhere after two steps, thus $\sum_k p_2(i,k) = 1$ and $\pi_0$ has flowed into a probability distribution $\pi_2$. After $t$ steps, the probability of finding Sue at a node $q=i_t$ is $$p_t(i,q) = \sum_{\gamma}\mathbf{p}(\gamma)$$ summed over all $t$-step walks $\gamma$ from $i$ to $q$, where $\mathbf{p}(\gamma)$ is the probability that Sue follows $\gamma$, so that $\mathbf{p}(\gamma)$ is the product of the $t$ transition probabilities $p(i_s,i_{s+1})$ for the steps of $\gamma$. Again, $\sum_q p_t(i,q) = 1$ and $\pi_0$ has flowed into the probability distribution $\pi_t$ with values $$\pi_t(q) = p_t(i,q)$$ The $t$-step transition probability $p_t(i,q)$ is the $t$-step conditional probability $p_t(q|i)$, the probability of finding Sue at node $q$ at time $t$ conditional on finding her at node $i$ at time $0$.
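(A quick numerical check on the hypothetical example from the sketches above: the two-step probabilities $p_2(i,k)=\sum_j p(i,j)p(j,k)$, computed by the explicit sum over the intermediate node, form a probability distribution in each row; they also agree with the matrix square, anticipating the matrix form given next.)

```python
import numpy as np

# Running hypothetical example: M (sink patched) and the Google matrix P.
M = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.5, 0.0, 0.0, 0.5],
              [0.25, 0.25, 0.25, 0.25]])
N, beta = 4, 0.85
P = beta * M + (1 - beta) / N            # (1-beta) e u^T is the constant matrix 1/N

# p_2(i, k) = sum_j p(i, j) p(j, k), by the explicit sum over the intermediate node j ...
p2 = np.array([[sum(P[i, j] * P[j, k] for j in range(N)) for k in range(N)]
               for i in range(N)])

assert np.allclose(p2.sum(axis=1), 1.0)  # each row of two-step probabilities sums to 1
assert np.allclose(p2, P @ P)            # ... and it is exactly the matrix square P^2
```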

This discussion works equally well if Sue’s initial location at time $0$ on the graph was uncertain so that her initial probability distribution $\pi_0$ was not a Dirac measure (meaning $\pi_0$ had some entropy). The flow of probability is linear and we easily see that, in $t$ steps, $\pi_0$ flows to the probability distribution $\pi_t$ given by $$\pi_t(q) = \sum_i \pi_0(i)p_t(i,q)$$ Clearly $p_t(i,q)$ is just the $(i,q)$ coefficient of the matrix $P^t$, the $t$th power of $P$. So the flow of probability equation expressed in matrix form is: $$\pi_t = \pi_0P^t$$ where the probability distributions are written as probability row vectors. This is consistent at $t=0$ because the $0$th power of $P$ is the identity matrix.
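(A sketch of the flow equation $\pi_t = \pi_0P^t$ on the same hypothetical example: starting from a Dirac measure at node $0$, repeated multiplication by $P$ carries $\pi_t$ to $\pi_{t+1}$, and every $\pi_t$ remains a probability distribution.)

```python
import numpy as np

# Running hypothetical example: Google matrix P as in the sketches above.
M = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.5, 0.0, 0.0, 0.5],
              [0.25, 0.25, 0.25, 0.25]])
N, beta = 4, 0.85
P = beta * M + (1 - beta) / N

pi = np.zeros(N)
pi[0] = 1.0                              # pi_0: Dirac measure concentrated at node 0
for t in range(1, 6):
    pi = pi @ P                          # pi_t = pi_{t-1} P, so pi_t = pi_0 P^t
    assert np.isclose(pi.sum(), 1.0)     # the flow preserves total probability
    print(t, pi)
```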

Now we can ask: what happens to the flow of probability as Sue surfs and time $t$ passes, so that $t\to\infty$? The point is that as time passes, the flow of probability mixes the probabilities on the nodes, the components of the probability row vectors. The probability row vectors are probability distributions on the set of $N$ nodes. For instance, a probability distribution concentrated on any one node mixes to a probability distribution supported on (i.e., non-zero at) all the nodes. This is mixing in the thermodynamic sense, and so we can ask whether equilibrium is reached. Of course, this is mixing of probability and we are asking about an equilibrium of probability. Markov chain theory applied to the Google matrix $P$ gives the answer; but we will derive it “from scratch” in the next post using matrix analysis (functional analysis) and some geometry/topology rather than purely probabilistic arguments.

The answer is that the flow of probability produces a limit probability distribution $$\pi_\infty = \lim_{t\to\infty}\pi_t$$ which is independent of the starting distribution $\pi_0$. Here the limit is pointwise so that the probabilities $\pi_t(i)$ all converge together, uniformly in $i$, to the probabilities $\pi_\infty(i)$. This convergence also takes place in norm, for any $\ell_p$ norm on the set of nodes, $p\ge{1}$. This limit satisfies $$\pi_{\infty} = \pi_{\infty}P$$ meaning that $\pi_{\infty}$ is an equilibrium, or stationary, probability distribution: it is exactly preserved under the flow of probability. Moreover $\pi_{\infty}$ is the unique equilibrium probability distribution. This limit $$\pi=\pi_\infty$$ is the PageRank vector, so that $\pi(i)$ is the PageRank of the node $i$.
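(To see the eventual equilibrium numerically on the hypothetical example: iterating the flow from two very different starting distributions gives the same limit vector, and that vector is fixed by $P$. The function name and the stopping tolerance below are invented for the sketch; the convergence and uniqueness claims themselves are the ones stated above.)

```python
import numpy as np

# Running hypothetical example: Google matrix P as in the sketches above.
M = np.array([[0.0, 0.5, 0.5, 0.0],
              [0.0, 0.0, 1.0, 0.0],
              [0.5, 0.0, 0.0, 0.5],
              [0.25, 0.25, 0.25, 0.25]])
N, beta = 4, 0.85
P = beta * M + (1 - beta) / N

def flow_to_equilibrium(pi0, tol=1e-12):
    """Iterate pi -> pi P until the change is below tol; return the (approximate) limit."""
    pi = pi0.copy()
    while True:
        nxt = pi @ P
        if np.abs(nxt - pi).sum() < tol:
            return nxt
        pi = nxt

pi_a = flow_to_equilibrium(np.eye(N)[2])         # start: Dirac measure at node 2
pi_b = flow_to_equilibrium(np.full(N, 1.0 / N))  # start: uniform distribution

assert np.allclose(pi_a, pi_b)           # the limit is independent of pi_0
assert np.allclose(pi_a, pi_a @ P)       # pi = pi P: the equilibrium is stationary
print(pi_a)                              # the PageRank vector for nodes 0..3
```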

The equilibrium equation $\pi=\pi P$ says that at each node $j$, $$\pi(j) = \beta\sum_{i\leadsto{j}}\pi(i)\frac{1}{d_{out}(i)} + (1-\beta)\sum_i\pi(i)\frac{1}{N}$$ We have expanded out $\frac{1}{N}$ as $\sum_i\pi(i)\frac{1}{N}$ in order to show the amount of its PageRank that node $i$ sends out to node $j$. This amount is $\pi(i)p(i,j)$, and it is equal to the sum of the terms above which involve $\pi(i)$. Notice that PageRank satisfies $$\pi\succeq(1-\beta)\mathbf{u}^T\succ{0}$$ where as above the comparisons are true entry-wise.
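(One last check on the hypothetical example: the node-wise equilibrium equation recovers the PageRank vector computed by iterating the flow. Here the sum over $i\leadsto j$ runs over the edges of the sink-patched graph, so the patched edges out of node $3$ are included, and the entry-wise lower bound on $\pi$ holds as well.)

```python
import numpy as np

# Running hypothetical example, now with the sink-patched edge list made explicit.
N, beta = 4, 0.85
edges = [(0, 1), (0, 2), (1, 2), (2, 0), (2, 3),
         (3, 0), (3, 1), (3, 2), (3, 3)]          # node 3 patched with r = N edges
d_out = {i: sum(1 for a, _ in edges if a == i) for i in range(N)}

M = np.zeros((N, N))
for i, j in edges:
    M[i, j] = 1.0 / d_out[i]
P = beta * M + (1 - beta) / N

pi = np.full(N, 1.0 / N)                 # PageRank by iterating the flow, as above
for _ in range(200):
    pi = pi @ P

# Node-wise equilibrium: pi(j) = beta * sum_{i -> j} pi(i)/d_out(i) + (1 - beta)/N
for j in range(N):
    rhs = beta * sum(pi[i] / d_out[i] for i, k in edges if k == j) + (1 - beta) / N
    assert np.isclose(pi[j], rhs)

assert np.all(pi >= (1 - beta) / N - 1e-12)   # entry-wise: pi >= (1-beta) u^T > 0
```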

The PageRank vector $\pi$ is the probability distribution of the position of the Google random surfer Sue at the eventual equilibrium, that is, at the equilibrium that takes place eventually as $t$ passes to $\infty$. At equilibrium, much of the information about the Google random surfer Sue, and so about the graph $G$, has been lost: the initial probability vector $\pi_0$ is completely lost and much of the transition matrix $P$ has been lost. The information that survives is exactly the equilibrium probability vector $\pi$ (cf. the next post).

The $i$th component $\pi(i)$ of the PageRank vector is the eventual, or limiting long-run, probability of finding Sue at node $i$. 

to be continued …

posted by Ranee Brylinski and Jean-Luc Brylinski