Understanding Page Rank

Sarthak Anand
4 min readOct 16, 2018

--

PageRank is an intuitive way of ranking web pages, which formed the basis for Google’s web indexing algorithm during its early phase. In this article, you’ll learn about the intuition behind page rank and implementing page rank in python. The article is divided into the following sections:

  • Basic Idea behind Page Rank
  • Understanding the Pank Rank algorithm
  • Implementing Page Rank from scratch

Basic Idea behind Page Rank

The intuition behind the Page-Rank is based on the idea that the popularity of a webpage is determined not only by the number of incoming links but also by the kind of incomings links. Citations from highly ranked pages contribute more than lower ranked web pages. The page rank of a web page A cited by web page B shown in Fig. 1 is given the equation:

Fig. 1
  • PR(A): PageRank of Page A
  • PR(B): PageRank of Page B
  • P(B, A): Probability going from B to A (here it is equal to one)
  • d: is known as damping factor, to add some randomness to the equation.

Page rank over the complete web graph is calculated until it converges to final values.

Page Rank Algorithm

To have a better understanding of how page rank works, consider a graph (shown by Fig. 2) of web pages having links shown by the arrow. Note that, if there are web pages with no out link then they do not contribute to the page ranking (they are usually referred to as dangling pages).

Fig. 2

Initially, page rank of all the web pages is taken as 1. The weight of the edge is the probability of going from a web page X to Y ( the web page A has 2 out links, therefore, the probability to visit each web page is 1/2 ). After expressing the web graph in terms of probabilities the web graph looks something like the Fig.3. The page rank of each web page is determined by applying the PageRank equation. This process is repeated until the algorithm converges i.e. the values of page rank do not change beyond a small value ( know as epsilon usually fixed as 1e-4 ). The damping factor (d) introduced is to add some randomness over the web graph i.e. d is a probability that a user will move to the linked web page and 1-d is the probability of choosing a random web page, it is usually taken as 0.85.

Fig. 3

For iteration 1:

PR(C)=(1-d) + d*( P(A,C)*PR(A) + P(B,C)*PR(B) + P(D,C)*PR(D))

PR(C)=(0.15)+ 0.85*( 0.5*1 + 1*1 + 1*1) = 2.275

The PR(C) can also be calculated by matrix dot product.

Similarly, extending the equation for all the web pages we end up with the equation:

Where matrix C represents the probability transition ( C[i][j] = probability of the user transitioning from page i to page j). The C matrix of our example can be expressed as the matrix represented above. Also, the initial page ranks are as assigned 1 for all the web pages. The PRs of web pages are calculated until the PRs converge to a certain value.

Implementing Page Rank

Page Rank implementation in python:

Final ranking of our example are C > A > B > D !

Notice: page rank of A is high even when it has only one incoming link.

References

--

--