Handling Dangling Nodes — PageRank

Arpan Sardar
4 min read · Jun 25, 2020


This is a follow-up to my previous article, and I recommend reading that article first. Here, I'll be talking about dangling nodes and how they can be handled in the PageRank algorithm.

What is a Dangling Node?

In the field of Network Analysis, dangling nodes are nodes with no outward edges. In the figure given below, node 'B' is a dangling node as it doesn't have any outlinks. Most web pages link to and are linked by other pages, but the WWW network also contains many pages without any outlinks.

Figure: A Simple Directed Network

With each iteration of the PageRank update, these notorious nodes absorb more and more of the other nodes' PageRank, so that at convergence the nodes end up with values of exactly 0 or 1. The explanation is that a random surfer on the Internet, given enough time, will always end up on a dangling node and can no longer keep surfing through hyperlinks. Let's investigate this phenomenon with a simple example. First, let's create a directed network with a dangling node and then visualize it through a plot.

Code for Creating and Visualizing a Directed Network
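
A minimal sketch of such code, using networkx and matplotlib, could look like the following. The edge list here is illustrative (my own choice); the only requirement is that node 'C' has no outgoing links.

# Sketch: build a small directed graph whose only dangling node is 'C'
# and plot it. The edge list is assumed; any edges work as long as 'C'
# has no outgoing links.
import networkx as nx
import matplotlib.pyplot as plt

G = nx.DiGraph()
G.add_edges_from([
    ("A", "B"), ("A", "C"),
    ("B", "C"), ("B", "D"),
    ("D", "A"), ("D", "E"),
    ("E", "A"),
])

nx.draw_networkx(G, pos=nx.spring_layout(G, seed=42),
                 node_color="lightblue", arrows=True)
plt.axis("off")
plt.show()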

The network is shown below. It contains a single dangling node, 'C'.

Figure: Directed Network having a Dangling Node ‘C’

We get the following result after feeding the above network to the code I discussed in my previous article.

Output of our own PageRank implementation

We see that the nodes ended up with a PageRank of exactly 0 or 1. This is the anomaly I mentioned earlier: node 'C' absorbs the PageRank of every other node in the network because we did not handle sinks (pages with no outgoing links) in our code. Let's analyze further how the nodes' values are updated in each iteration until they converge.
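
As a quick refresher, the code from the previous article performs, roughly, the following points-distribution update. This is only a sketch with my own naming; one simple way to reproduce the degenerate behaviour described above is to let a sink keep whatever rank flows into it.

def naive_pagerank(G, max_iter=100, tol=1e-8):
    """Points-distribution PageRank with no handling of dangling nodes.

    Every iteration, each node hands its current rank out in equal
    shares to its successors. A node with no successors (a sink)
    simply keeps its rank, so rank that reaches it never leaves and
    the sink slowly soaks up everything.
    """
    n = G.number_of_nodes()
    pr = {node: 1.0 / n for node in G}              # uniform start
    for _ in range(max_iter):
        new_pr = {node: 0.0 for node in G}
        for node in G:
            out = list(G.successors(node))
            if out:
                share = pr[node] / len(out)         # equal share to each outlink
                for nbr in out:
                    new_pr[nbr] += share
            else:
                new_pr[node] += pr[node]            # sink: rank stays put
        if sum(abs(new_pr[v] - pr[v]) for v in G) < tol:
            return new_pr
        pr = new_pr
    return pr

Running naive_pagerank(G) on the graph above drives node 'C' towards 1 and every other node towards 0.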

PageRank Values of the Nodes for Each Iteration

According to our code, the descending order of the nodes by PageRank is C, A, B, D, E. Let's see what the correct order should look like by applying networkx's inbuilt PageRank function next.

Applying networkx’s inbuilt PageRank function
Output of the Inbuilt Function
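
For reference, calling the inbuilt function is a one-liner (a small sketch; G is the graph built earlier, and networkx's alpha parameter is the damping factor I discuss next):

import networkx as nx

# networkx's inbuilt PageRank; it handles dangling nodes for us.
ranks = nx.pagerank(G, alpha=0.85)
print(sorted(ranks.items(), key=lambda kv: kv[1], reverse=True))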

So, our code fails to yield the correct result for a network with dangling nodes. Now, let's handle the sinks using the concept of the damping factor. The damping factor d denotes the probability that the imaginary surfer will continue surfing by clicking on a random hyperlink from the current page. Its value is generally set to 0.85, and I use the same value in my code.

The highlighted portion below is the only modification the code discussed in the previous article was lacking. In each iteration, every node keeps 85% of its updated PageRank, and the remaining 15% of all the nodes' PageRank is pooled and divided equally among them. Since the total PageRank of the n nodes always sums to 1, this pooled 15% gives each node a share of 0.15 / n, which is added to the 85% it retained. Thus, by deploying the damping factor, we prevent the nodes from converging to a value of either 0 or 1.

Modified Function (Refer to the GitHub link given below for the complete code)
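
The complete modified function is in the linked repository; the sketch below shows the idea with my own naming (not necessarily the names used in the repo), applying the damping step right after the distribution step of the earlier sketch.

def pagerank_with_damping(G, d=0.85, max_iter=100, tol=1e-8):
    """Points-distribution PageRank with a damping factor d.

    After the usual redistribution, every node keeps d (85%) of its
    updated rank; the remaining (1 - d) of the total rank, which always
    sums to 1, is split equally, so each of the n nodes gains (1 - d) / n.
    """
    n = G.number_of_nodes()
    pr = {node: 1.0 / n for node in G}
    for _ in range(max_iter):
        new_pr = {node: 0.0 for node in G}
        for node in G:
            out = list(G.successors(node))
            if out:
                share = pr[node] / len(out)
                for nbr in out:
                    new_pr[nbr] += share
            else:
                new_pr[node] += pr[node]
        # The modification: keep d of the updated rank, share out the rest.
        new_pr = {node: d * rank + (1 - d) / n for node, rank in new_pr.items()}
        if sum(abs(new_pr[v] - pr[v]) for v in G) < tol:
            return new_pr
        pr = new_pr
    return pr

With this change, the sink can no longer hoard all the rank, because every node receives the shared (1 - d) / n in every iteration.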

Now, our modified code gives the following output. Notice that it matches the output of networkx's inbuilt PageRank function, so we can conclude that the modified code handles the sinks correctly.

The Output of the Modified Code

The following figure shows each iteration until convergence in detail.

PageRank Values of the Nodes for Each Iteration of the Modified Code

So, that was my approach to dealing with the sinks created by dangling nodes in a network. Please note that the final PageRank values of the nodes may differ slightly from those of networkx's inbuilt PageRank function. This is because networkx may take a more robust approach to the problem, or it may use a different convergence tolerance.

  • Complete code can be found here. Modifications are welcome. :)
  • Connect with me on LinkedIn if you have any queries or suggestions.

I used the Points Distribution Method for this implementation. In my next article, I implement the algorithm using the Random Walk Method.

Thank you!
