PageRank and Centrality: They measure importance.

Huda Nassar
3 min readJan 6, 2024

--

Alright, so, yet another, “what is PageRank” answer:

PageRank is the algorithm that kickstarted Google. Its key goal was to rank pages in terms of measured importance. It relied heavily on the web’s connections… For example, and just for the sake of illustration, imagine that there are only 100 webpages on the Internet, and 99 of them have a redirecting link to the 100th page. This type of pattern is indicative that the 100th page is very important. Here is an example to illustrate this scenario.

And, for the curious, here are the actual PageRank values of the above graph. Node 17 is the center node in the figure; it is the one with the highest PageRank value (The edge list of the graph above is provided at the bottom of this post).

 (1, 0.012031862745098041)
(2, 0.012232557189542486)
(3, 0.017431393995098042)
(4, 0.11277653873406326)
(5, 0.04077737697022948)
(6, 0.17389675820604356)
(7, 0.06862827472407124)
(8, 0.008823529411764707)
(9, 0.008823529411764707)
(10, 0.1172937216767716)
(11, 0.013823529411764708)
(12, 0.008823529411764707)
(13, 0.019344362745098043)
(14, 0.011323529411764708)
(15, 0.008823529411764707)
(16, 0.011323529411764708)
(17, 0.35382244713163113)

Centrality

Photo by Omar Flores on Unsplash

If you’ve heard of PageRank before, you may have heard of “centrality”. As mentioned earlier, PageRank is a way to rank nodes in a graph and in some way, measure their importance.

First, on the topic of importance:

Quantifying this importance can be crucial in various cases. For example, consider a social network where two ad agencies/influencers offer to advertise your product and ask for the same price. You would likely want to go with the agency that is “more important”. In a similar way, we can measure the importance of airports in a flight network and decide to have a layover in a “more important” airport (if we have concerns about delays).

Second, on centrality:

Centrality: In plain terms, centrality measures how central or important a node is in a graph.

So far, we’ve indicated that the PageRank value of a given node correlates with the “importance” of that node. Other methods exist to measure centrality.

A trivial measure of centrality is degree centrality. In this measure the importance of a node is correlated with how many edges are connected to it. This type of centrality only looks at the most immediate neighbors of a node while a measure like PageRank looks at the graph somewhat more collectively. Depending on the graph and problem at hand, degree centrality could be problematic. To see why the degree centrality measure can be problematic, consider a transportation network where we have one airport connecting two hubs. At a first glance, this node seems to be not important according to its degree centrality; nevertheless, because it connects two hubs, this node will likely have a high PageRank value, indicating an importance that is higher than anticipated.

If you want to know all the details and history, please see “A History of Spectral Ranking”. Some ideas related to those we discuss date back to ways to rank chess players from the late 1800s!

Also, if you are curious: I talk about five types of centrality in this blogpost with a really fun dataset.

Lastly, I promised the edges of the graph above, here it is:

 (1, 2)
(1, 3)
(2, 3)
(3, 4)
(3, 6)
(3, 7)
(4, 5)
(4, 6)
(7, 6)
(8, 7)
(9, 7)
(9, 11)
(10, 7)
(12, 11)
(12, 13)
(13, 10)
(14, 13)
(15, 14)
(15, 16)
(16, 1)
(16, 13)
(1, 17)
(2, 17)
(3, 17)
(4, 17)
(5, 17)
(6, 17)
(7, 17)
(8, 17)
(9, 17)
(10, 17)
(11, 17)
(12, 17)
(13, 17)
(14, 17)
(15, 17)
(16, 17)
(17, 4)
(17, 6)
(17, 10)

--

--

Huda Nassar

I like graphs, and often talk about them. I also often talk about Julia, data visualization, and what we’re building at RelationalAI. 🐦/ twitter.com/nassarhuda