Cumulative Rank Aggregation of a Family of Network Centrality Indices
A growing number of centrality indices are used today in social network analysis. The purpose of using all these network centrality measures is that through them one might be able to identify the most important nodes according to a variety of structural criteria (like nodal degree, closeness, betweenness, eigenvector, PageRank etc.). Moreover, computations (in Python, R etc. or standalone applications) may very easily derive the tables of various centrality indices of network nodes. Therefore, knowing a good deal of network nodal centralities, the crucial question would be how to make sense for all such indices in a illuminating way that would account for the structural features that an empirical network exhibits. What I am proposing here is a methodology for a cumulative ranking of network nodes according to the scores that each node possesses, not on a single centrality measure, but on a whole group (a family) of centrality measures. In particular, the family of centrality indices that I am considering here includes:
- for undirected graphs, the following 10 nodal measures: degree, closeness, betweenness, eigenvector, HITS, Katz, PageRank, load, communicability, and current flow, while,
- for directed graphs, the following 10 nodal measures: out degree, in degree, closeness, betweenness, eigenvector, HITS hubs, HITS auths, Katz, PageRank, and load.
The first step of the procedure that I am following requires the conversion of the sequence of numerical values (typically, scalars from 0 to 1), which a centrality measure takes on network nodes, to a corresponding ordinal sequence of integers, which represent the existing ranking (ordering) among the numerical values of nodal centrality indices. For instance, if a certain centrality index takes the following values on the 15 nodes of a network:
[0.3684210526315789, 0.4827586206896552, 0.4375, 0.4, 0.3888888888888889, 0.3333333333333333, 0.4666666666666667, 0.32558139534883723, 0.56, 0.2857142857142857, 0.3684210526315789, 0.5, 0.3888888888888889, 0.4375, 0.4827586206896552]
then the corresponding (sequence of) ranks of the 15 nodes is the following:
[5, 6, 5, 3, 4, 2, 10, 8, 3, 8, 7, 7, 11, 9, 1]
Notice that a sequence of ranks is always given in decreasing order, starting with the rank of the largest element of the sequence of numerical values, the rank of which is set equal to 1. Apparently, the maximum rank can be less or equal to the total number of nodes, as it happens in this examples where there exist 4 “ties,” the maximum rank being 11.
The point is that, when concerned with nodal rankings, it would be motivational to think of a network as an electoral system that uses preferential ballots. In this sense, each node can be considered as a candidate in a hypothetical election with the centrality indices being the voters, “who” instead of voting for a single candidate/node, they are placing their votes as preferences in a hierarchy on the ordinal scale. For instance, in the previous example, the shown sequence of ranks can be considered as the vote of a single voter/centrality-measure, who casts the last node/candidate on top of everybody else in her preferences (1st rank), putting the first node somewhere in the middle (5th rank) and so on, attributing rank-choices for all nodes/candidates of the network/electoral-system.
Then the question is, after all voters/centrality-indices cast their vote in the form of a ranked choice over candidates/nodes, who is the winner among the latter? Actually the final outcome of a ranked voting procedure has to be a rank itself (like the previous example’s sequence of ranks), in which all candidates/nodes are placed in a resulting hierarchical order according to the preferences of voters/centrality-indices. Nevertheless, one should add the caveat that such electoral systems are not always conclusive, although it is not the case here to be concerned with such paradoxes. In any case, what I have described metaphorically is exactly the general idea of what might be called the outcome of a rank aggregation of a family of network centrality indices (seen as voters), essentially, ordering in an ordinal scale hierarchy all the network nodes (seen as candidates) in order to be able to pronounce the relative “collective” or “cumulative” importance of all the network nodes in terms of a multi-sided totality (a family) of network centrality indices (and not the one-sidedness of separate structural criteria).
Technically speaking, the computational methodology to implement such a rank aggregation is informed by the advances in the Theories of Social Choice and Voting. However, given the computational complexity of the problem, especially for large networks (as the problem is NP-hard even for only four voters/centrality-indices, because of which one is necessitated to resort to approximate solutions, usually provided by a constrained integer programming formulation of the problem), the only thing I can do here is to talk about the heuristics of my approach of how to come to grips with the problem of rank aggregation of scores coming from a whole range of network centrality measures.
Actually, my idea resembles the pairwise comparisons of the Kemeny-Young method of rank aggregation, although what I am doing is something simpler and it proceeds in a traversal direction. Instead of considering the ranks in a centrality index that nodes possess, I am focusing on the ranks of single nodes for all the centrality indices (of a given family). To explain what I mean, it would be more instructional to start with examples. Say that a certain node (let us say node i) possesses the following “profile” of scores on 10 centrality indices:
[3, 2, 5, 3, 3, 4, 6, 5, 4, 4]
This means that node i is ranked 3rd with regards to the first centrality index, 2nd the second, 5th the third and so on. Let us pick up the minimum of all the ranks of i in this profile, which is 2 here, and then ask the question: If we take a second node j, are all the ranks in j’s profile bigger or equal than 2, the minimum of the ranks in i’s profile? If they are, we are going to say that “node i beats node j” and the rationale of this expression comes from the fact, for a node, the higher a particular rank in a certain centrality happens to be, the lower the importance of this node with regards to that centrality turns out to be. For example, if the node j had the centrality profile:
[5, 9, 12, 14, 14, 14, 13, 12, 14, 14]
then it would be the case of i beating j. However, for a node k with profile:
[1, 4, 2, 5, 5, 1, 2, 1, 2, 2]
it would not be true that i would be beating k, but exactly the opposite that k would be beating i. Notice that for a node p with profile:
[1, 2, 3, 8, 9, 10, 11, 12, 13, 14]
neither i beats p nor p beats i.
However, given the fact that the so defined relationship of “beating” is reflexive (i.e., i beats herself, for every node i!), it makes sense to count how many are the nodes j that i beats. I would like to call this number of nodes that a given node i beats as the “rung” of i, because the more nodes i beats, the higher to the rank hierarchy i ascends. For instance, if the centrality profile of node i was such that i was ranked first in every centrality (in which case the minimum rank of i would be1), then trivially i would beat all network nodes and the rung of i would be equal to the total number of nodes in the network. On the other side, if there was a single node q ranked last in every centrality, while all the other nodes were beating k, then the rung of k would be obviously taken equal to 1. Typically, several (if not most) nodes would be ranked in between the previous two extremes.
Therefore, the final rank of a node could result by subtracting her rung from the total number of nodes of the network. Obviously, knowing the rung of each node amounts to knowing her “cumulative” or aggregate ranking in accordance to the whole set of hierarchical placements that the entire family of centrality measures imposes on every network node.
In another post (https://medium.com/@mosabou/the-twitter-network-of-the-february-march-2018-ucu-mobilization-7c81e57a2874), I have applied this methodology of aggregate network ranking to a number of relatively large Twitter networks (of hashtags and mentions). In the sequel, here, I am going to give three simple examples of small networks.
The first example is the network of the renowned 15 Florentine Families with with 20 ties (marriages etc.):
The 15 nodes (families) of the Florentine network take the following values with regards to the following 10 centrality indices:
Furthermore, the corresponding individual nodal ranks on each centrality are:
Applying the rank aggregation methodology that I have described above gives the following “cumulative” network ranks of the 15 Florentine families:
As one may observe, the Medici have the highest total rank (in fact, the Medici beat every other family/node in the Florentine network). Second in the overall ranking come 3 families (Strozzi, Guadagni and Ridolfi), who stand for as a group of highly frequented hubs. Third, fourth and fifth in rank are 7 families lying at the periphery of the Medici and the main hubs. At the very end, sixth in rank come 4 families who are what in graph theory are called “leaves,” i.e., nodes with degree one, which are hanging upon other (better embedded) nodes. Apparently, the structural classification given by this methodology of network centrality aggregation makes also perfect sense in terms of the historical importance and political salience (power) of the 15 Florentine Families.
The second example is David Krackhardt’s “strong trust network at the Pentagon” (that I’ll be simply calling “Krackhardt’s network” from now on), which is composed of 17 nodes/persons and 41 edges/ties.
The table of centralities is:
and the individual centrality rankings:
resulting the following cumulative nodal ranks:
The last example is an “artificial” directed Gnm random network composed of 50 nodes and 150 edges:
Nodal centralities (only of a selection of 15 nodes):
Nodal centrality ranks (only of a selection of 15 nodes):
Nodal ranks resulting cumulatively from all 10 centralities (shown are only nodes up to the 4th rank):