Comparative Analysis of Degree Centrality and Betweenness Centrality in Large Graphs

Vishal Sharma
Published in Nerd For Tech
6 min read · Oct 7, 2023

Centrality measures play a crucial role in network analysis, helping us identify the most important nodes within a network. Two commonly used centrality measures are degree centrality and betweenness centrality. In this short article, we compare and contrast these two measures, considering their applicability and performance on a real-world network sampled from the APS dataset. We use distribution statistics and several visualization techniques to determine which centrality measure is more suitable for our dataset.


Introduction

Network analysis is a fundamental tool in understanding the structure and dynamics of complex systems, ranging from social networks to transportation systems. Centrality measures are used to identify key nodes within these networks, which can provide valuable insights into their functioning. Two widely used centrality measures are degree centrality and betweenness centrality.

Degree Centrality measures the number of direct connections a node has, making it a straightforward metric for identifying highly connected nodes. Betweenness Centrality, on the other hand, quantifies the extent to which a node lies on the shortest paths between other nodes, indicating its role in facilitating communication within the network.
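To make the two definitions concrete, here is a minimal sketch on a five-node path graph. The toy graph and the variable names are illustrative assumptions, not part of the APS analysis; networkx normalizes both measures by default.

```python
import networkx as nx

# Toy path graph 0 - 1 - 2 - 3 - 4 (illustrative only, not the APS data)
toy = nx.path_graph(5)

# Degree centrality: a node's degree divided by (n - 1)
degree_c = nx.degree_centrality(toy)

# Betweenness centrality: normalized share of shortest paths between
# other node pairs that pass through the node
betweenness_c = nx.betweenness_centrality(toy)

for node in toy.nodes():
    print(node, round(degree_c[node], 3), round(betweenness_c[node], 3))
```

The three interior nodes share the same degree centrality (0.5), yet the middle node has the highest betweenness (about 0.667) because more shortest paths cross it.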

The Graph Network

import networkx as nx
import matplotlib.pyplot as plt

G = nx.read_gexf('sampled_APS_pacs052030.gexf')

# Adjust the parameters of the spring layout
pos = nx.spring_layout(G, iterations=15, seed=1721)

# Customize visualization settings
plt.figure(figsize=(15, 9))
nx.draw(G, pos=pos, with_labels=False, node_color='blue', node_size=15, edge_color='black', width=0.15)

plt.axis('off')
plt.tight_layout()

plt.show()
print("Number of nodes:", G.number_of_nodes())
print("Number of edges:", G.number_of_edges())

Output:
Number of nodes: 1853
Number of edges: 3627

Centrality Distribution Plots:

  • Degree Centrality Distribution: In the Degree Centrality distribution plot, we observe that the centrality values are fairly concentrated, with a tail extending towards higher values. This indicates that many nodes have a similar number of direct connections, but there are a few highly connected nodes.
  • Betweenness Centrality Distribution: The Betweenness Centrality distribution, on the other hand, shows a different pattern. It appears right-skewed, suggesting that only a few nodes act as critical bridges or intermediaries between other nodes in the network.

Conclusion: Based on these distributions, we can conclude that Degree Centrality tends to highlight nodes with higher degrees (more direct connections), while Betweenness Centrality identifies nodes that lie on critical paths between others.
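Distribution plots of this kind could be produced along the following lines. This is a sketch, not the original plotting code: `G_demo` is a synthetic stand-in graph (an assumption) so the snippet runs without the GEXF file, and the bin count and colors are arbitrary choices.

```python
import networkx as nx
import matplotlib.pyplot as plt

# Stand-in graph (assumption); for the real data use
# G_demo = nx.read_gexf('sampled_APS_pacs052030.gexf')
G_demo = nx.barabasi_albert_graph(300, 2, seed=1721)

degree_vals = list(nx.degree_centrality(G_demo).values())
betweenness_vals = list(nx.betweenness_centrality(G_demo).values())

# One histogram per measure, side by side
fig, axes = plt.subplots(1, 2, figsize=(12, 4))
axes[0].hist(degree_vals, bins=30, color='blue')
axes[0].set_title('Degree Centrality Distribution')
axes[1].hist(betweenness_vals, bins=30, color='green')
axes[1].set_title('Betweenness Centrality Distribution')
for ax in axes:
    ax.set_xlabel('Centrality value')
    ax.set_ylabel('Number of nodes')
fig.tight_layout()
# plt.show()  # uncomment in an interactive session
```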

Scatter Plot: Degree Centrality vs. Betweenness Centrality:

  • In the scatter plot, we can observe the relationship between Degree Centrality and Betweenness Centrality for each node. The points are somewhat scattered, and there doesn’t seem to be a strong linear correlation between the two measures.
  • However, there is a trend where nodes with higher Degree Centrality often have higher Betweenness Centrality, indicating that nodes with more connections are more likely to lie on the shortest paths between other nodes.

Conclusion: While there is a positive correlation between Degree Centrality and Betweenness Centrality, it’s not a perfect correlation. This suggests that while highly connected nodes often play important roles as intermediaries, there are exceptions where nodes with fewer connections have high Betweenness Centrality due to their strategic position in the network.
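The exception described above is easy to reproduce on a toy graph. The sketch below uses a barbell graph (two cliques joined through one bridge node) purely as an illustration; it is not drawn from the APS data.

```python
import networkx as nx

# Two 5-node cliques joined through a single bridge node (node 5)
toy = nx.barbell_graph(5, 1)

degree_c = nx.degree_centrality(toy)
betweenness_c = nx.betweenness_centrality(toy)

# Node with the highest betweenness, and how its degree compares
bridge = max(betweenness_c, key=betweenness_c.get)
print("Highest betweenness:", bridge)
print("Degree of that node:", toy.degree(bridge))
print("Largest degree in graph:", max(d for _, d in toy.degree()))
```

Here the bridge node has only two neighbors, the fewest in the graph, but every shortest path between the two cliques runs through it, so it ranks first on betweenness.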

The choice between Degree Centrality and Betweenness Centrality depends on the specific context and research questions:

Use Degree Centrality when:

  • Identifying nodes with many direct connections is important.
  • You want a quick and computationally efficient measure.
  • You are interested in understanding the local influence of nodes.

Use Betweenness Centrality when:

  • You want to identify nodes that act as critical bridges or bottlenecks in the network.
  • Understanding how information or resources flow through the network is crucial.
  • You’re interested in the global influence of nodes in facilitating communication.

In practice, it’s often useful to consider both centrality measures together to gain a more comprehensive understanding of a network’s structure and the roles different nodes play. The choice of centrality measure should align with the specific objectives and characteristics of the network being studied.
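The efficiency difference can be checked directly. Degree centrality only needs each node's neighbor count, while exact betweenness requires a shortest-path search from every node (networkx documents its implementation as based on Brandes' algorithm). The sketch below times both on a synthetic stand-in graph (`G_demo`, an assumption) and also shows the sampled approximation available through the `k` parameter; the value `k=50` is an arbitrary choice.

```python
import time
import networkx as nx

# Stand-in graph (assumption), not the APS network
G_demo = nx.barabasi_albert_graph(500, 2, seed=1721)

start = time.perf_counter()
degree_c = nx.degree_centrality(G_demo)
t_degree = time.perf_counter() - start

start = time.perf_counter()
betweenness_exact = nx.betweenness_centrality(G_demo)
t_exact = time.perf_counter() - start

# Approximation: estimate from k sampled source nodes instead of all n
start = time.perf_counter()
betweenness_approx = nx.betweenness_centrality(G_demo, k=50, seed=1721)
t_approx = time.perf_counter() - start

print(f"degree: {t_degree:.4f}s, exact betweenness: {t_exact:.4f}s, "
      f"sampled betweenness (k=50): {t_approx:.4f}s")
```

Timings depend on the machine, so no figures are quoted here. On larger graphs the sampled variant trades some accuracy for a shorter run time.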

We create a box plot that compares the centrality distributions of Degree Centrality and Betweenness Centrality. This plot helps visualize the spread, median, and potential outliers of the centrality measures.

We create Cumulative Distribution Function (CDF) plots for both centrality measures. CDFs provide insights into the cumulative distribution of centrality values, allowing us to compare their overall distributions.

By examining these plots, we can draw additional conclusions:

Box Plot:

  • The box plot visually demonstrates the differences in the spread and central tendency of Degree Centrality and Betweenness Centrality.
  • Betweenness Centrality has a wider spread, indicating that it can vary significantly across nodes.
  • Degree Centrality has a narrower spread, suggesting that its values are concentrated around a central range.

CDF Plot:

  • The CDF plots provide a clearer view of the cumulative distribution of centrality values.
  • We can observe how quickly the cumulative distribution increases for each centrality measure.
  • Differences in the steepness of the CDF curves reflect variations in centrality values.
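A box plot and an empirical CDF like the ones discussed can be sketched as follows. The helper `empirical_cdf` and the stand-in graph `G_demo` are assumptions introduced for illustration, not the original code.

```python
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

# Stand-in graph (assumption), not the APS network
G_demo = nx.barabasi_albert_graph(300, 2, seed=1721)
degree_vals = np.array(list(nx.degree_centrality(G_demo).values()))
betweenness_vals = np.array(list(nx.betweenness_centrality(G_demo).values()))

def empirical_cdf(values):
    """Return the sorted values and cumulative fractions i / n."""
    x = np.sort(values)
    y = np.arange(1, len(x) + 1) / len(x)
    return x, y

fig, (ax_box, ax_cdf) = plt.subplots(1, 2, figsize=(12, 4))

# Box plot: spread, median, and outliers of each measure
ax_box.boxplot([degree_vals, betweenness_vals])
ax_box.set_xticks([1, 2])
ax_box.set_xticklabels(['Degree', 'Betweenness'])
ax_box.set_title('Box Plot of Centrality Values')

# CDF: fraction of nodes with centrality at or below a given value
for vals, name in [(degree_vals, 'Degree'), (betweenness_vals, 'Betweenness')]:
    x, y = empirical_cdf(vals)
    ax_cdf.step(x, y, where='post', label=name)
ax_cdf.set_xlabel('Centrality value')
ax_cdf.set_ylabel('Cumulative fraction of nodes')
ax_cdf.set_title('CDF of Centrality Values')
ax_cdf.legend()
fig.tight_layout()
# plt.show()  # uncomment in an interactive session
```

A curve that climbs steeply near zero means most nodes have very small values for that measure.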

These additional plots enhance our understanding of the differences between Degree Centrality and Betweenness Centrality. Consistent with the box plot, Betweenness Centrality tends to exhibit a more dispersed distribution, while Degree Centrality values are more concentrated around a central range. The choice between the two measures should consider the specific characteristics and objectives of the network analysis.

Probability Density Function (PDF) Plot:

  • We create PDF plots for Degree Centrality and Betweenness Centrality using kernel density estimation (KDE).
  • These plots show the probability density of different centrality values, allowing us to see the distribution shapes more clearly.

The PDF plot helps us understand the distribution of centrality values, emphasizing the shape of the distribution curves, potential peaks, and skewness.
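For reference, a KDE curve can be computed with a few lines of NumPy. This is a hand-rolled Gaussian KDE using the normal-reference bandwidth h = 1.06 σ n^(-1/5), which is not necessarily the estimator or bandwidth used for the plots in this article; `gaussian_kde_1d` and `G_demo` are hypothetical names.

```python
import numpy as np
import networkx as nx
import matplotlib.pyplot as plt

def gaussian_kde_1d(samples, grid):
    """Gaussian KDE on grid with bandwidth h = 1.06 * std * n**(-1/5)."""
    samples = np.asarray(samples, dtype=float)
    n = samples.size
    bandwidth = 1.06 * samples.std(ddof=1) * n ** (-1 / 5)
    # Standardized distance from every grid point to every sample
    z = (grid[:, None] - samples[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2 * np.pi)
    return kernel.sum(axis=1) / (n * bandwidth)

# Stand-in graph (assumption), not the APS network
G_demo = nx.barabasi_albert_graph(300, 2, seed=1721)
measures = {
    'Degree Centrality': list(nx.degree_centrality(G_demo).values()),
    'Betweenness Centrality': list(nx.betweenness_centrality(G_demo).values()),
}

fig, axes = plt.subplots(1, 2, figsize=(12, 4))
for ax, (name, vals) in zip(axes, measures.items()):
    grid = np.linspace(min(vals), max(vals), 200)
    ax.plot(grid, gaussian_kde_1d(vals, grid))
    ax.set_title(f'PDF (KDE) of {name}')
    ax.set_xlabel('Centrality value')
    ax.set_ylabel('Density')
fig.tight_layout()
# plt.show()  # uncomment in an interactive session
```

Because centrality values cannot be negative, a Gaussian KDE smooths some density below zero, so a histogram is a useful cross-check on the shape near the lower bound.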

Violin Plot:

  • We create a violin plot to compare the distributions of Degree Centrality and Betweenness Centrality side by side.
  • The violin plot illustrates the probability density of centrality values and shows how the distributions differ between the two measures.

The violin plot visually compares the distributions of centrality values, highlighting any variations in shape, spread, and skewness.
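A violin plot of the two measures can be drawn with matplotlib's `violinplot`. As before, this is a sketch on a synthetic stand-in graph (`G_demo`, an assumption) rather than the original figure code.

```python
import networkx as nx
import matplotlib.pyplot as plt

# Stand-in graph (assumption), not the APS network
G_demo = nx.barabasi_albert_graph(300, 2, seed=1721)
degree_vals = list(nx.degree_centrality(G_demo).values())
betweenness_vals = list(nx.betweenness_centrality(G_demo).values())

# One violin per measure, with the median marked
fig, ax = plt.subplots(figsize=(8, 5))
parts = ax.violinplot([degree_vals, betweenness_vals], showmedians=True)
ax.set_xticks([1, 2])
ax.set_xticklabels(['Degree Centrality', 'Betweenness Centrality'])
ax.set_ylabel('Centrality value')
ax.set_title('Violin Plot of Centrality Values')
fig.tight_layout()
# plt.show()  # uncomment in an interactive session
```

Both violins share one axis, so if one measure spans a much smaller range it will look compressed; separate subplots are an alternative in that case.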

Conclusion:

  • Degree Centrality and Betweenness Centrality are complementary measures that provide different perspectives on node importance within a network.
  • Degree Centrality is effective for identifying well-connected nodes, while Betweenness Centrality is valuable for identifying nodes that control information flow and act as critical intermediaries.
  • The choice between these centrality measures should align with the specific research questions and objectives of the network analysis.
  • In practice, considering both measures together can provide a more comprehensive understanding of network structure and node importance.

The analysis conducted in this study helps researchers and analysts make informed decisions about which centrality measure to use based on the characteristics and goals of their network analysis.

About the author: Vishal Sharma is a Computer Science Research Scholar at IIT Guwahati, exploring machine learning and AI in mathematics, cosmology and history.