Network Science Series 001. Basic Concepts.

Alexei Stepanov
6 min readFeb 9, 2023

--

Hi! I have decided to share my notes on network science, which were compiled during my master’s program in data science at the university. These notes are based on the “Network Science Book” by Albert-László Barabási, as well as course materials and additional resources. This series serves as an introduction to the Python NetworkX library, a tool created by network scientists for network scientists. As such, I have included code snippets to illustrate specific concepts.

Table of Contents

I. Networks and Graphs
Links
Nodes
II. Understanding Node Connectivity in Networks
Degree
Degree Distribution
Average Degree
III. Adjacency Matrix

I. Networks and Graphs

A network and a graph are often used interchangeably in network science. However, there is a subtle difference between the two.

A graph is a mathematical representation of a set of objects in which links connect some pairs of objects. Graphs can be either directed, meaning the links have a direction, or undirected, meaning the links have no direction.

On the other hand, a network is a specific type of graph representing a real-world system, such as a social network, a transportation network, or an information network. Networks can be thought of as a graph with some added constraints and features, such as nodes and edges that have attributes, such as weight or time.

In summary, a graph is a general mathematical concept, while a network is a specific type of graph that represents a real-world system.

Unweighted, undirected graphs can be created with this code. Pay attention, that in this series, we will use Python NetworkX library.

import networkx as nx
G = nx.Graph()
G.add_edges_from([(1,2), (2,3), (3,4), (4,5)])

Links

Links in a network refer to the connections or edges between nodes in a graph. A node can be connected to one or multiple other nodes in a network. The links represent relationships, interactions, or dependencies between nodes.

In a network of N nodes, the number of links can change between L = 0 and Lmax, where

Maximum Links in Network (1.1)

We can access the number of links parameter by applying the number_of_edges() method on the graph object.

num_links = G.number_of_edges()

Nodes

Nodes in a network are the basic building blocks of a graph. They represent entities or objects in a network and can represent anything from people in a social network to webpages in the World Wide Web to neurons in a neural network.

Similarly to link number, there is a method for a number of nodes.

num_nodes = G.number_of_nodes()

II. Understanding Node Connectivity in Networks

You can gain important insights into centrality and network structure with a degree as a key connectivity metric.

Degree

The degree of a node in a graph is the number of edges that are connected to it. In an undirected graph, the degree of a node is equal to the number of neighbors it has, i.e., the number of nodes it is connected to. In a directed graph, the degree of a node can be separated into in-degree, which is the number of incoming edges to the node, and out-degree, which is the number of outgoing edges from the node.

The degree of a node is an important measure of its centrality in the network and can provide insight into its role and importance in the network. For instance, nodes with high degrees are often considered to be more central or influential in the network compared to nodes with low degrees.

We denote with ki the degree of the ith node in the network. In an undirected network the total number of links, L, can be expressed as the sum of the node degrees:

Total Number of Links (2.1)
# degrees of graph G
nx.degree(G)
# degrees of directed graph
G = nx.DiGraph()
G.add_edges_from([(1,2), (2,3), (3,4), (4,5)])
in_degree = nx.in_degree(G)
out_degree = nx.out_degree(G)a

Degree Distribution

The degree distribution of a network is the distribution of the number of edges connected to each node in the network. It provides a measure of the heterogeneity of node connectivity in the network and can be represented graphically as a histogram or a cumulative distribution function.

The degree distribution of a network can provide important information about the network’s structure and evolution. For example, some network models, such as the scale-free network model, predict that the degree distribution follows a power law, with a few high-degree nodes and many low-degree nodes. This can be used to understand the formation of the network and its resilience to failures or attacks.

The degree distribution, pk, provides the probability that a randomly selected node in the network has degree k.

# create degree distribution with a histogram
degree_values = list(degree.values())
plt.hist(degree_values, bins=range(min(degree_values), max(degree_values)+1))
plt.xlabel('Degree')
plt.ylabel('Count')
plt.title('Degree Distribution Histogram')
plt.show()

Average Degree

The average degree of a network is the average number of edges per node in the network. It provides a measure of the connectivity of the network.

In an undirected graph, the average degree is equal to the sum of the degrees of all nodes divided by the number of nodes:

Average Degree of Network (2.2)
degree = nx.degree(G)
average_degree = sum(degree.values()) / G.number_of_nodes()

III. Adjacency Matrix

The adjacency matrix can be thought of as a compact, binary representation of the network. This format enables us to leverage the full potential of powerful data analysis tools such as pandas and numpy to analyze and manipulate the network.

An adjacency matrix is a square matrix used to represent a graph, where the rows and columns correspond to nodes in the graph, and the elements of the matrix indicate the presence or absence of an edge between the nodes.

The adjacency matrix is a compact representation of a graph and can be used for efficient storage and manipulation of large graphs. However, it has a time complexity of O(n²), where n is the number of nodes, which can be slow for large graphs. Alternative representations, such as the adjacency list or the incidence matrix, may be more efficient for certain types of graph operations.

For undirected networks, a node’s degree is a sum over either the rows or the columns of the matrix

Node Degree in Undirected Network (3.1)

For directed networks, the sums over the adjacency matrix’s rows and columns provide the incoming and outgoing degrees, respectively

Node Degrees In Directed Network (3.2)
# Get the adjacency matrix representation of the graph
A = nx.to_numpy_matrix(G)
print(A)

Conclusion

To sum up, we’ve delved into the basics of networks and graphs, examining the nature of links and nodes and how we can quantify node connectivity through measures such as degree and average degree. In the next installment of this series, we’ll delve into more advanced topics such as directed vs undirected graphs, and specialized types such as weighted and bipartite graphs. Thanks!

Some links:

--

--

Alexei Stepanov

Hi! I am Data Scientist and this is my blog, sometimes I'm expressing genuine curiosity in data science stuff here