A brief introduction to Social Network Analysis

Emre Yüksel
7 min read · Mar 13, 2022

Network analysis is a method for finding patterns in the structure of a network, and it relies on graph theory. A network consists of nodes, which represent individuals, organizations, or things, and edges, which represent the connections or relationships between nodes. Social Network Analysis (SNA) focuses on the relationships among social entities such as individuals or organizations.

Example network (edges have no direction)

Nodes and edges mean different things in different networks. For a bank, for example, the nodes can represent bank accounts and the edges the transactions between those accounts. For social media applications such as Facebook and Twitter, nodes represent individuals and edges indicate that two people are connected.

Together, nodes and edges form a graph, and edges may have a direction. If the direction of the relationship between nodes does not matter, the edges have no direction and the graph is undirected. In the social media example, an edge between two nodes (in other words, two people) means that they are friends of each other, so the relationship is undirected. In the bank example, the relationship is directed because a transaction goes from one account to another, so that network is modeled as a directed graph.
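
To make the difference concrete, here is a minimal sketch in plain Python that stores both kinds of graph as an adjacency mapping. The helper `build_graph`, the people, and the account names are all made up for illustration; they are not part of any library.

```python
from collections import defaultdict


def build_graph(edges, directed=False):
    """Return an adjacency mapping {node: set of neighbours}."""
    adjacency = defaultdict(set)
    for source, target in edges:
        adjacency[source].add(target)
        adjacency[target]  # touch the key so nodes without outgoing edges still exist
        if not directed:
            # An undirected edge is stored in both directions.
            adjacency[target].add(source)
    return dict(adjacency)


# Hypothetical data: friendships are mutual, transactions have a direction.
friendships = build_graph([("Ana", "Ben"), ("Ben", "Cem")], directed=False)
transactions = build_graph([("acc_1", "acc_2"), ("acc_2", "acc_3")], directed=True)

print(friendships["Ben"])     # Ben is linked to both Ana and Cem
print(transactions["acc_2"])  # acc_2 only points to acc_3
```

In the undirected case every edge can be followed from either end, while in the directed case `acc_3` receives money but has no outgoing edge of its own.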

Retrieved from https://gistbok.ucgis.org/bok-topics/social-networks

The size of a network is defined as the number of its nodes or, depending on the context, its edges. The density of a network is the number of existing edges divided by the number of all possible edges, so it takes a value between 0 and 1. The degree of a node is the number of edges connected to that node. If the edges have a direction, each node has an in-degree and an out-degree: the in-degree counts the edges coming into the node, while the out-degree counts the edges going out of it. The probability distribution of the degrees over all nodes is called the degree distribution, and social networks can be characterized with this statistic.
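
These quantities are easy to compute by hand. The sketch below does it for a small directed network; the edge list is invented for illustration, and the density formula assumes a directed graph without self-loops, which can hold at most n * (n - 1) edges.

```python
from collections import Counter

# Hypothetical directed network given as (source, target) pairs.
edges = [("a", "b"), ("a", "c"), ("b", "c"), ("c", "a"), ("d", "a")]
nodes = sorted({node for edge in edges for node in edge})

n, m = len(nodes), len(edges)

# Density: existing edges divided by all possible directed edges.
density = m / (n * (n - 1))

# Out-degree counts outgoing edges, in-degree counts incoming edges.
out_degree = Counter(source for source, _ in edges)
in_degree = Counter(target for _, target in edges)

# Degree distribution: share of nodes that have each in-degree value.
in_degree_counts = Counter(in_degree[node] for node in nodes)
in_degree_distribution = {
    k: count / n for k, count in sorted(in_degree_counts.items())
}

print(f"nodes={n}, edges={m}, density={density:.3f}")
print(in_degree_distribution)
```

For an undirected graph the number of possible edges would be n * (n - 1) / 2 instead.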

Fundamental Concepts of Social Networks

A few basic concepts come up whenever social network analysis is discussed. Let’s look at their definitions:

  1. Actor: Social networks represent the relationships between social entities. Each social entity is called an actor. The actors in a network can all be of the same type or of different types.
  2. Relational Tie: A tie is the connection between two actors.
  3. Dyad, Triad, and Subgroup: A pair of actors, together with the existing or possible ties between them, is called a dyad, while a subset of three actors is called a triad. Generalizing this logic, any subset of actors is called a subgroup.
  4. Group: The collection of all actors is called a group.
  5. Relation: The collection of ties of a specific kind within a group is called a relation.

Finally, a social network is a social structure formed by a collection of actors and the relations among them.

Connection Metrics in Social Networks

Connections between nodes tend to follow recognizable patterns. The following metrics help us decipher meaning from the connections between nodes:

An example of transitivity
  1. Transitivity measures how often two nodes that share a common neighbor are themselves connected by an edge. In social networks, when we see actors as people, it indicates the likelihood that two friends of the same person are also friends with each other.
  2. Reciprocity shows the tendency of a directed tie to be returned: if one actor forms a connection to another, how likely the second actor is to form a connection back.
  3. Assortativity indicates the tendency of an actor to connect to actors that are similar to itself in terms of degree.
  4. Homophily shows the tendency of similar actors to form ties with each other.
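
The first two metrics can be sketched in a few lines of plain Python. The definitions used here are assumptions on my part, since several variants exist: transitivity is taken as the share of connected triples that are closed into triangles, and reciprocity as the share of directed edges that are returned. The function names and the toy data are made up.

```python
from itertools import combinations


def transitivity(adjacency):
    """Share of connected triples (two edges meeting at a node) that form a triangle."""
    closed = 0
    triples = 0
    for neighbours in adjacency.values():
        for u, v in combinations(neighbours, 2):
            triples += 1
            if v in adjacency[u]:
                closed += 1
    return closed / triples if triples else 0.0


def reciprocity(directed_edges):
    """Share of directed edges (s, t) for which the reverse edge (t, s) also exists."""
    edge_set = set(directed_edges)
    if not edge_set:
        return 0.0
    mutual = sum(1 for s, t in edge_set if (t, s) in edge_set)
    return mutual / len(edge_set)


# Hypothetical undirected friendship network: a triangle a-b-c plus d attached to c.
friends = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}
# Hypothetical directed ties: a and b follow each other, a also follows c.
follows = [("a", "b"), ("b", "a"), ("a", "c")]

print(round(transitivity(friends), 3))
print(round(reciprocity(follows), 3))
```

In the friendship example three of the five connected triples are closed, and in the directed example two of the three ties are returned.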

Centrality Measures

A variety of centrality measures are used to quantify the importance of nodes (actors, in social networks). They help answer the question of which nodes are more important and, sometimes, which one is the “most important”. The most popular centrality measures are as follows:

  1. The degree centrality of a node refers to the number of edges connected to that node.
  2. The closeness centrality of a node measures how close it is to all other nodes, based on the lengths of the shortest paths to them.
  3. The betweenness centrality of a node is the fraction of the shortest paths between all pairs of other nodes that pass through the node of interest.
  4. The eigenvector centrality determines the centrality of a node by looking not only at the number of edges connected to it but also at the importance of the nodes at the other end of those edges.
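
As a rough illustration of the first two measures, the sketch below computes degree and closeness centrality with a breadth-first search. It assumes a connected, undirected, unweighted graph, and it normalizes the degree by n - 1 so that values fall between 0 and 1, which is a common convention rather than something stated above. All names are hypothetical.

```python
from collections import deque


def bfs_distances(adjacency, source):
    """Shortest-path lengths (in edges) from source to every reachable node."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return distances


def degree_centrality(adjacency):
    """Degree of each node divided by the largest possible degree, n - 1."""
    n = len(adjacency)
    return {node: len(nbrs) / (n - 1) for node, nbrs in adjacency.items()}


def closeness_centrality(adjacency):
    """(n - 1) divided by the sum of shortest-path lengths to all other nodes."""
    n = len(adjacency)
    result = {}
    for node in adjacency:
        total = sum(bfs_distances(adjacency, node).values())
        result[node] = (n - 1) / total if total else 0.0
    return result


# Hypothetical star network: one hub connected to three leaves.
star = {"hub": {"x", "y", "z"}, "x": {"hub"}, "y": {"hub"}, "z": {"hub"}}

print(degree_centrality(star))
print(closeness_centrality(star))
```

In the star example the hub scores 1.0 on both measures, while each leaf is two steps away from the other leaves and therefore has a lower closeness.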

Networks have various properties, and these can give insight not only into a pair of nodes but also into the relationships within a small part of the network or across the entire network. Network studies can be separated into three levels of abstraction:

  • Element-level analysis investigates the importance of individual nodes and edges
  • Group-level analysis determines the cohesive and dense groups in the network
  • Network-level analysis is interested in the topological properties of the network

The methods and metrics used differ according to the aim of the study.

Gephi

Gephi is a free and open-source software tool for network analysis and visualization, written in Java on the NetBeans platform. It was initially developed by students of the University of Technology of Compiègne in 2008 and has continued to receive releases since then. The application provides tools with which you can analyze networks, extract statistical information, and import data manually or retrieve it through an API.

Gephi’s interface

As you can see, the tab on the right contains a window called Statistics. It lists techniques that are often used in network analysis. Let’s look at their definitions.

Network Overview

Network Diameter: It is the length of the shortest path between the two most distant nodes in the network, in other words the longest of all shortest paths. It indicates how far one has to travel, at most, to get from one side of the network to the other. Jackson’s study shows that the diameter and the average distance do not change as homophily increases in random networks [4].
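
A minimal sketch of this definition, assuming a connected, undirected, unweighted graph: run a breadth-first search from every node, take the farthest distance found each time, and keep the largest of those. The function names and the toy graph are made up, and this is not necessarily how Gephi computes the value.

```python
from collections import deque


def eccentricity(adjacency, source):
    """Distance (in edges) from source to the node farthest away from it."""
    distances = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in distances:
                distances[neighbour] = distances[node] + 1
                queue.append(neighbour)
    return max(distances.values())


def diameter(adjacency):
    """Longest shortest path in a connected graph."""
    return max(eccentricity(adjacency, node) for node in adjacency)


# Hypothetical chain of four nodes: a - b - c - d.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

print(diameter(chain))  # 3: the two ends are three steps apart
```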

Graph Density: It is the number of existing edges divided by the number of possible edges. The density of a complete graph is 1. The density of a null graph, in which all nodes are isolated, is 0; in other words, there are no edges between any nodes.

HITS (Hyperlink-Induced Topic Search): HITS characterizes the relationships between web pages by computing an authority score and a hub score for each page. The authority score measures the quality of the node itself, judged by the hubs that point to it, while the hub score measures the quality of the node’s outgoing links, judged by the authorities they point to. For search engine applications, it offers a way to identify relevant web pages for a particular search.
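
The mutual dependence between hubs and authorities can be sketched as a simple iteration. This is a simplified illustration under my own assumptions (a fixed number of iterations, scores normalized to sum to 1), not Gephi's implementation; the page names are invented.

```python
def hits(edges, iterations=50):
    """Return (hub, authority) scores for a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    hub = {node: 1.0 for node in nodes}
    authority = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        # A page is a good authority if good hubs link to it.
        authority = {node: 0.0 for node in nodes}
        for source, target in edges:
            authority[target] += hub[source]
        # A page is a good hub if it links to good authorities.
        new_hub = {node: 0.0 for node in nodes}
        for source, target in edges:
            new_hub[source] += authority[target]
        # Normalize so that the scores stay bounded and comparable.
        a_total = sum(authority.values()) or 1.0
        h_total = sum(new_hub.values()) or 1.0
        authority = {node: value / a_total for node, value in authority.items()}
        hub = {node: value / h_total for node, value in new_hub.items()}
    return hub, authority


# Hypothetical web: p1 links to p3 and p4, p2 links to p3.
links = [("p1", "p3"), ("p2", "p3"), ("p1", "p4")]
hub, authority = hits(links)

print(max(authority, key=authority.get))  # p3, linked to by both hubs
print(max(hub, key=hub.get))              # p1, links to both authorities
```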

Modularity: Networks tend to aggregate into subgroups according to the strength of the connections. These subgroups are called modules or communities. Modularity measures how strongly the network divides into such modules. High modularity indicates dense connections inside the modules and sparse connections between them.
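
To give a feel for the number, the sketch below scores a given partition of an undirected, unweighted graph with the commonly used Newman formulation: for each community, the share of edges that fall inside it minus the share expected if edges were placed at random with the same degrees. It only evaluates a partition supplied by hand and does not search for communities; the graph and labels are made up.

```python
def modularity(edges, community_of):
    """Modularity of a partition; community_of maps each node to a community label."""
    m = len(edges)
    internal = {}    # number of edges with both ends inside each community
    degree_sum = {}  # sum of node degrees in each community
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] = degree_sum.get(cu, 0) + 1
        degree_sum[cv] = degree_sum.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(
        internal.get(c, 0) / m - (d / (2 * m)) ** 2
        for c, d in degree_sum.items()
    )


# Hypothetical network: two triangles joined by a single bridge c - d.
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f"),
         ("c", "d")]
two_groups = {"a": 0, "b": 0, "c": 0, "d": 1, "e": 1, "f": 1}
one_group = {node: 0 for node in "abcdef"}

print(round(modularity(edges, two_groups), 4))  # 0.3571
print(round(modularity(edges, one_group), 4))   # 0.0
```

Splitting the network along the bridge gives a clearly positive score, while putting every node into one community gives zero.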

PageRank: It shows the importance of nodes (pages) by counting the number and quality of the links pointing to them. The output of the algorithm is a probability distribution that represents the likelihood of arriving at any given page by clicking on links at random. The probability parameter in the PageRank settings in Gephi models when the person (or surfer) who clicks on links randomly will stop. This parameter is also known as the damping factor: at each step, the surfer follows another link with this probability and stops otherwise. It is commonly set to around 0.85.
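
The random-surfer idea can be written down as a short power iteration. This is a simplified sketch under my own assumptions: a fixed number of iterations, and pages without outgoing links spread their rank evenly over all pages. It is not Gephi's code, and the page names are invented.

```python
def pagerank(edges, damping=0.85, iterations=100):
    """Return a PageRank score per node for a directed edge list."""
    nodes = sorted({node for edge in edges for node in edge})
    n = len(nodes)
    out_links = {node: [] for node in nodes}
    for source, target in edges:
        out_links[source].append(target)

    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Rank held by pages without outgoing links is shared by all pages.
        dangling = sum(rank[node] for node in nodes if not out_links[node])
        # With probability 1 - damping the surfer jumps to a random page.
        new_rank = {
            node: (1.0 - damping) / n + damping * dangling / n for node in nodes
        }
        # With probability damping the surfer follows one of the page's links.
        for node in nodes:
            for target in out_links[node]:
                new_rank[target] += damping * rank[node] / len(out_links[node])
        rank = new_rank
    return rank


# Hypothetical web: a and b link to c, c links back to a.
ranks = pagerank([("a", "c"), ("b", "c"), ("c", "a")])

print(max(ranks, key=ranks.get))       # c, which receives links from a and b
print(round(sum(ranks.values()), 6))   # 1.0, the scores form a distribution
```

Page b receives no links, so its score stays at the minimum given by the random jump, (1 - 0.85) / 3.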

Connected Components: A connected component is a maximal subgroup in which each pair of nodes is connected by a path. If the graph has more than one connected component, their union gives the set of all nodes of the graph. Connected components have two properties. The first is that a connected component is always non-empty. The second is that connected components are pairwise disjoint, which means that the intersection of two distinct connected components is the empty set.
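
A minimal sketch of how components can be found in an undirected graph: start from an unvisited node, collect everything reachable from it, and repeat until no node is left. The function name and the toy graph are made up for illustration.

```python
def connected_components(adjacency):
    """Return a list of node sets, one per connected component."""
    seen = set()
    components = []
    for start in adjacency:
        if start in seen:
            continue
        component = set()
        stack = [start]
        while stack:
            node = stack.pop()
            if node in component:
                continue
            component.add(node)
            # Visit every neighbour that is not yet part of this component.
            stack.extend(adjacency[node] - component)
        seen |= component
        components.append(component)
    return components


# Hypothetical network: a - b, c - d - e, and an isolated node f.
graph = {
    "a": {"b"}, "b": {"a"},
    "c": {"d"}, "d": {"c", "e"}, "e": {"d"},
    "f": set(),
}
components = connected_components(graph)

print(len(components))  # 3
```

The three resulting sets are non-empty, do not overlap, and together contain every node of the graph, which matches the properties described above.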

Node Overview

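Average Clustering Coefficient: The clustering coefficient of a node shows how close its neighbors are to being fully connected with one another, which reflects the tendency of connected nodes to form larger, tightly knit groups (clustering). The average clustering coefficient is the mean of this value over all nodes.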
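
As a sketch, assuming an undirected graph and the usual local definition (existing links among a node's neighbors divided by the possible links among them, with 0 for nodes that have fewer than two neighbors):

```python
from itertools import combinations


def local_clustering(adjacency, node):
    """Share of pairs of the node's neighbours that are themselves connected."""
    neighbours = adjacency[node]
    k = len(neighbours)
    if k < 2:
        return 0.0
    links = sum(1 for u, v in combinations(neighbours, 2) if v in adjacency[u])
    return links / (k * (k - 1) / 2)


def average_clustering(adjacency):
    """Mean of the local clustering coefficient over all nodes."""
    return sum(local_clustering(adjacency, node) for node in adjacency) / len(adjacency)


# Hypothetical network: a triangle a-b-c plus d attached to c.
graph = {
    "a": {"b", "c"},
    "b": {"a", "c"},
    "c": {"a", "b", "d"},
    "d": {"c"},
}

print(round(local_clustering(graph, "c"), 3))  # 0.333
print(round(average_clustering(graph), 3))     # 0.583
```

Nodes a and b score 1 because their two neighbors know each other, c scores 1/3 because only one of its three neighbor pairs is connected, and d scores 0.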

Eigenvector Centrality: As explained in the centrality measures section, it measures the influence of a node based on the nodes it is connected to. A node connected to high-scoring nodes has a high eigenvector score. The PageRank algorithm is a variant of eigenvector centrality. Eigenvector centrality differs from degree centrality: the fact that a node has many incoming edges does not mean that its eigenvector centrality is high, because all of its linkers may have a low centrality score. By the same logic, having a small number of incoming edges does not prevent high eigenvector centrality, because the linkers may be important.
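
The contrast with degree can be seen in a small experiment. The sketch below is a plain power iteration for an undirected graph under my own assumptions: each node keeps its own score when adding its neighbors' scores, which avoids oscillation on bipartite graphs, and the iteration count is fixed. The graph is invented so that node f has more edges than node e but less important neighbors.

```python
import math


def eigenvector_centrality(adjacency, iterations=200):
    """Power iteration: a node's score grows with the scores of its neighbours."""
    score = {node: 1.0 for node in adjacency}
    for _ in range(iterations):
        # Own score plus the neighbours' scores, i.e. multiplication by (A + I).
        new_score = {
            node: score[node] + sum(score[nbr] for nbr in adjacency[node])
            for node in adjacency
        }
        norm = math.sqrt(sum(value * value for value in new_score.values()))
        score = {node: value / norm for node, value in new_score.items()}
    return score


def to_adjacency(edges):
    """Build an undirected adjacency mapping from an edge list."""
    adjacency = {}
    for u, v in edges:
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)
    return adjacency


# Hypothetical network: a densely connected core a, b, c, d;
# e has two edges, both into the core; f has three edges, two of them to leaves.
graph = to_adjacency([
    ("a", "b"), ("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"), ("c", "d"),
    ("e", "a"), ("e", "b"),
    ("f", "d"), ("f", "g"), ("f", "h"),
])
scores = eigenvector_centrality(graph)

print(len(graph["e"]), len(graph["f"]))  # degrees: 2 and 3
print(scores["e"] > scores["f"])         # True: e's neighbours are more central
```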

Edge Overview

Average Path Length: It is the average number of steps along the shortest paths between all possible pairs of nodes in the network. It indicates how closely knit the network is as a whole: a short average path length means that any node can reach the others in a few steps.
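
A minimal sketch of the computation, assuming an unweighted graph and averaging only over pairs of nodes that can actually reach each other. The names and the toy graph are made up for illustration.

```python
from collections import deque


def average_path_length(adjacency):
    """Mean shortest-path length over all ordered pairs of reachable nodes."""
    total = 0
    pairs = 0
    for source in adjacency:
        # Breadth-first search gives shortest paths in an unweighted graph.
        distances = {source: 0}
        queue = deque([source])
        while queue:
            node = queue.popleft()
            for neighbour in adjacency[node]:
                if neighbour not in distances:
                    distances[neighbour] = distances[node] + 1
                    queue.append(neighbour)
        for target, distance in distances.items():
            if target != source:
                total += distance
                pairs += 1
    return total / pairs if pairs else 0.0


# Hypothetical chain of four nodes: a - b - c - d.
chain = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}

print(round(average_path_length(chain), 3))  # 1.667
```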

Dynamic

The degree and clustering coefficient statistics in this group were explained in the previous sections.

This blog post can be considered the starting point for the notes I have taken in my work on social network analysis. In the future, it will be followed by more detailed posts on the technologies that can be used in the analysis phase and on use-case examples. Thank you for reading.

References

[2] Andris, C. (2019). Social Networks. The Geographic Information Science & Technology Body of Knowledge (2nd Quarter 2019 Edition), John P. Wilson (Ed.). DOI: 10.22224/gistbok/2019.2.9.

[3] Apicella, C. L., Marlowe, F. W., Fowler, J. H., & Christakis, N. A. (2012). Social networks and cooperation in hunter-gatherers. Nature, 481(7382), 497–501.

[4] Jackson, M. O. (2008, December). Average distance, diameter, and clustering in social networks with homophily. In International Workshop on Internet and Network Economics (pp. 4–11). Springer, Berlin, Heidelberg.

Emre Yüksel

Data Scientist @ Getir | Computer Engineering MSc Student @ Bogazici University