Visualizing the co18 Network

Sarath
Feb 21, 2018

--

This post walks through the process and results of a fun experiment I ran last week. It tries to answer two questions:

  • How to create a network graph of the people at ISB?
  • How to identify the most influential people on campus?

To answer these questions, we need data, and I got my data from a testimonial site that the college has tied up with. As I watched a flood of testimonials written by people to each other, I instantly knew that there was scope to create a network graph.

I started by scraping all the data from the testimonial site using Chrome Web Scraper. Scraping this particular website was pretty easy, since the URLs followed a fixed pattern. Within an hour, I had every testimonial written by anyone to anyone.

Once I had the "from" and "to" details (who wrote each testimonial, and to whom) in an Excel sheet, I was ready to do some analysis. Before that, some numbers:

  • There were a total of 13,323 testimonials written on the website
  • A total of 598 people have received testimonials from at least one person
  • The average testimonial was 414 characters long
  • The highest number of testimonials written by a single person was 195 (!)
  • The highest number of testimonials received by a single person was 104

Next, I used Gephi, an open-source network analysis and visualization tool. I created edge and node tables and imported them into Gephi to get this directed network graph.
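For anyone who wants to reproduce the table-building step, here is a minimal sketch of turning "from, to" pairs into a node table and an edge table. The names in `testimonials` are made-up placeholders, not real data, and the column headers (`Id`, `Label`, `Source`, `Target`, `Type`) follow the layout Gephi's spreadsheet importer commonly expects. Treat it as an illustration, not the exact sheets I used.

```python
import csv
import io

# Hypothetical (writer, recipient) pairs standing in for the scraped sheet.
testimonials = [
    ("A", "B"),
    ("A", "C"),
    ("B", "A"),
    ("C", "A"),
    ("C", "B"),
]


def build_tables(pairs):
    """Return (nodes_csv, edges_csv) as CSV text in a Gephi-style layout."""
    # Every person who wrote or received a testimonial becomes a node.
    people = sorted({person for pair in pairs for person in pair})

    nodes_buf = io.StringIO()
    node_writer = csv.writer(nodes_buf, lineterminator="\n")
    node_writer.writerow(["Id", "Label"])
    for person in people:
        node_writer.writerow([person, person])

    # Every testimonial becomes one directed edge: writer -> recipient.
    edges_buf = io.StringIO()
    edge_writer = csv.writer(edges_buf, lineterminator="\n")
    edge_writer.writerow(["Source", "Target", "Type"])
    for writer, recipient in pairs:
        edge_writer.writerow([writer, recipient, "Directed"])

    return nodes_buf.getvalue(), edges_buf.getvalue()


nodes_csv, edges_csv = build_tables(testimonials)
print(nodes_csv)
print(edges_csv)
```

Writing the two strings to `.csv` files gives you something you can load through Gephi's spreadsheet import.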

The in-degree graph shows the distribution of the number of testimonials people have received. This is measured by a node’s (a person’s) in-degree centrality.

The average in-degree centrality was 22, which is to say that, on average, a person received 22 testimonials.

Along similar lines, the out-degree graph is also shown here. (People who have not written any testimonials have an out-degree of 0.)
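If you'd rather compute these counts yourself than read them off Gephi, in-degree and out-degree are just tallies over the edge list. The sketch below uses a tiny made-up edge list (the letters are placeholders); `degree_tables` is a helper name I'm introducing for illustration.

```python
from collections import Counter

# Hypothetical directed edges: (writer, recipient).
edges = [
    ("A", "B"),
    ("A", "C"),
    ("C", "A"),
    ("C", "B"),
    ("D", "A"),
]


def degree_tables(edge_list):
    """Return (in_degree, out_degree) dicts covering every person."""
    received = Counter(target for _, target in edge_list)
    written = Counter(source for source, _ in edge_list)
    people = sorted({person for edge in edge_list for person in edge})
    # Fill in explicit zeros so people who never wrote (or never
    # received) a testimonial still show up in the tables.
    in_degree = {p: received.get(p, 0) for p in people}
    out_degree = {p: written.get(p, 0) for p in people}
    return in_degree, out_degree


in_degree, out_degree = degree_tables(edges)

# Average in-degree = total testimonials / number of people.
average_in = sum(in_degree.values()) / len(in_degree)

# Distribution: how many people received 0, 1, 2, ... testimonials.
in_distribution = Counter(in_degree.values())

print(in_degree)
print(out_degree)
print(average_in)
print(sorted(in_distribution.items()))
```

Here `B` never wrote anything, so it gets an out-degree of 0, and `D` never received anything, so it gets an in-degree of 0.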

Next, I calculated the modularity to identify the different clusters within ISB (Gephi does this for you).

The result is not surprising at all! Each cluster has a size of ~75, which is the size of a section at ISB. There were 8 clusters in all, reflective of the 8 sections we have at ISB.
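To give a feel for what the modularity number measures, here is a small sketch that scores a given grouping of nodes. It only evaluates a partition you hand it; it does not search for the best one the way Gephi does. It also treats edges as undirected for simplicity, which is my simplifying assumption. The toy graph (two triangles joined by one edge) and the function name `modularity` are illustrative.

```python
from collections import defaultdict


def modularity(edges, community_of):
    """Undirected modularity Q of a partition.

    Q = sum over communities c of  L_c / m - (d_c / (2m))^2,
    where m is the number of edges, L_c the number of edges with both
    ends inside c, and d_c the total degree of the nodes in c.
    """
    m = len(edges)
    internal = defaultdict(int)
    degree_sum = defaultdict(int)
    for u, v in edges:
        cu, cv = community_of[u], community_of[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(
        internal[c] / m - (degree_sum[c] / (2 * m)) ** 2
        for c in degree_sum
    )


# Two tightly knit triangles connected by a single bridge edge.
toy_edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("x", "y"), ("y", "z"), ("x", "z"),
    ("c", "x"),
]

two_groups = {"a": 0, "b": 0, "c": 0, "x": 1, "y": 1, "z": 1}
one_group = {name: 0 for name in two_groups}

q_split = modularity(toy_edges, two_groups)
q_single = modularity(toy_edges, one_group)
print(q_split, q_single)
```

Splitting the toy graph into its two triangles scores higher than lumping everyone together, which is the same intuition behind the sections showing up as clusters.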

Then I calculated the eigenvector centrality, a measure of a person’s influence in a network. You can read more about what the different centrality measures mean here (or you could’ve taken NWAT in Term 6 :P). Once I had the modularity classes, I colored the nodes based on the cluster they belonged to. The size of each node is directly proportional to the influence that person has on the network. I then had the following graph.

Each color is a section at ISB
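For the curious, eigenvector centrality can be approximated with a simple power iteration: a person's score is the sum of the scores of the people who wrote to them, repeated until the numbers settle. The sketch below is a plain-Python illustration on a made-up graph, not Gephi's exact implementation; the iteration count of 100 and the scaling by the largest score are choices I've assumed for the example.

```python
def eigenvector_centrality(edges, iterations=100):
    """Approximate eigenvector centrality for a directed edge list.

    Each node's score is repeatedly replaced by the sum of the scores
    of the nodes pointing at it, then rescaled so the top score is 1.
    """
    nodes = sorted({node for edge in edges for node in edge})
    incoming = {node: [] for node in nodes}
    for source, target in edges:
        incoming[target].append(source)

    scores = {node: 1.0 for node in nodes}
    for _ in range(iterations):
        new_scores = {
            node: sum(scores[source] for source in incoming[node])
            for node in nodes
        }
        largest = max(new_scores.values())
        if largest == 0:
            break  # no edges carry any score; keep the last values
        scores = {node: value / largest for node, value in new_scores.items()}
    return scores


# Hypothetical graph: A, B and C all write to each other; D and A
# exchange testimonials only with each other.
toy_edges = [
    ("A", "B"), ("B", "A"),
    ("A", "C"), ("C", "A"),
    ("B", "C"), ("C", "B"),
    ("A", "D"), ("D", "A"),
]

scores = eigenvector_centrality(toy_edges)
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking[0], ranking[-1])
```

In this toy graph `A` comes out on top and `D` at the bottom: `D` is connected to the most central person, but only to that one person.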

This layout isn’t the best way to visualize a network graph, especially once you put people’s names inside the nodes. So after trying multiple layouts, I went with the radial graph layout. Here it is:

Each section, with nodes (people) in decreasing order of eigenvector centrality.

Here’s a zoomed-in preview of the final graph.

You can see the full network graph with the initials of people here (takes a while to fully load, works only on desktop).
