If You Want To Use Graphs Think About The Approach

Source: Unsplash

The internet, is a big graph full of sites (nodes) where relationships between them are hyperlinks that connect those nodes. Here is an another example. First, imagine all people in the world, yes they can be nodes, in that case, what can be a relationship between them? Well, it can be friendship, family relation, romantic relationship etc.. Do you see now? Graphs can be fun and interesting.

A diagram that exhibits a relationship, often functional, between two sets of numbers as a set of points having coordinates determined by the relationship we call graph. This is a mathematical definition of a graph that they teach us in school. That could be a reason why people consider graph as something boring… Well, “good” formal education! Also, an app that I use for post readability check notices me that that definition is hard to read.

What you should know about Graphs

The size of a graph depends on many basic objects and number of relationships between them. The graph can be directed and undirected. That should be enough knowledge about graphs to continue with this post. Our example graph will be N = { A, B, C, D)

Directed graph

Undirected graph

Graph representation in programming language

We have two approaches, we can use adjacency matrices or adjacency lists. There are different benefits and disadvantages for each of them. A developer is a person that needs to think about what approach he will use to make his application more efficient.

Graph representation using adjacency matrices

We need a grid to be aware of relationships between nodes, so we are able to know if we have relationships or not. For our directed graph above we can use matrix below.

We could have matrix implementation as a double array.

private int[][] adjMatrix;

If we have this data structure then we should not have problems with adding relationships, between nodes. We will just need to put a flag in our matrix (array).

public void addRelationship(int a, int b){ 
adjMatrix[a][b]=1;
}

But, if we want to add a node in our graph what we need to do? We will need to increase the size of our matrix, so that can be tricky.

public void addNode() { 
int n=getNumNode();
if(n>=adjMatrix.length){
int[][] incresedMatrix=new int[n+1][n+1];
for(int i=0;i<adjMatrix.length;i++)
for(int j=0;j<adjMatrix.length;j++)
incresedMatrix[i][j]=adjMatrix[i][j];
adjMatrix=incresedMatrix;
}
for(int i=0;i<adjMatrix[n].length;i++){
adjMatrix[n][i]=0;
}
}

If we want to add a new node in our matrix we will need to create a new matrix and copy all data from the first one to it. How much does it cost to check if there is a relationship between node x and node y with a matrix as the representation? The answer is 1 because 2D arrays have constant-time lookup, so we can check whether the flag is zero or not.

What is the problem with matrix?

If we don’t mind about creating new matrix every time when we need to add a node, that is fine, but, also, there is one other problem. Matrix uses a lot of memory. You need to use memory for every direction between every node, but you could need less than half of that. If we look back for our example where we imagine The Internet as a graph. Every single site on the internet is our node and we need to take a space in memory for every connection (even if they are not connected) between them. What do you think, will we have many rows and columns in the matrix that we can avoid? Yes!! It’s not realistic to think how every site will have hyperlinks for all other sites at the internet.

Benefits of graph represented with adjacency matrices:

  • algebraic representation of graph structure
  • fast to test for edges
  • fast to add/remove relationships

Disadvantages:

  • requires a lot of memory (because of nodes)
  • slow to add/remove nodes

Graph representation using adjacency lists

If we use another approach, adjacency lists, then we need to have linked list structure for our graphs. 
 What we will get with that approach? Well, we want to avoid storing information about relationships that aren’t in the graph. With adjacency lists we will successfully avoid that.

For implementation we can use something like this:

private Map<Integer, ArrayList<Integer>> adjListsMap;

Structure of our graph will be something like this:

A ->{B} 
 B -> {D} 
 C -> {B} 
 D -> {}

With this data structure, we will not have problems with adding relationships between nodes, it’s the same cost like it is for matrix.

public void addRelationship(int a, int b) {         
adjListsMap.get(a).add(b);
}

But, if we want to add a node in our graph will it be better than matrix? Yes, it will be because we don’t need to expand our matrix, we just need to create a new array for our neighbours and add it to our linked list.

public void addNode() { 
int v=getNumNodes();
ArrayList<Integer> neighbours=new ArrayList();
adjListsMap.put(v, neighbours);
}

All our operations in ArrayList have constant cost

Benefits:

  • easy to add vertices
  • easy to add/remove edges
  • may use a lot less memory than adjacency matrices

Disadvantages:

  • may be slow in finding in-degrees

Below you can find two questions with detail comparison answer that can be helpful for you.

Which implementation of a graph makes finding the out-degree more efficient?

In the adjacency matrix representation of graphs, we have to loop over an entire row in the matrix to calculate the number of relationships starting at the node. If we use Big-O notation, this takes O(|N|) time, where N is the number of nodes. However, in the adjacency list representation, we need only call the size function on the list associated with the node. This takes Q(1) time.

Which implementation of a graph makes finding the in-degree more efficient?

In the adjacency matrix representation of graphs, we have to loop over an entire column in the matrix to calculate the number of relationships ending at a given node. If we use Big-O notation, this takes O(|N|) time. In the adjacency list representation, we need to loop over all possible nodes and ask whether the given node appears in the list associated with each one. Since checking for membership in a list takes linear time, the number of tests we make equals the number of relationships in the graph, which is O(|R|), where R is the number of relationships. For graphs without multiple relationships between pairs of nodes, |R| is O(|N|)(squared) so the adjacency matrix representation is faster. For sparse graphs, |R| = O(|N|) so both representations have the same performance.

Coursera is your best friend

The motivation for this blog post I found at Coursera course Advanced Data Structures in Java by the University of California. In this post, you found comparisons that I saw in that course, so not all listed above are my words. I took something from professors in that course. If you are interested in a theme like this one or want to hear more about those two approaches join the course.


Originally published at itworkslocal.ly on January 17, 2016.

One clap, two clap, three clap, forty?

By clapping more or less, you can signal to us which stories really stand out.