Introduction to Graphs for CP (Part II, implementation & algorithms)

Published in

Programming Club, IIT Kanpur

7 min readMar 17, 2021

In the previous blog, we learnt about graph theory and the various features and characteristics of graphs. The time has begun for us to program these graphs!

Pre-requisites

Part I of the blog
Knowledge of C++ and its features like STL

If you have learnt C but don’t know much about C++, then:

Learn C++! It’s pretty powerful for CP (mainly due to STL) and pretty similar to C, so it wouldn’t take that much effort. Speedrun through this video by freeCodeCamp.
Learn STL. This is a good article to begin with after completing step 1.

How to store a graph?

In most problems, the input is usually given in the form of the edges, where we input the starting and ending vertex (no difference between these two for undirected graphs) of that particular edge, for all edges. Suppose we are given a graph G of n vertices (say, numbered 1 to n) and m edges. A question naturally arises: how do we even begin to store our graph? Mainly, there are 2 ways to do this:

Adjacency Matrix

This form of representation uses a square matrix of size n x n, all 0s initially. If there is an edge from vertex i to vertex j, we set adj[i][j] = 1. But in the case of undirected graphs, we know that a connection from i to j implies a connection from j to i as well. As we noted earlier, the undirected graph can be denoted by a directed graph with edges in both ways. So, if the graph is undirected, we also set adj[j][i] = 1.

int n, m;
cin >> n >> m;
vector<vector<int>> adj(n, vector<int>(n, 0)); // n x n matrix of 0s
for(int i=0; i<m; i++) {
  int x, y;
  cin >> x >> y;
  x--; y--;  // converting to 0-based indexing
  adj[x][y] = 1;
  adj[y][x] = 1;  // if undirected graph
}

However, a few problems arise with this representation. First, we need O(n²) time and memory to construct the graph. A lot of memory might be wasted since many entries can turn out to be 0 in your graph. Secondly, as we will see ahead, our algorithms will work slower (for example, O(n²) instead of O(n+m)). For large enough values of n (like 10⁵), our programs will easily time out. There is an alternate way that addresses these issues to some extent.

Adjacency List

In this form of representation, we form n different lists corresponding to each vertex, and the list of any vertex v (let’s call it adj[v]) contains the vertices adjacent to v. On receiving an edge (x, y), we push y into adj[x]. Again, as we reasoned above, we also push x into adj[y] if the graph is undirected.

int n, m;
cin >> n >> m;
vector<vector<int>> adj(n);  // initially n empty lists
for(int i=0; i<m; i++) {
  int x, y;
  cin >> x >> y;
  x--; y--;  // converting to 0-based indexing
  adj[x].push_back(y);
  adj[y].push_back(x);  // if graph is undirected
}

The memory and time used here is O(n+m).

Now that we know how to store our graph, it’s time to tackle various problems!

1. Can I reach v from u?

One of the most basic questions we can ask regarding our graph G is: what are the various vertices reachable from given vertex u? Essentially, we wish to find out the member vertices of the connected component of G containing u. We now present 2 basic algorithms:

Depth-First Search (DFS)

Starting from u, we mark the current vertex as visited, and as soon as we find a neighbouring vertex that has not been visited yet, we visit it and apply this process again. We repeat until there are no more vertices left to visit. There is a beautiful recursive implementation of this idea (using the adjacency list representation):

// g is the adjacency list representation of the graph
// vis[x] denotes whether vertex x has been visited or not, initially all falsevoid dfs(int u) {
  vis[u] = true;         // marking u as visited
  for(int v: g[u]) {     // iterating over vertices adjacent to u
    if(!vis[v]) dfs(v);
  }
}int main() {
  // prepare g and vis  dfs(u);  // Now vis[x] denotes if x is reachable from u
  return 0;
}

If u and v lie in different connected components, then we can never reach v, as we can only jump from neighbours to neighbours. And if they do lie in the same component, then we will reach v eventually.

Time complexity: O(n+m) if we use the adjacency list, O(n²) if we use the adjacency matrix.

Breadth-First Search (BFS)

Starting from u, when we find a non-visited neighbour, we don’t immediately go to it, but we push it into a queue, so we push all non-visited neighbours into the queue (and immediately mark them visited as well). We then choose the next element of the queue to go into. On repeating the procedure, it can be intuitively seen that each neighbour is processed one-by-one, and only then we go to the next level/depth of neighbours. We can also add an array dis which denotes the “level” reached till now. Finally, dis[v] will give the length of the shortest path from u to v (assuming the length of each edge to be 1) if v is reachable.

The code:

int main() {
  // prepare g, vis  vector<int> dis(n, -1);
  queue<int> q;
  q.push(start);  // start is the starting vertex
  dis[start] = 0;
  vis[start] = true;
  while(!q.empty()) {
    int u = q.front();
    q.pop();
    for(int v: g[u]) if(!vis[v]) {
      vis[v] = true;
      dis[v] = dis[u] + 1;
      q.push(v);
    }
  }  // Now vis[x] denotes if x is reachable from u
  // and dis[x] denotes shortest distance of x from u
  // if x is reachable from u
  // Otherwise it is -1
  return 0;
}

Time complexity: O(n+m) if using the adjacency list, O(n²) for the adjacency matrix version.

2. Number of connected components

Now that we know DFS and BFS, it is not very difficult to find the number of connected components. We know doing dfs(u) or performing BFS starting with u will mark the whole connected component having u as visited, and we make use of this fact.

// prepare g, visint c = 0;
for(int u=0; u<n; u++) if(!vis[u]) {
  dfs(u);  // can do BFS as well
  c++;
}// c now has the value of the number of connected components of g

Time complexity: O(n+m)

Note that we can also modify this (and DFS) further to get the individual connected components as well.

void dfs(int u) {
  vis[u] = true;
  comp.push_back(u);
  for(int v: g[u]) if(!vis[v]) dfs(v);
}int main() {
  // prepare g, vis, comp
  // comp initially empty
  
  for(int u=0; u<n; u++) if(!vis[u]) {
    dfs(u);
    // comp now has the connected component containing u
    // can get information from it
    comp.clear();
  }  return 0;
}

Modified DFS for trees

Yes, trees are connected graphs, so the final array vis will be completely filled with trues. But DFS can be used for other purposes as well. Let us see an alternate “skeleton” of using DFS on trees.

Suppose we wish to do DFS from any vertex u of a tree. Let us consider the rooted version of the tree where the root is u.

First, it goes to the neighbours of u, and in turn, each of them is processed individually. Perhaps, it can be a bit intuitive to see that until that particular subtree is processed completely, the processing does not touch the vertices outside the subtree. After that, the subtree corresponding to the next neighbour is processed, and so on. When all children of u have been processed, it backtracks to u as expected.

Due to this, we don’t need to maintain a separate array vis to keep track of the visited vertices. We know that among the neighbours of the current vertex, it’s children have not been visited yet, and the parent (if it exists) must have already been visited. Hence, we can come up with the following code:

void dfs(int u, int par = -1) {
  for(int v: g[u]) {
    // all vertices except the parent are unvisited
    if(v != par) dfs(v, u);
  }
}int main() {
  // prepare g
  dfs(root);
  return 0;
}

Time complexity: O(n) (remember m = n-1)

Note that this code doesn't actually do much, but will be used as a “skeleton” for some applications.

3. Finding sizes of subtrees rooted at vertices, 4. Finding parent of every vertex

Code based on the skeleton:

// g is the graph
// s is the array storing size of subtree of each vertex
// p is the array storing the parent of each vertexint dfs(int u, int par=-1) {
  p[u] = par;
  // size is 1 + sum of sizes of all children subtrees
  s[u] = 1;
  for(int v: g[u]) if(v != par) s[u] += dfs(v, u);
  return s[u];
}int main() {
  // prepare g, s, p

  dfs(root);  // parent of root doesn't exist and is stored as -1
  return 0;
}

Time complexity: O(n)

Some problems for further practice