Data Abstraction and the Grivan-Newman Algorithm

Jurech
smucs
Published in
4 min readApr 12, 2022

James Urech is a Sophomore computer science major at Southern Methodist University. His interests include game design, music composition, and creative writing.

Abstraction in computer science refers to the simplification of the terms of a problem. While programmers most often use this to refer to the levels of abstraction in programming languages (C is a level of abstraction above assembly, for example), this concept also applies to data structures. When specifically applied to graphs, this process involves the deterioration of graphs to their bare components: vertices and edges.

The first simplification of the structure of a graph is the adjacency list, which reformats a graph as a list of nodes, each listed beside all the nodes they are adjacent to. While this format decreases readability for humans, it provides a useful tool for manipulating the data with computers.

A simple graph represented by an adjacency list.

A large issue for the human readability of a visualized adjacency list comes from the duplication of nodes in large graphs and the difficulties in finding direct paths. When dealing with adjacency lists as a data structure, these lists become highly useful lists of pointers, allowing for easy traversal through the graph for many purposes, including recursive/iterative backtracking and breadth/depth-first searches.

Even further deconstruction of a graph reveals its two most basic components: lists of vertices and lists of edges. Understanding these basic principles allows for the implementation of many algorithms concerning graphs, including the Girvan-Newman Algorithm.

Due to the complex nature of graphs as data structures, as well as the many forms they can take, code libraries have been developed to share a useable code base, keeping most coders from having to implement these structures themselves; however, these can come with drawbacks. Some libraries, such as the Boost Graph Library, seek to be as generic as possible, but this results in a highly convoluted system of templates and pointers that can be overwhelming even for seasoned programmers.

Abstracting data from complex code libraries into more comfortable forms reduces the stress of learning a completely new system in a short amount of time. While it may sacrifice efficiency, abstraction creates a more workable environment. For example, in my personal implementation of the Girvan-Newman algorithm, I sought to use as little of the Boost Graph Library as was feasible. I found snippets of pre-written code online and used those to help me extract the component features of a graph into vectors. Using those vectors, I performed all the necessary comparison operations to determine betweenness, and the only time I altered the graph object itself was when I removed edges.

std::queue<int> Q;
std::vector<int> visited;
std::vector<std::vector<int>> paths;
while (!Q.empty()){
for (vp = vertices(g); vp.first != vp.second; ++vp.first){
if (index[*vp.first] == Q.front()){
Q.pop();
for (tie(ai, ai_end) = adjacent_vertices(*vp.first, g); ai != ai_end; ++ai){
bool vectorContains = contains(visited, index[*ai]);
if (!vectorContains){
Q.push(index[*ai]);
visited.push_back(index[*ai]);
paths.push_back(currentPath);
}
}
}
}
}

Instead of acting on the vertices themselves, this code represents them as vectors of integers. It searches through the list of vertices on the graph until it finds a match, as boost seemingly does not have its own search function. After using boost to find all the vertices adjacent to the current vertex, it adds integers to the queue and list of visited vertices instead of the vertices themselves or applying a “visited” property to them. As part of a breadth-first search to find betweenness, it also creates lists of numbers as “paths” taken to each vector.

All of this data is stored separately from the graph, which makes processes such as parsing edge usage from the list of paths require only simple vector operations:

for(int i = 0; i < edges.size(); i++){
for(int j = 0; j < paths.size(); j++){
for(int k = 0; k < (paths[j].size() - 1); k++){
if(/*path[j] and path[j + 1] match edges[i]*/){
betweenness[i]++;
}
}
}
}

In this case, betweenness is a parallel vector to a vector containing the list of edges, meaning that whatever memory location i points to in edges, it will always match the correct location in betweenness.

Data abstraction may reduce computational efficiency, but it greatly increases accessibility for complex libraries and data structures.

--

--