Practical application of Directed Acyclic Graph (DAG)

Published in

Active Developement

7 min readSep 27, 2023

Graph Theory is not just cool; it is a game-changer.
It can be used to simplify complex problems in many fields, HECK it can even be used to prioritize to-dos or mapping out a vacation. Yet, for something so powerful, it is often overlooked when seeking solutions.
This realization struck me hard while diving into a PHP (Laravel) project I was working on. I spent a lot of time trying to come up with a solution to a problem, only to find that the answer that was there all along “Graph Theory”.

The Problem

In our geolocation app, our mission was simple: present data like population metrics, points of sales, and more on a map. The app’s primary role was crafting reports about regions based on various KPIs. But here is the catch: going through our immense data meant that generating these reports was a very slow process, and naturally, optimization was always on our radar.

To optimize the report generation time, we created a database table that represents the hierarchical relationships between our map levels — countries, cities, districts… . This was to avoid recalculating these relationships every single time. But creating this table itself took a long time, and any new data necessitated recalculations of parts of this table or all of it.

The goal now is to optimize the calculation of relationships.

Example

Here is a simplified example of the levels hierarchy that we have

Levels Heirarchy diagram — Levels Hierarchy

The current calculations

In the hierarchy that we see on the “Levels Hierarchy” diagram we can see that what we need to calculate is wether a level is containing another level or not.
The simple (unoptimized) way is to do the calculation for every level with every level that it might include.

Example

We have a list showing which regions are included within other regions. 
For instance, "A" includes regions "B,C,D,E,F,G".
    'A' => ['B','C','D','E','F','G']
    'B' => ['C','D','E','F','G']
    'C' => ['E','F','G']
    'D' => ['E','F','G']
    'E' => ['F','G']
    'F' => ['G']
    'G' => []

The way it works:

Start with a region (e.g., “A”).
Look at its direct sub-regions (e.g., for “A” these are “B,C,D,E,F,G”).
For each sub-region:

Calculate if the region (e.g., “A”) includes the sub-region (e.g., “B”)

4. Do this for all the regions

The brainstorming

From our observation, the current approach is not the most efficient. We noticed that instead of calculating the inclusion for every level, we can deduce some of the inclusions. Here is an illustrative example:

Given:
A includes B
B includes C

We can deduce that:
A includes C, through B

Such deduction is possible only due to the hierarchical nature of our inclusions. However, it is vital to maintain this hierarchy — calculations must be done in sequence for this method to work.

The challenge now lies in finding which relationships need to be explicitly calculated and which can be deduced.

Visualizing the Data as a Graph

To better represent the hierarchical inclusions, I decided to visually represent our data. By plotting the relationships as nodes and edges on a graph, patterns and connections became much more apparent. Each region (like “A”, “B”, or “C”) was represented as a node, and their inclusions as directed edges connecting them.

As I mapped out these relationships, I saw that our structure resembled a Directed Acyclic Graph (DAG). In a DAG, directions matter (from parent to child) and there are no circular relationships. This was perfect! Since our regions had a clear hierarchy and no region could include another in a circular manner (like “A includes B” and “B includes A”), the DAG model was an ideal fit. And it opens the door to using various algorithms specifically designed for such graphs.

Now that we know our data can be represented as a DAG what can we do with it ?

Transitive Reduction Algorithm

At its core, the transitive reduction of a directed graph is a way to produce a graph with the fewest possible edges that still maintains the same reachability relationship. In other terms, it is like trimming the excess links without losing the path from one node to another.

For our problem, applying the transitive reduction would mean eliminating redundant inclusion calculations. If “A” includes “B”, and “B” includes “C”, we do not need to directly state that “A” includes “C” — it is implicitly understood.
By using the transitive reduction algorithm, we can determine the minimum relationships that need to be explicitly calculated, making our process not only more efficient but also much clearer.

After applying the Transitive Reduction Algorithm to our graph we are left with the minimum number of relations that we have to calculate !

Using our example, The Relationships Would be:
A includes B
B includes C
B includes D
C includes E
D includes E
E includes F
F includes G

With the minimum relationships in hand, our next challenge is to deduce the remainder of the inclusions without unnecessary calculations. Instead of iterating through each possible combination, which would be computationally expensive, we turn to another powerful graph algorithm: Dijkstra’s shortest path algorithm.

Shortest path algorithm

The shortest path algorithm, as its name suggests, finds the shortest route between two nodes in a graph. By applying this to our DAG, we can easily uncover relationships such as “A includes C through B”. In essence, we are looking for the most direct path of inclusion between regions.

For instance, to deduce the relationship between “A” and “E”, the full path would be “A includes E through B and then through C”. However, for our optimization purposes, we will simplify it to “A includes E through B”. This approach significantly cuts down on the computational work, as we are not re-calculating relationships we can simply derive.

Using our example, The Relationships Would be:
A includes C through B
A includes D through B
A includes E through B
A includes F through B
A includes G through B
B includes E through C
B includes F through C
B includes G through C
C includes F through E
C includes G through E
D includes F through E
D includes G through E
E includes G through F

While it might seem that we have found a direct way to deduce relationships, there is a sequencing challenge. If we were to start with deducing “A includes E through B” before we have established that “B includes E through C”, our deductions could end up being inaccurate or incomplete. The order in which we tackle these relationships is crucial. Enter Topological Sorting !

Topological Sorting

To ensure we deduce relationships in the correct sequence, we use topological sorting. Topological order is a linear ordering of vertices in a directed graph such that for every directed edge U → V, vertex U comes before V in the ordering. By calculating the topological order of our graph, we get a blueprint that guides us on which relations to establish first.

With the topological order in hand, we can arrange our relationships in the correct sequence. This systematic approach guarantees accurate deductions.

So, for our DAG, we would first establish relationships involving “B” before deducing those involving “A”. And to do this we just need to sort our relations in reverse of the topological sort.
This ensures that by the time we deduce the relationship “A includes E through B”, we have already determined that “B includes E through C”.

Using our example, The Relationships Would be:
E includes G through F
D includes G through E
D includes F through E
C includes G through E
C includes F through E
B includes G through C
B includes F through C
B includes E through C
A includes G through B
A includes F through B
A includes E through B
A includes D through B
A includes C through B

In Praise of Graph Theory

The beauty and strength of graph theory are truly remarkable. What started as a task to optimize a geolocation app’s functionality transformed into an insightful journey into the heart of directed graphs and their algorithms. Our problem, which seemed so daunting and complex at first, found its solution in the well-defined structures and methodologies of graph theory.

Diving into Directed Acyclic Graphs (DAGs) provided clarity. By applying the transitive reduction and shortest path algorithms, we found a straightforward solution. Topological sorting was key, allowing us to deduce relationships in a logical order and maintain the accuracy of our data.

The impact was substantial. What used to take us days in calculating the hierarchy table was now achieved in just a few hours, showcasing the efficiency of combining practical programming with theoretical insights.

Final Thoughts

Graph theory might seem abstract on paper, but its practical impact in our project was undeniable. The time savings we achieved underscore the value of these theories in real-world scenarios. It is a reminder to always think outside the box. With our geolocation app significantly improved, it is clear that understanding and applying concepts like graph theory can make a real difference in the outcomes of our projects in any field.

Practical application of Directed Acyclic Graph (DAG)

The Problem

Example

The current calculations

Example

The brainstorming

Visualizing the Data as a Graph

Transitive Reduction Algorithm

Shortest path algorithm

Topological Sorting

In Praise of Graph Theory

Final Thoughts

Written by Houssem Guemer