Why Should I Give a DAG?

Jim Medlock
Chingu
Published in
8 min readApr 30, 2017

Leveraging Transitive Closure in your Applications

Photo by Jake Ingle via Unsplash.com

Take a Flying Leap

Algorithms are powerful tools that provide software developers with a roadmap for solving specific problems. However, algorithms in classical Mathematics and Computer Science can sometimes be daunting since the rigor required to create and validate them often results in a description that is seemingly both complex and arcane.

Do you understand this?

In spite of this algorithms should not be feared. Developers of all types and experience levels can benefit by becoming familiar with the more common of these and by building on this body of work. For example, every developer should understand the class of algorithms known as data structures — arrays, linked lists, stacks, queues, etc.

Another class of algorithms are those dealing with graphs. In Computer Science this class of algorithms deals with a collection of nodes and the connections between them. Graphs are used to describe varied things including communication networks, parts assemblies (aka bill of materials), roadmaps, and even university course catalogs. One type of graph widely used to represent these types of things is known as a directed acyclic graph (DAG).

The attributes a DAG possesses which sets it apart from other types of graphs are:

  • Directed means that nodes are connected in a specific order with an implied direction.
Figure 1 — A Simple Directed Acyclic Graph (DAG)
  • Acyclic means that once a node has been visited there is no way for it to be revisited when following the path of nodes that come after it. If a cycle or loop were to be present in a DAG it would no longer be a DAG, but more importantly an infinite loop could result when it is traversed.
Figure 2 — A DAG containing a cycle
  • Graph means that nodes are connected and these connection lines are referred to as edges. Edges may also be weighted with a relative cost as compared to other edges. For example, on a roadmap the distance between cities is a type of weight.

“To be Acyclic, or Not to be Acyclic? That is the Question!”

Software that uses DAG’s to represent different types of data, like a university course catalog, are susceptible to infinite loops if a cycle were to creep into the graph. In Figure 2 a cycle exists between node B and node C. Any software traversing this graph would produce the following result:

A → B → C → B → C → B → C … ad nauseum!

This situation could be easily averted by saving the list of nodes that have been visited and looking for a repeated pattern. Unfortunately, such a solution would be over simplistic since loops can be the result of transitive relationships. In other words, dependencies between nodes that are not adjacent.

Figure 3 — A Cycle between non-adjacent nodes

An application traversing this graph would produce the following result:

A → B → C → A → B → C → A→ B → C … and so on.

Now assume that instead of just one node in the transitive relationship there are 8 or 271 or 24913! How would you be able to detect the dependency to prevent a loop?

Warshall to the Rescue!

Photo by Natalie Collins via Unsplash.com

Warshall’s algorithm is a non-recursive method designed to identify transitive closure in a DAG. It is based on the observation that if there is a path from node A to node B and a path from node B to node C then a potentially multi-segment path must also exist between node A and node C. The Floyd-Warshall algorithm is a variation of Warshall’s basic algorithm which makes it possible to find the shortest paths between nodes in a weighted DAG and also makes it possible to identify which nodes are involved in a cycle.

Floyd-Warshall operates against a square matrix whose rows and columns represent the nodes in the graph. Before executing the algorithm this matrix is populated with the dependencies between nodes. For example, if node 1 is dependent on node 2 then the cell at row 1, column 2 will be set to contain the number 1.

After the algorithm is applied to the dependency matrix, cells along the diagonal will contain a zero if the corresponding node doesn’t participate in a cycle. If the node is part of a cycle this cell will contain1. Traversing the diagonal looking for non-zero values will identify all nodes in a cycle.

Let’s start with an implementation of this algorithm in Javascript. The following class definition (warshall.js) implements a modified version of the Floyd-Warshall algorithm in its identifyDependencies() function. The difference between what is implemented and the original algorithm is the path weights are assumed to be equal so they are all set to 1.

Testing this against the case of the graph containing no cycles (see Figure 1) produces the following results.

Dependency Array - Test #1 after adding dependencies...
[ 0, 0, 0 ]
[ 1, 0, 0 ]
[ 0, 1, 0 ]
Dependency Array - Test #1 after identifying cycles...
[ 0, 0, 0 ]
[ 1, 0, 0 ]
[ 1, 1, 0 ]

In this test the first output shows that node 1 is waiting on node 0 and node 2 is waiting on node 1. After applying Floyd-Warshall to the dependency array all cells along the diagonal of the array are zero indicating that no cycles exist.

The second test is for the case where a cycle exists between two adjacent nodes (see Figure 2).

Dependency Array - Test #2 after adding dependencies...
[ 0, 1, 0 ]
[ 1, 0, 0 ]
[ 0, 0, 0 ]
Dependency Array - Test #2 after identifying cycles...
[ 1, 1, 0 ]
[ 1, 1, 0 ]
[ 0, 0, 0 ]

In this test node 0 is waiting on node 1 and node 1 is waiting on node 0. Following the execution of the algorithm cells [0][0]and [1][1] both contain a 1 indicating that both nodes are involved in a loop.

Finally, the last test is that of the case where a cycle exists between two nodes that aren’t adjacent.

Dependency Array - Test #3 after adding dependencies...
[ 0, 0, 1 ]
[ 1, 0, 0 ]
[ 0, 1, 0 ]
Dependency Array - Test #3 after identifying cycles...
[ 1, 1, 1 ]
[ 1, 1, 1 ]
[ 1, 1, 1 ]

In this case node 2is waiting on node 0, node 1 is waiting on node 0 and node 2 is waiting on node 1. The algorithm detects this cycle and indicates it by setting each cell in the diagonal of the dependency array to 1.

Real World Considerations

Knowing which nodes are part of a cycle is an important fact to know, but this is highly dependent on the time the application to chooses to check for the existence of cycles. There are two general strategies applications may use to check for cycles.

  1. Immediate cycle detection — Look for cycles whenever a node is added or moved to determine if the result would be a cycle.
  2. Deferred cycle detection — Periodically look for cycles. This means that cycles would be allowed whenever nodes are added or moved, but they would be resolved later.

While immediate cycle detection is simpler to implement it does impose a performance penalty based on the number of nodes in the graph and frequency of changes to the graph. On the other hand, deferring cycle detection reduces performance overhead, but at the cost of allowing cycles to be created which increases application complexity by requiring a periodic of check for cycles and resolution of any that are found.

The performance impact of deferring detection is also highly application-dependent. Consider for example an application that uses a graph to manage access to some set of resources. Allowing a cycle in this type of application could result in a pause in execution since in this case a cycle represents a deadlock condition. This complicates cycle detection by requiring that it execute asynchronously in its own thread with the rest of the application. Depending on the application the impact of allowing a cycle could also be greater than any performance impact resulting from immediate cycle detection.

Another important consideration related to deferred detection stems from the fact Floyd-Warshall identifies nodes that are in a cycle, but not the node that caused the cycle nor the number of cycles. It is possible that between executions of the detection process more than one cycle could have been created. One strategy that can be used to resolve this is to use some other data, such as a node creation timestamp, to remove the most recently created node and then repeating the detection process until there are no more cycles. This can be relatively inexpensive, but it is imperfect since nodes created after the one that caused the cycle will be removed first.

As with many problems facing developers there is no clear answer. Arriving at the optimal solution requires a deep understanding of the problem, the application environment, and the algorithms and process being used.

Takeaways

I hope you’ve learned something new from this article and something you can apply to your work. But taking a broader view I’d like to leave you with the following key points:

  1. Learning algorithms is an investment that you can benefit from throughout your career as a developer.
  2. Applications of algorithms aren’t necessarily obvious. Finding the right marriage of algorithm to a problem requires both an understanding of the algorithm and a thorough understanding of the problem.
  3. With respect to algorithms:

Poor programmers re-invent the wheel. Often arriving at subpar solutions to problems for which more elegant solutions have been already created.

Good developers translate existing algorithms to code.

Above average developers understand algorithms well enough to apply them to a broad range of problems.

Great developers modify algorithms to address the requirements of the specific problems they are trying to solve.

The source code for the examples used in this article can be found on GitHub.

I hope you find this information useful and I look forward to any questions and comments you might have. Have a good day, afternoon, or evening and go do great things!

What is Chingu?

Come join us at Chingu.io to get out of “Tutorial Purgatory” by joining one of our remote Voyage project teams to build apps, refine “hard” skills, and learn new “soft” skills. We help Developers bridge the gap between what they’ve learned and the skills employers are looking for.

--

--

Jim Medlock
Chingu
Editor for

I manage Operations at Chingu, Inc. We help bridge what you have learned to get the experience employers seek. Come join us — Https://chingu.io