# A faster processor or a faster algorithm?

I have completed the first set of lectures on the dynamic connectivity problem and found them very interesting and informative. The basic concept of the dynamic connectivity problem is as follows:

- Given N nodes, any two nodes can be connected to each other. { union(node1, node2) }
- A query can check whether any two nodes are connected or not. { connected(node1, node2) }

Robert Sedgewick, the instructor for the course, introduces three algorithms: quick-find, quick-union, and weighted-quick-union. In the assignment related to this topic, students are supposed to implement a percolation model (an application of the dynamic connectivity problem) using the weighted-quick-union algorithm. Just for fun, I decided to analyze the actual running times (CPU time) of these three algorithms as N increases. I think this will help readers understand why it is important to implement better, faster algorithms. Before the analysis, I'll introduce the time complexities of these algorithms; an explanation of their inner workings is unnecessary for getting my point across.

The quick-find algorithm takes time proportional to N, N, and 1 for initialization of the underlying data structure, union (connecting two nodes), and find (checking whether two nodes are connected), respectively. The algorithm does the best among all three algorithms for find, but that is not enough: the algorithm has to do well for union as well.
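A minimal quick-find sketch (my own Python illustration, not the course's Java code; the class name `QuickFind` is mine) shows where those costs come from: find is a constant-time array comparison, but union may rewrite the whole array.

```python
class QuickFind:
    def __init__(self, n):
        self.id = list(range(n))  # id[i] is the component label of node i

    def connected(self, p, q):
        return self.id[p] == self.id[q]  # constant time: one comparison

    def union(self, p, q):
        pid, qid = self.id[p], self.id[q]
        if pid == qid:
            return
        # relabel every node in p's component: scans all N entries
        for i in range(len(self.id)):
            if self.id[i] == pid:
                self.id[i] = qid
```

The full-array scan in `union` is exactly the N-proportional cost that dominates later in the percolation experiment.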

Quick-union takes time proportional to N, N, and N for initialization, union, and find, respectively, in the worst case. An improvement to quick-union is the weighted-quick-union algorithm, which takes time proportional to N, log N, and log N for initialization, union, and find, respectively. The instructor suggests further improvements, such as path compression, which flattens the tree so that the search for the root of a node is shortened.
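A weighted-quick-union sketch (again my own Python, mirroring the course's description; the class name is mine): each node stores its parent, roots are found by chasing parent pointers, and the smaller tree is always hung under the larger one.

```python
class WeightedQuickUnion:
    def __init__(self, n):
        self.parent = list(range(n))  # parent[i]: parent of i; roots point to themselves
        self.size = [1] * n           # size[r]: number of nodes in the tree rooted at r

    def root(self, p):
        while self.parent[p] != p:    # climb parent pointers up to the root
            p = self.parent[p]
        return p

    def connected(self, p, q):
        return self.root(p) == self.root(q)   # proportional to tree depth, ~log N

    def union(self, p, q):
        rp, rq = self.root(p), self.root(q)
        if rp == rq:
            return
        # weighting: link the root of the smaller tree under the larger one
        if self.size[rp] < self.size[rq]:
            rp, rq = rq, rp
        self.parent[rq] = rp
        self.size[rp] += self.size[rq]
```

Removing the size check in `union` would turn this back into plain quick-union, whose trees can degenerate into long chains.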

I am partially done with the first assignment, which is based on the percolation model. The assignment asks whether an n-by-n grid of cells percolates or not: if the top row is connected to the bottom row, the system percolates. Initially all the sites/cells are closed; they are opened one by one at random. When a site opens, it has to connect to any already-open neighbouring cells. Generation of random sites continues until the system percolates. The following table shows N versus running time for all three algorithms.

The quick-find runtime grows quickly as N increases. For N = 500, the runtime is already in seconds, which is very slow compared to the other two algorithms. So why is quick-find so slow even though find takes a constant amount of time? The reason is that union is proportional to N. In the percolation problem, for a large value of N like 1000, union will be called about 600 times (the percolation threshold is around 0.59 regardless of N, i.e. the system percolates once roughly 59–60% of sites are open). If union is called 600 times and each call costs time proportional to N = 1000, then 600 × 1000 is the approximate amount of work, which is very large.
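The percolation experiment described above can be sketched as follows (assumptions mine: virtual top and bottom nodes, a weighted-quick-union structure re-implemented inline, and the function name `percolation_threshold`):

```python
import random

def percolation_threshold(n, seed=0):
    """Open random sites on an n-by-n grid until the top row connects
    to the bottom row; return the fraction of sites opened."""
    rng = random.Random(seed)
    # weighted quick-union over n*n sites plus virtual top and bottom nodes
    parent = list(range(n * n + 2))
    size = [1] * (n * n + 2)
    top, bottom = n * n, n * n + 1

    def root(p):
        while parent[p] != p:
            p = parent[p]
        return p

    def union(p, q):
        rp, rq = root(p), root(q)
        if rp == rq:
            return
        if size[rp] < size[rq]:        # hang the smaller tree under the larger
            rp, rq = rq, rp
        parent[rq] = rp
        size[rp] += size[rq]

    open_sites = [False] * (n * n)
    order = list(range(n * n))
    rng.shuffle(order)                  # sites open in random order
    opened = 0
    for site in order:
        open_sites[site] = True
        opened += 1
        r, c = divmod(site, n)
        if r == 0:
            union(site, top)            # top-row sites connect to the virtual top
        if r == n - 1:
            union(site, bottom)         # bottom-row sites to the virtual bottom
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < n and 0 <= nc < n and open_sites[nr * n + nc]:
                union(site, nr * n + nc)   # join with already-open neighbours
        if root(top) == root(bottom):   # the system percolates
            return opened / (n * n)
    return 1.0
```

For a moderately sized grid the returned fraction tends to hover near the ~0.59 threshold mentioned above; swapping the inline union-find for quick-find is what produces the slow timings in the table.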

## Why is quick-union better?

Quick-union takes time proportional to N for both union and find, but only in the worst case. Weighted quick-union is a modification of quick-union in which an attempt is made to keep the underlying tree data structure as small as possible. In quick-union, the data structure that abstractly represents the components (connected nodes) is a tree (actually it's an array, but the interpretation is that of a tree, wherein each tree represents a group of connected nodes).
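The array-as-tree interpretation can be sketched in a few lines (a toy example of my own, not the course code): `parent[i]` stores the parent of node i, and a node whose parent is itself is a root.

```python
# parent array interpreted as a forest: parent[i] is the parent of node i
parent = [0, 0, 1, 3]   # nodes 0, 1, 2 form one tree (0 is the root); node 3 is alone

def root(p):
    while parent[p] != p:       # climb until a node points to itself
        p = parent[p]
    return p

def quick_union(p, q):
    parent[root(p)] = root(q)   # "dumb" merge: no size check at all

quick_union(2, 3)               # joins the two trees
assert root(0) == root(3)       # all four nodes now share one root
```

Because `quick_union` never looks at tree sizes, a bad sequence of unions can chain the trees into a long path, which is exactly what the weighting fixes.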

You can imagine the problem as multiple trees, with each tree representing a connected component. Nodes in the same tree are part of the same component and are connected. Now, if two nodes from different components are to be connected, the two trees have to be joined. When joining two trees, the algorithm needs to be smart. Weighted quick-union is smart, whereas quick-union is dumb!

As shown in the figure above, 1, 2, 3, and 8 are nodes in the dynamic connectivity problem. 1, 2, and 3 are part of the same tree because they are connected, whereas 8 is alone. Now, if the user issues the command union(2, 8), there are two options for merging the two trees (8 is also a tree, with a single node). Option B is smarter, as the height of the tree stays small after the union, and it is the one chosen by the weighted-quick-union algorithm, whereas quick-union picks either option without considering the height of the tree. This consideration makes weighted-quick-union faster and smarter. Moreover, the height of the tree can never be greater than log N. But why?

Assume that two trees T1 and T2 are merged, with |T2| ≥ |T1|, and let x be a node in T1. The depth of x increases by at most one with each such merger. How many such mergers are possible?

Such mergers are possible at most log N times. The reason is that the size of the tree containing x at least doubles with each merger, and after log N such mergers the size of the tree equals N, so no further doubling is possible.
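The claim can be checked empirically with a small sketch (function names and the pairwise merge schedule are my own assumptions): repeatedly merging equal-sized weighted-quick-union trees is the worst case for depth, and even then the depth never exceeds log2 N.

```python
def max_depth(parent):
    """Deepest node in the forest encoded by the parent array."""
    def depth(p):
        d = 0
        while parent[p] != p:
            p = parent[p]
            d += 1
        return d
    return max(depth(p) for p in range(len(parent)))

def tournament_merge(n):
    """Weighted quick-union on n nodes (n a power of two), merging
    equal-sized trees pairwise so every union triggers a doubling."""
    parent = list(range(n))
    size = [1] * n

    def root(p):
        while parent[p] != p:
            p = parent[p]
        return p

    def union(p, q):
        rp, rq = root(p), root(q)
        if rp == rq:
            return
        if size[rp] < size[rq]:
            rp, rq = rq, rp
        parent[rq] = rp
        size[rp] += size[rq]

    step = 1
    while step < n:                      # merge blocks of size `step` pairwise
        for i in range(0, n, 2 * step):
            union(i, i + step)
        step *= 2                        # tree sizes double each round
    return max_depth(parent)
```

For N = 32, five rounds of doubling produce one tree whose depth is exactly 5 = log2(32), matching the argument above.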

Assume the number of nodes N is 32. Initially two individual nodes are merged; the size becomes 2, which can be followed by this sequence of doublings:

- 1 * 2 = 2
- 2 * 2 = 4
- 4 * 2 = 8
- 8 * 2 = 16
- 16 * 2 = 32

Only 5 mergers are needed to form one tree of size 32, and we can't go further because the total number of nodes is 32. Since only 5 mergers are possible and the depth increases by at most one with each merger, the maximum depth of the tree is 5, which is log 32.

Now, coming back to the topic: we can compare two algorithms, one with log N complexity and the other with N complexity, and see how they grow.

The growth for quick-union is very steep, whereas weighted quick-union grows much more slowly. The running times of both are almost the same up to N ≤ 500; a significant difference appears for N > 1000. And this is just N versus log N. Imagine what would happen if log N were competing with N².
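To make the gap concrete, here is a toy comparison (computed values, not measured CPU times) of a linear cost against a logarithmic one at growing N:

```python
import math

# N (linear per-operation cost) versus log2 N (weighted-quick-union cost)
for n in (100, 500, 1000, 10_000, 1_000_000):
    ratio = n / math.log2(n)
    print(f"N = {n:>9}:  N / log2(N) = {ratio:>10.0f}x")
```

At N = 1000 the linear algorithm already does roughly 100 times more work per operation, and the ratio keeps widening as N grows.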