Building a Minimap for CDAP Pipeline

Edwin Elia
Published in cdapio
Jul 15, 2019 (4 min read)

One of the most common sentiments I recall from my college math classes was “I will never use some of these concepts again”. I echoed this sentiment too, until recently, when a project for CDAP pipelines gave me a chance to apply some of these concepts.

Imagine you are given a thick book and a couple of loose pages from the book. There is no chapter reference, no page number, and no table of contents. How do you find the proper placement for these loose pages in the book? You have to painstakingly scan through the book to see where the sentences match up.

The same problem can manifest itself in CDAP pipelines as well. CDAP pipelines provide a graphical, code-free interface to process, transform, map, blend, join or analyze data. While building a large pipeline, sometimes it can become quite confusing to know where you are in the pipeline. You lose the context of whether you are in the beginning, middle, or end of the pipeline. The view is quite limited. It is also difficult to jump to different parts of the pipeline, since you cannot view the entire pipeline at once.

A minimap can summarize a pipeline and give context for where you are in it. It acts like the table of contents of a book.

Complexities

Before we start, let us define some terms:

In principle, creating a minimap is quite simple: compute a scale from the size of the pipeline graph versus the size of the minimap, then multiply all the coordinates of the nodes by that scale. However, in the case of CDAP pipelines, we cannot restrict the size of the pipeline.

The container is responsible for managing the zoom and the panning of the graph (when the user drags on the pipeline canvas, they want to move the entire graph instead of individual nodes). The container does not restrict overflow, so nodes can exist beyond the container.

For this reason, we needed to rely on the actual positioning of the nodes to determine the size of the graph. We achieved this by maintaining the minimum and maximum positions of the nodes (the furthest nodes in the graph). Once we know the size of the graph, we can calculate the scale based on the size of the desired minimap, and offset the position of the nodes by the minimum so that all the nodes start at zero.
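The bounding-box approach described above can be sketched as follows. This is a minimal illustration, not the CDAP implementation: the `GraphNode` shape, the `layoutMinimap` function, and the assumption that every node has the same width and height are all hypothetical.

```typescript
// Hypothetical node shape: x/y are the top-left corner in graph coordinates.
interface GraphNode {
  id: string;
  x: number;
  y: number;
}

interface MinimapLayout {
  scale: number;      // graph-to-minimap scale
  nodes: GraphNode[]; // node positions in minimap coordinates
}

// Assumes every node has the same, non-zero width and height.
function layoutMinimap(
  nodes: GraphNode[],
  nodeWidth: number,
  nodeHeight: number,
  minimapWidth: number,
  minimapHeight: number
): MinimapLayout {
  if (nodes.length === 0) {
    return { scale: 1, nodes: [] };
  }

  // Track the extreme positions; together they form the graph's bounding box.
  let minX = Infinity;
  let minY = Infinity;
  let maxX = -Infinity;
  let maxY = -Infinity;
  for (const node of nodes) {
    minX = Math.min(minX, node.x);
    minY = Math.min(minY, node.y);
    maxX = Math.max(maxX, node.x + nodeWidth);
    maxY = Math.max(maxY, node.y + nodeHeight);
  }

  const graphWidth = maxX - minX;
  const graphHeight = maxY - minY;

  // Use the smaller ratio so the whole graph fits in both dimensions.
  const scale = Math.min(minimapWidth / graphWidth, minimapHeight / graphHeight);

  // Shift by the minimum so the graph starts at zero, then scale down.
  const scaled = nodes.map((node) => ({
    id: node.id,
    x: (node.x - minX) * scale,
    y: (node.y - minY) * scale,
  }));

  return { scale, nodes: scaled };
}

// Usage: two nodes, one of them left of the container's origin.
const example = layoutMinimap(
  [
    { id: "source", x: -200, y: 100 },
    { id: "sink", x: 800, y: 300 },
  ],
  200,
  100,
  300,
  150
);
console.log(example.scale, example.nodes);
```

Taking the smaller of the two ratios is a design choice in this sketch: it keeps the aspect ratio of the graph intact, at the cost of leaving some empty space in the minimap along one axis.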

Now that the minimap and the nodes are rendered, the next thing we needed to render is the viewport indicator. The zoom scales the container from its own center rather than relative to the viewport. Therefore, even when the panning of the container is zero, there will be some distance between the viewport and the container whenever the zoom scale is not one.

To calculate the position of the viewport on the minimap, you therefore have to take the container panning, subtract the offset adjustment introduced by the zoom level, and multiply the result by the scale of the minimap. At a high level, the equation looks like the following:

(1 - Scale) * Container panning / 2 * Minimap scale
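To make the interaction between panning, zoom, and the minimap scale more concrete, here is one possible model of the calculation along a single axis. It is a sketch under stated assumptions, not the exact equation used in CDAP: it assumes the container is scaled around its center, that panning is a plain translation in screen pixels, and that the minimap's origin is the minimum node position. The `viewportOnMinimap` function and all of its parameter names are hypothetical.

```typescript
// All names are hypothetical; this models one axis (apply it to x and to y).
interface ViewportIndicator {
  position: number; // left (or top) edge in minimap pixels
  size: number;     // width (or height) in minimap pixels
}

function viewportOnMinimap(
  containerSize: number, // unscaled width (or height) of the container
  viewportSize: number,  // visible width (or height) on screen
  panning: number,       // container translation in screen pixels
  zoom: number,          // zoom scale, applied around the container's center
  graphMin: number,      // minimum node position, used as the minimap origin
  minimapScale: number   // graph-to-minimap scale
): ViewportIndicator {
  // Scaling around the center moves the container's edge by this amount.
  const zoomOffset = ((1 - zoom) * containerSize) / 2;

  // A container point p lands on screen at: panning + zoomOffset + zoom * p.
  // Solve for the p that lands on screen position 0 (the viewport's edge).
  const edgeInGraph = (0 - panning - zoomOffset) / zoom;

  return {
    position: (edgeInGraph - graphMin) * minimapScale,
    size: (viewportSize / zoom) * minimapScale,
  };
}

// Usage: no panning, zoomed out to 50%.
console.log(viewportOnMinimap(1000, 1000, 0, 0.5, -500, 0.25));
```

With a zoom of one, the zoom offset is zero and only the panning moves the indicator. With any other zoom, the indicator shifts even when the panning is zero, which is the distance between the viewport and the container described above.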

Learnings

The first lesson I learned from working on this project is to define the terms clearly right at the beginning. When there are many components involved, it is important to refer to them in a consistent manner. Secondly, since some of the interactions required more involved math, it was important to figure out the equations before starting to code. This helped in making critical design decisions, such as determining the anchor points used to position each of the components.

Conclusion

We want to keep improving the user experience for building and managing complex pipelines. Stay tuned for more such improvements, and try out CDAP today!


Edwin is a Senior Software Engineer at Netflix, previously at Google Cloud. He specializes in user interfaces for data analytics.