Getting a Graph Representation of a Pipeline in Apache Beam

Rustam Mehmandarov
Compendium

--

Intro

Constructing advanced pipelines, or trying to wrap your head around the existing pipelines, in Apache Beam can sometimes be challenging. We have seen some nice visual representations of the pipelines in the managed Cloud versions of this software, but figuring out how to get a graph representation of the pipeline required a little bit of research. Here is how it is done in a few steps using Beam’s Java SDK.

TL;DR: Getting Graph Representation

If you just want to see a few lines that let you generate the DOT representation of the graph, here it is:

Now, if you want a slightly more comprehensive example, keep on reading.

A Full Example

Here we will be using word count example, particularly the MinimalWordCount class.

Adding Maven Dependency

First, we need to add a dependency to the Maven file under <dependencies> section:

The Code

Now, we will need to add a few imports (assuming you already added the Maven dependency mentioned earlier).

To get the DOT representation of the pipeline graph we will be passing the pipeline object to the PipelineDotRenderer class, and in this example, we are only logging the output to the console (hence the log4j imports).

That’s it. To see the code in action, run it from the command line:

$ mvn compile exec:java \
-Dexec.mainClass=org.apache.beam.examples.MinimalWordCount \
-Pdirect-runner

This code will produce a DOT representation of the pipeline and log it to the console.

A Complete Example

A fully working example can be found in my repository, based on MinimalWordCount code. There, in addition to logging to the console, we will be storing the DOT representation to a file.

In the next section, we will have a brief look at what can be done with the DOT representations.

What Now?

Now that we have a DOT representation of the pipeline graph, we can use it to get a better understanding of the pipeline. For instance, you can generate an SVG or a PNG image from the data. Note that the generated graph might be a bit verbose, but gives a good overview of the pipeline graph.

Here, I have also included examples of the DOT graph and the PNG file generated for that particular pipeline.

Assuming that you have Graphviz tools installed, you can convert a DOT file to a PNG image using this command:

$ dot -Tpng -o pipeline_graph.png pipeline_graph.dot

In addition to Grapgviz (Wikipedia link), there are also online services for converting DOT graphs to graphical representations, like this one.

A part of a graphical representation for the pipeline in the MinimalWordCount example.

Originally published at https://mehmandarov.com on November 27, 2019.

--

--

Rustam Mehmandarov
Compendium

Passionate computer scientist. Java Champion and Google Developers Expert (GDE) for Cloud. Public speaker. Community builder.