GETTING STARTED | SUNBURST & PARALLEL COORDINATES PLOT | KNIME ANALYTICS PLATFORM

Such Great Heights

Exploring Waterfalls With Sunburst And Parallel Coordinates Plots In KNIME Analytics Platform

John Denham
Low Code for Data Science

Introduction: Where the River Goes

Waterfalls are powerfully inspiring. They are monolithic natural wonders, driven by the forces of gravity, carving their way through stone over lengths of time that most cannot truly comprehend. They are visually arresting and audibly enveloping, yet they can be soothing, peaceful and restorative, grounding us in ways deeper than language can describe.

Much like how rivers and waterfalls help us understand the history of the planet, data visualization can go a long way to tell the story of our data or help the analyst suss out insights to share.

This article is not about waterfall charts but rather about the Sunburst Chart and the Parallel Coordinates Plot nodes in KNIME Analytics Platform. These charts are unique in that they organize a lot of information in a relatively small space. The sunburst chart allows us to dig into hierarchical data with ease and at a glance get an idea of how our data is composed. The parallel coordinates plot, while more of an analyst chart than a consumer one, allows us to compare many metrics or observations all at once. While both visualizations may seem potentially overwhelming, interactivity is the key here and what really makes them shine.

For this article, I scraped the datasets from two Wikipedia tables about waterfalls, here and here. I cannot attest to the accuracy of this data, but it is more than enough to give you a good understanding of the visualizations. The datasets include waterfalls by country name, locality, height, width, drop and flow rate. The sunburst chart dataset includes the tallest natural waterfalls, and the parallel coordinates plot dataset includes existing waterfalls with an average flow rate of at least 150 cubic meters per second.

The datasets come with the reference workflow for this blog post, named Sunburst and Parallel Coords, which is available on the KNIME Hub and can be downloaded for free.

In this post, we will describe and apply these nodes to:

  • Create and interact with sunburst charts and parallel coordinate plots with Wikipedia waterfall data.
  • Apply color to our data with the Color Manager node.
  • Discuss/control aspects of the nodes with CSS and flow variables.

The Nodes

Figure 1: Sunburst Chart And Parallel Coordinates Plot Nodes.

The Sunburst Chart and Parallel Coordinates Plot nodes are part of the KNIME JavaScript Views family of nodes, which have a JavaScript-based implementation and are organized under the Views category in the node repository (Figures 1 and 2).

Figure 2: JavaScript Views Nodes.

As touched on earlier, the sunburst chart is great at displaying hierarchical information in a condensed form that is easy to interact with. The parallel coordinates plot, on the other hand, can be used to visualize a vast amount of data all at once, enabling the detection of large patterns while also allowing for specific drill-downs.

Step 1: The Sunburst Chart

Figure 3: Workflow Step 1.

Step 1 of the workflow is split into a meters option (top) and a feet option (bottom). For layout efficiency, the CSS Editor node has been placed at the front of the workflow (Figure 3).

For our sunburst chart, we want to click through the waterfall hierarchy from country, to locality, and finally to the waterfall itself. We also want the height of the waterfall displayed when we reach its level in the hierarchy.

After reading in the file, we perform some basic data preparation. The key step here is the use of the Column Resorter node to put the columns in the order of the desired hierarchy. If we don’t do this, the sunburst chart will plot in the wrong order (Figure 4).

Figure 4: Column Resorter Node.
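
The effect of column order can be sketched outside of KNIME. The Python snippet below is only an illustration, not the Column Resorter's implementation: the rows are placeholder values rather than entries from the Wikipedia tables, and HIERARCHY and to_path are hypothetical names. It shows that the same row yields a different path, and therefore a different ring order, depending on how the columns are arranged.

```python
# Placeholder rows (not taken from the Wikipedia dataset), with the columns
# in the order they might arrive after scraping.
rows = [
    {"Waterfall": "Falls X", "Height (m)": 900, "Country": "Country A", "Locality": "Region A1"},
    {"Waterfall": "Falls Y", "Height (m)": 750, "Country": "Country A", "Locality": "Region A2"},
]

# Desired hierarchy, from the innermost ring to the outermost ring.
HIERARCHY = ["Country", "Locality", "Waterfall"]


def to_path(row, column_order):
    """Return the hierarchy path of a row for the given column order."""
    return tuple(row[col] for col in column_order)


# In the original column order, every path starts at the waterfall ...
unsorted_order = [col for col in rows[0] if col != "Height (m)"]
unsorted_paths = [to_path(row, unsorted_order) for row in rows]

# ... while the resorted order drills down from country to waterfall.
sorted_paths = [to_path(row, HIERARCHY) for row in rows]

for path in sorted_paths:
    print(" > ".join(path))
```

With the resorted order, both placeholder waterfalls share the same first level, which is what lets the chart group them under a single country segment.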

Next, we can jump into the configuration of the Sunburst Chart node (Figure 5).

Figure 5: Sunburst Chart Node Configuration Options Tab.

On the Options tab, as with other JavaScript-enabled visualization nodes, we can choose to have an image generated that becomes available through the green image output port of the node.

For display considerations, we can also set a maximum number of rows.

The twinlist of Exclude and Include columns is where we pass in our hierarchy data. As you can see, these columns are in logical order from Country (the largest entity) to Waterfall (the smallest).

With these on the Include side, we just select our value column.

Note. Paths that are shorter than the maximum depth of the entire hierarchy need to be filled with missing values at the levels they do not reach.
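
As a rough illustration of that padding rule, here is a minimal Python sketch. It assumes that None stands in for a KNIME missing value and that the padding goes at the end of the path; pad_paths and the sample paths are hypothetical placeholders, not part of the workflow.

```python
def pad_paths(paths, fill=None):
    """Pad every path with `fill` so that all paths share the maximum depth."""
    max_depth = max(len(path) for path in paths)
    return [list(path) + [fill] * (max_depth - len(path)) for path in paths]


# Placeholder paths: the second one stops at the locality level.
paths = [
    ["Country A", "Region A1", "Falls X"],
    ["Country B", "Region B1"],
]

padded = pad_paths(paths)
for path in padded:
    print(path)
```

After padding, every row has one entry per hierarchy column, so the table stays rectangular even when some branches are shallower than others.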

If we want to filter parts of our sunburst chart based on the size of each node relative to the full circle, we can select Filter out small nodes. With this option checked, the larger the number entered in Threshold for filtering (radians), the more small nodes are filtered out and the less data is displayed.

Below, on the left-hand side, is an example with the default configurations (Filter out small nodes was not selected), while on the right-hand side is an example with the box selected and the threshold set to 0.1 (Figure 6).

Figure 6: Left: Default Configuration. Right: Threshold Set To 0.1.
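
To build some intuition for a threshold expressed in radians, the sketch below assumes the usual sunburst layout, in which a node's angle is its share of the total value multiplied by 2π, and it assumes nodes are kept when their angle is at least the threshold. The node's exact rule is not documented in this article, so treat arc_angles, filter_small_nodes and the placeholder values as hypothetical.

```python
import math


def arc_angles(values):
    """Map each node to the angle (in radians) it would span on a full circle."""
    total = sum(values.values())
    return {name: 2 * math.pi * value / total for name, value in values.items()}


def filter_small_nodes(values, threshold):
    """Keep only the nodes whose angle is at least `threshold` radians (assumed rule)."""
    angles = arc_angles(values)
    return {name: value for name, value in values.items() if angles[name] >= threshold}


# Placeholder first-level totals, not taken from the dataset.
values = {"Country A": 900, "Country B": 90, "Country C": 10}

print(arc_angles(values))
print(filter_small_nodes(values, 0.1))
print(filter_small_nodes(values, 1.0))
```

Under these assumptions, raising the threshold from 0.1 to 1.0 removes progressively larger segments, which matches the behavior described above.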

Under the General Plot Options tab, we have typical configuration options including Title and Subtitle (Figure 7).

Figure 7: Sunburst Chart General Plot Options.

Items of note here include the Display legend, Display tooltip, Enable donut hole, and Enable inner label choices.

The inner label information is what displays once you’ve reached the end of a path. In this case, we want to see the waterfall height, so selecting sum for the Inner label style will work here. Additionally, we can add some clarifying text to the inner label (Figure 8).

Figure 8: Example Of The Inner Label.
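
Why does sum work for showing a single waterfall's height? A minimal sketch, assuming the inner label aggregates the value column over every row below the selected segment: at the outermost level only one row matches, so the sum is simply that waterfall's height. The function inner_label_sum and the rows are hypothetical placeholders.

```python
def inner_label_sum(rows, selected_path):
    """Sum the values of all rows whose path starts with `selected_path`."""
    depth = len(selected_path)
    prefix = tuple(selected_path)
    return sum(value for path, value in rows if path[:depth] == prefix)


# Placeholder (path, height) rows, not taken from the dataset.
rows = [
    (("Country A", "Region A1", "Falls X"), 900),
    (("Country A", "Region A2", "Falls Y"), 750),
    (("Country B", "Region B1", "Falls Z"), 600),
]

# At the country level, the heights of all waterfalls below it are added up.
print(inner_label_sum(rows, ["Country A"]))

# At the waterfall level, only one row matches, so the sum is its own height.
print(inner_label_sum(rows, ["Country A", "Region A1", "Falls X"]))
```

The country-level total is not a meaningful height, which is why the label is most useful once we have clicked all the way through to a waterfall.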

If we choose to output an image, we can set the image size here.

The Control Options tab includes configuration of the interactive elements that appear when we are in the interactive or composite view of the node.

With all of these checked, we enable changing many of the sunburst plot configuration options from within the interactive view itself (Figure 9).

Figure 9: Sunburst Chart Node Control Options Tab.

When we execute the node and view the output, we can explore an interactive sunburst plot. The coloring has been automatically set by the node.

As we move from the inside out, the tooltip helps to guide us. When we arrive at the waterfall of interest, we see the path listed at the top of the visual and the height in meters displayed in the center of the sunburst (Figure 10).

Figure 10: A Sunburst Chart.

A Splash Of CSS

A lot of CSS classes can be configured with the sunburst chart and some basic examples are included in the workflow. See code block below.

/* example style rule */
/* Controls Title */
.knime-title {
  font-size: 35px;
  font-weight: bold;
  color: green;
  fill: green;
}
/* Controls Subtitle */
.knime-subtitle {
  font-size: 15px;
  font-weight: bold;
  fill: purple;
}
/* Controls Inner Label */
.knime-label {
  font-weight: bold;
  font-size: 20px; /* a unit is required; a bare 20 is ignored */
  fill: purple;
}
/* Controls Legend Text */
.knime-legend-label, text.knime-legend-label {
  color: orange;
  fill: orange;
  font-size: 14px;
  font-weight: bold;
}
/* Controls Tooltip Text */
.knime-tooltip-caption, .knime-tooltip-key, .knime-tooltip-value {
  color: purple;
  fill: purple;
  font-size: 14px;
}

As always, check this website for some guidance around configuring CSS classes in the KNIME JavaScript view nodes.

Notes On The Thin Red Line

KNIME’s JavaScript-enabled visualizations are not hurting when it comes to variable control options. As mentioned earlier, it’s possible to control title and subtitles with flow variables. Most configuration elements are controllable and some key flow variables are shown below (Figure 11).

Figure 11: Sunburst Node Flow Variables Of Note.

What About Color?

A discussion of applying color is unfortunately outside the scope of this article, but please see this workflow for a great example of applying custom colors in the sunburst chart.

Step 2: Parallel Coordinates Plot

As mentioned earlier, the parallel coordinates plot allows us to easily visualize many metrics all at once and quickly identify patterns or trends in an interactive, analyst-centric way.

In keeping with the waterfall motif, we are looking at Wikipedia data on the largest existing waterfalls by flow rate (Figure 12).

Figure 12: Workflow Step 2.

After a couple of data cleanup steps, we assign each waterfall its own color so that it is easier to distinguish between them on the output plot (Figure 13).

Figure 13: Color Manager Node Configuration.
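
The idea of giving each nominal value its own color can be sketched in a few lines of Python. This is not the Color Manager's palette or algorithm; it simply spaces hues evenly around the color wheel, and assign_colors and the waterfall names are hypothetical placeholders.

```python
import colorsys


def assign_colors(names):
    """Give every distinct name its own hex color by spacing hues evenly."""
    unique = sorted(set(names))
    colors = {}
    for index, name in enumerate(unique):
        # Evenly spaced hue, fixed saturation and brightness (arbitrary choices).
        red, green, blue = colorsys.hsv_to_rgb(index / len(unique), 0.8, 0.9)
        colors[name] = "#{:02x}{:02x}{:02x}".format(
            round(red * 255), round(green * 255), round(blue * 255)
        )
    return colors


# Placeholder waterfall names; the duplicate shows that a name keeps one color.
palette = assign_colors(["Falls X", "Falls Y", "Falls Z", "Falls X"])
for name, color in palette.items():
    print(name, color)
```

With many waterfalls, evenly spaced hues become hard to tell apart, which is one reason interactivity and a legend matter in the final plot.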

Next, let’s explore the configuration of the Parallel Coordinates Plot node.

Again, many of these options should look familiar (Figure 14).

Figure 14: Parallel Coordinates Plot Node Options Tab.

In the Options tab, we decide if we want to generate an image and limit the maximum number of rows.

Next, we choose our data from the twinlist. Keep in mind that columns may need to be reordered beforehand to ensure the desired output in the plot. I chose to include Countries because I want to see not only how the metrics look for the waterfalls but also which country each waterfall belongs to.

There are many approaches to analyzing data in a parallel coordinates plot. Another example is class analysis on the classic iris dataset, shown here.

If we have already colored values in our dataset, we can select Use colors from spec here. Otherwise, we can choose our nominal value column from the Color Column dropdown list and it will color the data for us.

In the General Plot Options tab, we see title and subtitle as our first configurable items (Figure 15).

Figure 15: Parallel Coordinates Plot General Plot Options Tab.

As with the sunburst chart, we can set an image size in pixels (if outputting to a file) and indicate if the interactive plot will scale to our open window size.

Based on the needs of our project or personal preference, we could change the color of the background (outside of the chart) and also that of the data area itself. A word of caution: if nominal values are already colored, adding more color may be too much and detract from the usability of the chart (Figure 16).

Figure 16: Background Color Options.

Next, we have a choice around how we want to handle missing values in the dataset. Ideally, this should be resolved prior to visualization, but having the option here is nice. Let’s select Skip missing values.

Finally, we can set the line thickness, choose between straight and curved lines, and display a legend.

The Control Options and Selection and Filter tabs allow us to make decisions around how the chart behaves in the interactive and composite view.

This allows for in-view editing of chart elements, axes swapping, and more.

Click OK, execute the node (F7), and view the output (Figure 17).

Figure 17: Parallel Coordinates Plot.

We can easily see a comparison of all waterfalls in our dataset by flow rate, drop and width and what country they can be found in.

Immediately we see that most of these waterfalls fall within the same flow rate, drop and width ranges with only a few significant outliers. We can also see that the bulk of these are listed in Canada and the United States.
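
Outliers stand out in this kind of plot because each axis has its own scale. The sketch below illustrates the general technique with per-axis min-max scaling; it is not a description of the KNIME node's internals, it ignores nominal axes such as Country, and scale_axes, the column names and the values are hypothetical placeholders.

```python
def scale_axes(rows, columns):
    """Scale every column independently to the range 0..1 (min-max scaling)."""
    bounds = {
        col: (min(row[col] for row in rows), max(row[col] for row in rows))
        for col in columns
    }
    scaled = []
    for row in rows:
        line = []
        for col in columns:
            low, high = bounds[col]
            # A constant column has no spread, so place it mid-axis.
            line.append(0.5 if high == low else (row[col] - low) / (high - low))
        scaled.append(line)
    return scaled


# Placeholder metrics, not taken from the dataset.
columns = ["Flow", "Drop", "Width"]
rows = [
    {"Flow": 200, "Drop": 10, "Width": 100},
    {"Flow": 300, "Drop": 20, "Width": 150},
    {"Flow": 2000, "Drop": 60, "Width": 1000},
]

# Each inner list holds the vertical positions of one polyline, axis by axis.
for line in scale_axes(rows, columns):
    print([round(value, 2) for value in line])
```

In this toy example the third row sits at the top of every axis while the other two stay near the bottom, which is the visual signature of an outlier in a parallel coordinates plot.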

I mentioned previously that the parallel coordinates plot is a great tool for analysts, and a way to elevate this plot is to wrap it in a component with a Table View node.

The output here allows us to quickly analyze attributes and explore our data in a dashboard-like experience (Figure 18).

Figure 18: Interactive View With Table And Parallel Coordinates Plot.

CSS Reflow

Included in the workflow is some CSS to change the titles, x axis text and data columns text.

/* example style rule */
/* Changes Title */
.knime-title {
  font-size: 35px;
  font-weight: bold;
  color: green;
  fill: green;
}
/* Changes Subtitle */
.knime-subtitle {
  font-size: 15px;
  font-weight: bold;
  fill: purple;
}
/* Changes Top X Axis Text */
.knime-label, .knime-axis-label, .knime-tick-label, .knime-label text, text.knime-axis-label, text.knime-tick-label {
  color: orange;
  fill: orange;
  font-size: 12px;
}
/* Changes Data Column Text */
/* For elements matched by both rules, this later rule wins. */
.knime-label, .knime-tick-label, .knime-label text, text.knime-tick-label {
  color: purple;
  fill: purple;
  font-size: 12px;
  font-weight: bold;
}

The Thin Red Line: Variable Notes

Finally, here are some key variables of note (and their outputs) that might be beneficial to control in the Parallel Coordinates Plot node (Figure 19).

Figure 19: Parallel Coordinates Plot Flow Variables Of Note.

Conclusion

We covered a lot in this blog post, and in some respects it’s just a small piece of what is possible.

We explored how we can generate deep, rich data analysis and data-driven products with the Sunburst Chart and Parallel Coordinates Plot nodes. We covered data considerations around missing values and discussed the importance of sound color use in our data visualizations. CSS class editing and flow variables were also touched on, and the Table View node was paired with the Parallel Coordinates Plot node to enable a better analysis experience.

Whether on their own or as part of a larger dashboard or report, these are key data visualizations that can help stakeholders and analysts alike.

John Denham
Low Code for Data Science

I am a Data Scientist who is passionate about empowering people to make the most of their data. I run the website KNIME.tips.