GETTING STARTED | COLORING TABLES | KNIME ANALYTICS PLATFORM

Let There Be Color!

Using Colors In Ways That Count With IMDb Sentiment Data and KNIME Analytics Platform

John Denham
Low Code for Data Science

Photo by Sharon McCutcheon on Unsplash.

Introduction: Let The Color Begin

We have IMDb datasets on movie review sentiment and movie length, and we need a quick and easy way to help our audience intuit the data we present. A touch of color in the rows might work. How do we do it?

KNIME Analytics Platform has a myriad of ways to visually present our data. Whether through the many JavaScript-powered visualization nodes or through libraries leveraged inside Python and R nodes, we are not hurting for great options.

With the Color Manager node we can assign colors to nominal and integer data. These colors then reside in the RowID column of our data table and, when paired with a composite view, give our audience an effective way to connect the dots and intuit the data.

Intent To Color

In this post, we will describe and apply the Color Manager node to:

  • Auto apply colors to binned data in the IMDb Title Basics dataset.
  • Apply custom colors to nominal data in the IMDb Reviews Sentiment dataset.
  • Display tables with colored rows in a composite view.
  • Provide some tips on using color.

The first dataset is the title.basics dataset from IMDb. I cleaned up and filtered this dataset to slim it down and make it easier to use in the example workflow. The second dataset we are using is the Large Movie Review dataset from ai.stanford.edu. I blended the raw positive and negative reviews and applied a neutral label to a random 15% of the data to add another nominal element for illustration purposes. The datasets come with the reference workflow for this blog post, named Colored_Rows, and can be downloaded for free from the KNIME Hub.

The Color Manager Node

Figure 1: The Color Manager node.

The node we will be using today is the Color Manager (Figure 1). This is a KNIME Base node and should be easy to find after a quick search of the Node Repository.

Specifically, the Color Manager node is organized under the Views category (Figure 2).

Figure 2: Organized Under The Views Category.

With the Color Manager node:

“Colors can be assigned for either nominal (possible values have to be available) or numeric columns (with lower and upper bounds) …. The values are then computed during execution. If a column attribute is selected, the color can be changed with the color chooser.”

Step 1: Automatic Coloring

Figure 3: The Workflow in Step 1.

For our first example, we’re using the IMDb Title Basics dataset, which I’ve pre-processed to make it easier to use. I filtered the dataset to movies with a startYear between 2018 and 2021 and removed rows that were missing runtime or genre information. Below is a snapshot of the dataset (Figure 4).

Figure 4: IMDb Dataset Snapshot.

We’re interested in binning the minutes in the runtimeMinutes column to see, per year, which bins (or ranges) had the most movies in them. To keep things simple for this example, we just want 4 bins.

Using the Auto-Binner node, we put the runtimeMinutes column into the Include list (the green side) and set the binning method to a fixed number of bins (4) based on equal frequency. For bin naming, let’s select Borders so the labels show the range of included minutes rather than Bin 1, Bin 2, etc. (Figure 5).

Figure 5: Auto-Binner Configuration.
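For readers who think in code, the equal-frequency idea can be sketched in plain Python. This is only an illustration and not how the Auto-Binner node is implemented: the runtimes are made-up values, `equal_frequency_bins` is a hypothetical helper, and the quantile method used by `statistics.quantiles` may produce different borders than the ones KNIME reports.

```python
from bisect import bisect_left
from collections import Counter
from statistics import quantiles


def equal_frequency_bins(values, n_bins=4):
    """Return a 0-based bin index for each value using quantile cut points."""
    # n_bins - 1 cut points that split the data into roughly equal-sized groups.
    cuts = quantiles(values, n=n_bins, method="inclusive")
    # bisect_left places a value equal to a cut point in the lower bin,
    # which mirrors right-closed borders such as (89,101].
    return [bisect_left(cuts, v) for v in values]


# Made-up runtimes in minutes, for illustration only.
runtimes = [60, 75, 80, 89, 90, 95, 101, 120]
bins = equal_frequency_bins(runtimes)
bin_sizes = Counter(bins)
print(bin_sizes)
```

With equal frequency, each bin ends up holding about the same number of rows, so the bin widths in minutes can differ quite a bit from one bin to the next.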

After executing the Auto-Binner node, our output includes the respective bins for each runtimeMinutes value in the dataset.

Note. Parentheses and brackets matter in binning notation. A “(” or “)” means that the border value is not included, and a “[” or “]” means that it is included, as seen in Figure 6 below.

Figure 6: Auto-Binner Output With Bins.

You would read the following as:

[3,75] =

“3 is included and 75 is included in this range.”

Or

“This range is from 3 to 75, both included”

(89, 101] =

“89 is not included and 101 is included in this range.”

Or

“This range is from 89 to 101, where 89 is excluded and 101 is included.” For whole minutes, that means 90 through 101.
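The same reading rules can be written as a small Python check. This is a sketch under assumptions: `in_interval` is a hypothetical helper, and the label format is taken from the examples above rather than from any KNIME specification.

```python
import re


def in_interval(label, value):
    """Check whether value falls inside a bin label such as '(89,101]'."""
    pattern = r"\s*([\[(])\s*(-?[\d.]+)\s*,\s*(-?[\d.]+)\s*([\])])\s*"
    match = re.fullmatch(pattern, label)
    if match is None:
        raise ValueError(f"not an interval label: {label!r}")
    left, low, high, right = match.groups()
    low, high = float(low), float(high)
    # "[" and "]" include the border; "(" and ")" exclude it.
    above_low = value >= low if left == "[" else value > low
    below_high = value <= high if right == "]" else value < high
    return above_low and below_high


print(in_interval("[3,75]", 3))       # True: "[" includes the lower border
print(in_interval("(89, 101]", 89))   # False: "(" excludes the lower border
print(in_interval("(89, 101]", 101))  # True: "]" includes the upper border
```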

Knowing how to read these borders will help when we examine the output ranges from the Auto-Binner node. Next, we apply a GroupBy node to the dataset, grouping by startYear and runtimeMinutes [Binned] and aggregating by a count of genres. This outputs the summary table we were looking for. Our data is grouped by year and bin, and we have a count of the movies in each bin (Figure 7).

Figure 7: Grouped Output With Count Per Bin.
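Conceptually, this GroupBy step counts rows per (year, bin) pair. A minimal Python sketch of that idea follows; the rows are made up for illustration and are not taken from the IMDb data.

```python
from collections import Counter

# Made-up (startYear, bin label) pairs, for illustration only.
rows = [
    (2018, "[3,75]"),
    (2018, "(89,101]"),
    (2018, "(89,101]"),
    (2019, "[3,75]"),
    (2019, "[3,75]"),
    (2019, "(89,101]"),
]

# Count rows per (year, bin) group, like a GroupBy with a count aggregation.
counts = Counter(rows)

for (year, bin_label), n_movies in sorted(counts.items()):
    print(year, bin_label, n_movies)
```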

To make this stand out in our composite and table view let’s use the Color Manager node to set colors for each bin.

In the configuration dialog of the Color Manager node we simply select the column with values that we want to color. In our case, the bin names represent nominal data so they appear on the nominal (left) side of the configuration (Figure 8).

There are 3 pre-set color palettes available with 12 colors in Sets 1 and 2 and 7 in Set 3, the colorblind safe set. If your nominal values exceed the length of the set, colors will be duplicated.

For this example, let’s use Set 3. We’ll see that our nominal values (our bins) are automatically assigned colors from the Set 3 color palette (Figure 8).

Figure 8: Color Manager Configuration Window.
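To picture what happens when there are more nominal values than palette colors, here is a small Python sketch. It assumes the palette simply wraps around, which is my reading of "colors will be duplicated" and not a documented rule; the palette, the labels, and the `assign_palette` helper are all hypothetical.

```python
from itertools import cycle


def assign_palette(labels, palette):
    """Map each unique label to a palette color, reusing colors if needed."""
    unique_labels = sorted(set(labels))
    # cycle() restarts the palette once it runs out (assumed wrap-around).
    return dict(zip(unique_labels, cycle(palette)))


# Hypothetical three-color palette and four labels.
palette = ["#E69F00", "#56B4E9", "#009E73"]
labels = ["bin_a", "bin_b", "bin_c", "bin_d"]
colors = assign_palette(labels, palette)
print(colors)
```

Here the fourth label receives the same color as the first, which is exactly the kind of duplication that makes a colored table harder to read.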

Note. There are limits here with nominal values, though. In testing, the Color Manager node stopped working when I had over 61 unique labels. If we want to color by nominal values and our unique labels exceed the maximum, the node will fail and show an error message.

Click OK and execute the node (F7).

Right-click the node and select “Table With Colors” (Figure 9).

Figure 9: Select Table With Colors.

The output table rows are colored based on the bin label. This makes it easy to see bin assignments at a glance (Figure 10).

Figure 10: Example Of Table With Colored Rows.

Next, with a Sorter node, we can sort by the count(genres) column in descending order and output this to a Table View node.

If we were working with numerical data and wanted to generate a heat map style range of color with the Color Manager node we would simply select the desired numerical column in the configuration window.

In the example below, we selected the runtimeMinutes column and the node automatically set the color from red at the minimum value of 3 to blue at the maximum value of 43200 (Figure 11).

Figure 11: Auto Set Range Min and Max Colors.

The output table includes colors from the range (Figure 12).

Figure 12: Colored Range With Impact Of Outliers.

As you might notice, the outliers here are impacting the color granularity of our range. If we were using the Color Manager node on a range of values for production purposes (i.e. to an outside audience), it would be prudent to evaluate the impact of these outliers on the overall effectiveness of the color.
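A quick Python sketch shows why the outliers matter. It assumes a simple linear interpolation in RGB between red and blue, which may not match how the Color Manager node computes its gradient; `gradient_color` is a hypothetical helper, and the minimum and maximum come from the example above.

```python
def gradient_color(value, low, high, start=(255, 0, 0), end=(0, 0, 255)):
    """Linearly interpolate an RGB color for value between low and high."""
    fraction = (value - low) / (high - low)
    return tuple(round(s + (e - s) * fraction) for s, e in zip(start, end))


low, high = 3, 43200  # minimum and maximum runtimeMinutes from the example

print(gradient_color(low, low, high))   # (255, 0, 0), pure red
print(gradient_color(high, low, high))  # (0, 0, 255), pure blue
print(gradient_color(120, low, high))   # a two-hour movie is still almost pure red
```

Because a single 43200-minute title stretches the range, nearly every ordinary movie lands within a fraction of a percent of the minimum and ends up looking the same shade of red.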

Step 2: Custom Coloring

Figure 13: The Workflow in Step 2.

In our IMDb Reviews dataset we have a column Review that contains the full text of the review and a column Sentiment that contains the sentiment label for each review. As mentioned previously, I selected a random 15% of the reviews and labeled them as Neutral. In your own sentiment analysis projects you may have the Neutral label, so that’s why I added it here.

In the previous step, we let the Color Manager node auto color our nominals and range. Since we are now dealing with sentiment data, we should apply our own color scheme to create consistency and connectivity across our sentiment analysis products.

As before, we simply read in the IMDb Reviews dataset. We are interested in the count of reviews by sentiment and percent of the total that each sentiment represents.

To get this summary, we use a GroupBy node. We group the Sentiment column and manually aggregate the Review column as a Count and a Percent (Figure 14).

Figure 14: Grouping Sentiment Data.
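The count and percent aggregation boils down to the following Python sketch. The sentiment labels here are made up for illustration and do not reflect the actual distribution in the IMDb Reviews dataset.

```python
from collections import Counter

# Made-up sentiment labels, for illustration only.
sentiments = [
    "Positive", "Negative", "Positive", "Neutral",
    "Negative", "Positive", "Positive", "Negative",
]

counts = Counter(sentiments)
total = sum(counts.values())

# For each label, keep the count and its share of all reviews in percent.
summary = {label: (n, 100 * n / total) for label, n in counts.items()}

for label, (n, percent) in sorted(summary.items()):
    print(f"{label}: count={n}, percent={percent:.1f}")
```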

Now we’re ready to set some custom sentiment colors in the Color Manager node. These are HEX colors I regularly use in my sentiment analysis projects and the colors we will use today (Figure 15).

Figure 15: Sentiment Colors.

When we open the Color Manager node we see the nominal sentiment values on the left side of the configuration. Rather than use the automatic colors from a pre-loaded palette, let’s configure the node to use our chosen sentiment colors from above.

To do so, we click the nominal value we would like to color (in this case Positive) and, under the RGB tab, enter our Hex code in the Color Code text box (Figure 16).

Figure 16: Setting A Custom Nominal Color.
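If you want to double-check which red, green and blue values a Hex code stands for before typing it in, a few lines of Python will do. `hex_to_rgb` is a hypothetical helper, and the example value is the Positive color that appears later in this article.

```python
def hex_to_rgb(hex_code):
    """Convert a hex color such as '#509E2F' to an (R, G, B) tuple."""
    digits = hex_code.lstrip("#")
    if len(digits) != 6:
        raise ValueError(f"expected 6 hex digits, got {hex_code!r}")
    # Each pair of hex digits is one channel in the range 0 to 255.
    return tuple(int(digits[i:i + 2], 16) for i in (0, 2, 4))


print(hex_to_rgb("#509E2F"))  # (80, 158, 47)
```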

Depending on your preferred color space there are options beyond the Palettes and Swatches. You can read more about color spaces here and here. Additionally, if we want to control the Alpha (transparency) channel of all colors at the same time, the Alpha tab allows us to do that.

After we color our sentiment labels, we click OK and execute the node (F7). The output now includes our custom sentiment colors (Figure 17).

Figure 17: Sentiment Data With Custom Row Colors.

From here we can leave the table as it is and move it into a Table View node to output as part of a composite view.

There is a final option I want to highlight: using the RowID node, configured to create a new RowID column based on the Sentiment column (Figure 18).

Figure 18: RowID Node Configuration.

By doing this, we get a table that includes our sentiment labels with their colors in the RowID column (Figure 19).

Figure 19: Sentiment Label and Color Together In RowID Column.

Other Color Considerations

Figure 20: The Color Manager Node.

Aside from its top data output port, the Color Manager node includes a Color Settings port (the bottom blue port). This port can send color data to the Color Appender and Extract Color nodes (Figure 21).

Figure 21: Nodes With Color Data Ports.

Some data visualization nodes (such as the Bar Chart node) will accept the color data from the Color Manager node through its top data output port. A great breakdown of this can be found in the article How to Assign Colors to Bars in a Bar Chart — Three Shades of Green.

The Composite View

To wrap up, once we have rows in our tables colored to our liking we can connect to Table View nodes and the Pie/Donut Chart node.

Note. Ensure that Use Row Colors is checked under General Plot Options in the Pie/Donut Chart node configuration.

If we wrap these into a component, we get an interactive composite view upon execution.

In the composite view the Table View and the Pie/Donut Chart visuals will impact each other. If we select the checkbox next to Positive in the top table view, the Positive segment of the Donut Chart below it is highlighted. This works the other way as well. When we select the Negative segment of the donut chart, the corresponding row in the table above it is checked (Figure 22).

Figure 22: Composite View Interactions.

See the companion workflow to explore further how I configured these specific nodes.

As we add additional elements, such as tables, text and more to these composite views, we essentially build a custom dashboard that generates within our workflow and allows us to explore and understand our data even better.

Notes On The Thin Red Line: Understanding The Variables Of Color Manager

As with most KNIME nodes, we can control configuration settings through the variables behind them. The Color Manager has controllable variables, and their values from Step 2 are shown below (Figure 23).

Figure 23: Variable Output From Color Manager Node.

While you might find some value in setting custom variables, keep in mind that the colors themselves are stored as base-10 (decimal) integers and then converted to your selected color.

For example, the Positive color coming out of the node as -11493841 converts to the Hex value of FF509E2F. You could theoretically control the variables for the columns you selected, but it still requires some up-front configuration and converting your color values to decimal base 10 so they process correctly through the node.

If that is a route you want to explore, this website may help.
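The conversion can also be done in a couple of lines of Python. This sketch assumes the variable holds a signed 32-bit integer in alpha, red, green, blue order, which is consistent with the Positive example above (the leading FF would then be the alpha channel); both function names are hypothetical.

```python
def signed_int_to_argb_hex(value):
    """Convert a signed 32-bit integer color to an 8-digit ARGB hex string."""
    # Masking with 0xFFFFFFFF reinterprets the negative number as unsigned.
    return f"{value & 0xFFFFFFFF:08X}"


def argb_hex_to_signed_int(hex_code):
    """Convert an 8-digit ARGB hex string back to a signed 32-bit integer."""
    unsigned = int(hex_code, 16)
    # Values with the top bit set are negative in signed 32-bit form.
    return unsigned - 0x100000000 if unsigned >= 0x80000000 else unsigned


print(signed_int_to_argb_hex(-11493841))   # FF509E2F
print(argb_hex_to_signed_int("FF509E2F"))  # -11493841
```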

Conclusions: Final Notes

In this article we explored multiple ways to color table rows in KNIME. We looked at automatic coloring of nominal and integer data and custom coloring of nominal data. We explored some ways to summarize the IMDb Title Basics data, including auto-binning by movie length in minutes. We also looked at some basic ways to summarize the IMDb Reviews data (by count and percent of total).

We explored the idea of the composite view to generate a clean, interactive dashboard-like experience from the workflow and saw some ways to shift RowID data to present better in this view.

The Color Manager is an incredible node that, if used wisely, can elevate data analysis and reporting. Remember, the use of color in design (dashboards, websites, apps, etc.) should not be arbitrary and, where possible, should focus on maximizing accessibility. Learn more here.

John Denham
Low Code for Data Science

I am a Data Scientist who is passionate about empowering people to make the most of their data. I run the website KNIME.tips.