GETTING STARTED | DATA WRANGLING | KNIME ANALYTICS PLATFORM

KNIME Snippets (2): Unearthing Hidden Node Gems — Managing Missing Values, Row Numbers and some Quick Java and Paths

“KNIME Snippets” is a series of short articles dwelling on some specific KNIME subjects. Follow me on Medium to get more.

Markus Lauber
Low Code for Data Science


In working with KNIME, I’ve found several useful features within its nodes that might not be immediately obvious. These can make your work much easier, especially when you start to combine them. In this article, I’ll introduce you to some of these helpful features that can also enhance automation in your tasks.

“A yellow robot on a sunny lake with a futuristic building in the background” — variations with KNIME Logo (Markus × DALL·E)

Missing values with individual fillings

The Missing Value node in KNIME, as the name suggests, handles missing values. It can do more than is apparent at first, though. Under “Column Settings” you can choose a separate treatment for each column, and you will find a host of additional options there, such as filling the missing cells of a column with the value from the row above. This is not only useful for managing missing values but can also be used to organize and populate blocks of data, as we’ll explore later.

Use the Missing Value node to fill in missing data rows (https://forum.knime.com/t/replace-placeholder-with-values-from-above/38602/4?u=mlauber71).
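
For readers who think in code, the "fill with the value above" idea can be sketched in a few lines of Python. This is only an illustration of the logic, not how the node is implemented; the function name fill_with_previous and the sample column are made up for this example.

```python
def fill_with_previous(values):
    """Replace each missing entry (None) with the closest real value above it."""
    filled = []
    last_seen = None  # nothing to copy until the first real value appears
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical column where a label is only written once per block
column = ["Region A", None, None, "Region B", None]
print(fill_with_previous(column))
```

Note that missing entries before the first real value stay missing in this sketch, since there is nothing above them to copy.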

The mighty “return” in Java

The ‘return’ statement in Java can be very useful when working with KNIME. Although there are various ways to create new columns and variables, I have a particular fondness for the quick Java nodes, Java Edit Variable (simple) and Java Snippet (simple): whatever your snippet returns becomes the new variable or column. I’ve found these nodes handy for swiftly creating paths and other variables, especially before using the Create File/Folder Variables node.

return "knime://knime.workflow/data/";
Use the simple Java snippets to quickly create a variable.
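
To show how such a base path is typically combined with folder and file names, here is a small sketch in Python (the language of the longer example further down). The helper workflow_relative_url and the file name result.csv are hypothetical; only the knime://knime.workflow/ prefix comes from the snippet above.

```python
def workflow_relative_url(*parts):
    """Join folder and file names below the workflow-relative base URL."""
    base = "knime://knime.workflow/"
    # URLs always use forward slashes, regardless of the operating system
    return base + "/".join(part.strip("/") for part in parts)


print(workflow_relative_url("data", "result.csv"))
```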

And of course there is more to KNIME and the world of Java.

RowID and Counter

Every row in a KNIME table has a RowID. Typically KNIME handles the IDs and you do not have to deal with them. But sometimes you want to take control and make sure the RowIDs are suitable.

  • You can, for example, use an existing column as the RowID, or store the existing RowID in a column.
  • You can use the Counter Generation node to create new IDs. The counter also gives you another way to address rows by position and to see their order.

Create a new RowID based on a new Counter column.

Another way to get a numeric row number is to use ROWINDEX in the Rule Engine node. This produces a numeric column of type Long, which can store larger numbers.

In the Rule Engine node the ROWINDEX variable will also give you a list of continuous numbers (https://forum.knime.com/t/filtering-for-a-subset-of-data-from-a-single-table/40107/9?u=mlauber71).
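
The difference between the RowID, a counter column and a row index can be pictured with a short Python sketch. It is an analogy under assumptions, not KNIME code: the column names are invented, the counter is assumed to start at 1, and the row index is assumed to be zero-based.

```python
def add_position_columns(values, counter_start=1):
    """Attach a RowID-like key, a zero-based index and a counter to each value."""
    table = []
    for index, value in enumerate(values):
        table.append({
            "row_id": f"Row{index}",           # KNIME's default RowIDs look like Row0, Row1, ...
            "row_index": index,                # position in the table (assumed zero-based)
            "counter": counter_start + index,  # running number with a chosen start value
            "value": value,
        })
    return table


for row in add_position_columns(["apple", "banana", "cherry"]):
    print(row)
```

Once the position is stored in a regular column, it can be sorted, filtered or turned into a new RowID like any other value.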

One use of these nodes is to identify blocks in imported data (for example from Excel) and then handle them separately.

Here we combine the filling of missing values with a Counter and a Rule Engine to mark the parts of the data.

Mark blocks in an imported Excel file to deal with them separately (https://forum.knime.com/t/move-head-rows-to-a-column/45094/2?u=mlauber71).
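
The combination described above can be sketched in Python as follows. This is a simplified illustration, not a reproduction of the linked workflow: the sample sheet, the "Region:" header rule and the function name mark_blocks are assumptions made for the example.

```python
def mark_blocks(lines, is_header):
    """Label every line with the counter value of the header that opens its block."""
    marked = []
    current_block = None
    for position, line in enumerate(lines, start=1):  # counter column
        # Rule step: only header rows receive their counter value
        marker = position if is_header(line) else None
        # Missing-value step: rows without a marker inherit the one from above
        if marker is not None:
            current_block = marker
        marked.append((current_block, line))
    return marked


# Hypothetical import where each block starts with a "Region:" line
sheet = ["Region: North", "apples;10", "pears;4", "Region: South", "apples;7"]
for block_id, line in mark_blocks(sheet, lambda text: text.startswith("Region:")):
    print(block_id, line)
```

With a block marker on every row, each block can then be filtered, grouped or looped over on its own.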

Always know where you are

I have already praised the “KNIME File Handling Guide”, which is lengthy but tells you all you need to know about how KNIME handles files and paths. A useful node for finding out where your workflow lives is “Extract Context Properties”. I have constructed a component that always creates an absolute path variable and creates the /data/ folder beneath the current workflow. It sets the path separator (/ or \) automatically according to your operating system (path.separator.system):

This component gives you the /data/ folder and other useful absolute path variables (https://hub.knime.com/-/spaces/-/latest/~mtGtXTjuECtf5YnI/).
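
What the component does can be approximated in plain Python, which may help if you later hand such paths to a Python node. This is a sketch under assumptions, not the component's actual logic: data_folder_for is a made-up name, and a temporary directory stands in for the workflow folder.

```python
import os
import tempfile


def data_folder_for(workflow_dir):
    """Create <workflow_dir>/data if needed and return it as an absolute path."""
    data_dir = os.path.join(os.path.abspath(workflow_dir), "data")
    os.makedirs(data_dir, exist_ok=True)
    # os.sep is "/" on macOS and Linux and "\" on Windows
    return data_dir + os.sep


# A temporary directory stands in for the real workflow folder here
with tempfile.TemporaryDirectory() as workflow_dir:
    print(data_folder_for(workflow_dir))
```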

Your own Java always ready

The “Extract System Properties” node gives you an even wider array of context information, such as user names and details about your installation and operating system, which you can later also use as Flow Variables. Since KNIME brings its own Java version, you can also use that Java with other software, such as H2O.ai under Python, by setting the JAVA_HOME environment variable to the value of the “java.home” property in KNIME:

Find and use KNIME’s own Java version and use it in other programs (https://hub.knime.com/-/spaces/-/latest/~SGv1Cosah8BXabfa/).

The use for H2O.ai in Python would then look like this:

# provide a software like h2o with a java path using KNIME's own java version
# variable java.home from KNIME
# https://hub.knime.com/-/spaces/-/latest/~SGv1Cosah8BXabfa/

import os

# the paths may vary depending on your version and operating system
# os.environ["JAVA_HOME"] = "C:\\Users\\x123456789\\software\\knime_4.6.1\\plugins\\org.knime.binary.jre.win32.x86_64_17.0.3.20220621\\jre"
os.environ["JAVA_HOME"] = "/Applications/KNIME 4.6.0.app/Contents/Eclipse/plugins/org.knime.binary.jre.macosx.x86_64_17.0.5.20221116/jre/Contents/Home"

print("setenv JAVA_HOME", os.environ["JAVA_HOME"])
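
Because these paths change with every KNIME version, it can be worth checking that JAVA_HOME really points to a Java installation before starting a tool that depends on it. The following is a minimal sketch; the helper find_java_executable is hypothetical, and it assumes the usual layout with the executable in a bin folder below JAVA_HOME.

```python
import os


def find_java_executable(java_home):
    """Return the path of the java binary below java_home, or None if it is missing."""
    for name in ("java", "java.exe"):  # java.exe covers Windows installations
        candidate = os.path.join(java_home, "bin", name)
        if os.path.isfile(candidate):
            return candidate
    return None


java_home = os.environ.get("JAVA_HOME", "")
java_bin = find_java_executable(java_home) if java_home else None
if java_bin is None:
    print("JAVA_HOME does not point to a Java installation:", repr(java_home))
else:
    print("Java found at", java_bin)
```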

For more on KNIME and Python, see “KNIME and Python — Setting up and managing Conda environments”, and then use Python for machine learning: “KNIME, XGBoost and Optuna for Hyper Parameter Optimization” and “Hyperparameter optimization for LightGBM — wrapped in KNIME nodes”.

In case you enjoyed this story you can follow me on Medium (https://medium.com/@mlxl) or on the KNIME Hub (https://hub.knime.com/mlauber71) or KNIME Forum (https://forum.knime.com/u/mlauber71/summary).


Markus Lauber
Low Code for Data Science

Senior Data Scientist working with KNIME, Python, R and Big Data Systems in the telco industry