GETTING STARTED | DATA WRANGLING | KNIME ANALYTICS PLATFORM
KNIME Snippets (2): Unearthing Hidden Node Gems — Managing Missing Values, Row Numbers and some Quick Java and Paths
“KNIME Snippets” is a series of short articles that each focus on a specific KNIME topic. Follow me on Medium to get more.
In working with KNIME, I’ve found several useful features within its nodes that might not be immediately obvious. These can make your work much easier, especially when you start to combine them. In this article, I’ll introduce you to some of these helpful features that can also enhance automation in your tasks.
Missing values with individual fillings
The Missing Value node in KNIME, as the name suggests, handles missing values. However, it also offers functionality that might not be apparent at first. One such feature sits under “Column Settings”, where you can configure the handling for each column individually. Here you’ll find a host of additional options, such as filling a missing cell in a column with the previous non-missing value above it. This feature is not only useful for managing missing values but can also be used to organize and populate blocks of data, as we’ll explore later.
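To make the "fill with the value above" idea concrete, here is a minimal sketch in plain Python. It is not how the Missing Value node is implemented; the function name `fill_with_previous` and the sample column are made up for illustration, and missing cells are represented as `None`.

```python
def fill_with_previous(values):
    """Replace each None with the most recent non-missing value above it."""
    filled = []
    last_seen = None  # stays None until the first non-missing value appears
    for value in values:
        if value is not None:
            last_seen = value
        filled.append(last_seen)
    return filled


# Hypothetical column as it might come out of a spreadsheet with merged cells
region = ["North", None, None, "South", None]
print(fill_with_previous(region))
```

Note that leading missing values stay missing in this sketch, because there is nothing above them to copy.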
The mighty “return” in Java
The ‘return’ statement in Java can be very useful when working with KNIME. Although there are various ways to create new columns and variables, I have a particular fondness for the Quick Java nodes, such as Java Edit Variable (simple) and Java Snippet (simple). For instance, I’ve found these nodes handy for swiftly creating paths and other variables, especially before using the Create File/Folder Variables node.
return "knime://knime.workflow/data/";
And of course there is more to KNIME and the world of Java:
- Java and null values and if, then else syntax
- Java if, then else with an additional variable
- Java and more null, NaN and if, then else
RowID and Counter
Every row of data in KNIME has a RowID. Typically KNIME handles the IDs and you do not have to deal with them. But sometimes you want to take control and make sure the RowIDs are suitable.
- You can, for example, use an existing column as the RowID or store the existing RowID in a column.
- You can use the Counter Generation node to create new IDs. It also gives you another way to address rows by position and see their order.
Another way to get a numeric row number is to use ROWINDEX in the Rule Engine node. This produces a numeric column of type Long, which can store larger numbers.
One use of these nodes is to identify blocks in imported data (for example from Excel) and then handle them:
Here we combine the filling of missing values with a Counter and a Rule Engine to mark the parts of the data.
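The combination described above can be sketched outside KNIME in a few lines of Python. This is only an illustration of the logic, not the workflow itself: the function `mark_blocks`, the field names and the sample rows are assumptions, and a non-missing label is taken to signal the start of a new block.

```python
def mark_blocks(rows):
    """rows: list of (label, value) tuples; label is None on continuation rows."""
    marked = []
    block_id = 0
    current_label = None
    # enumerate plays the role of the counter column (starting at 1)
    for counter, (label, value) in enumerate(rows, start=1):
        if label is not None:  # rule: a non-missing label starts a new block
            block_id += 1
            current_label = label
        marked.append({
            "counter": counter,
            "block": block_id,
            "label": current_label,  # label filled down from the row above
            "value": value,
        })
    return marked


# Hypothetical import where the label appears only on the first row of a block
rows = [("Sales", 10), (None, 12), ("Costs", 7), (None, 9), (None, 4)]
for row in mark_blocks(rows):
    print(row)
```

With the block number attached to every row, each block can later be filtered, grouped or looped over separately.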
Always know where you are
I have already praised the “KNIME File Handling Guide”, which is lengthy but tells you all you need to know about how KNIME handles files and paths. A central node for finding out where you are is “Extract Context Properties”. I have built a Component that always creates an absolute path variable and creates the /data/ folder beneath the current workflow. It sets the path separator (/ or \) automatically according to your operating system (path.separator.system):
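For readers who think in code, the behaviour of such a Component can be approximated in Python. This is a sketch under assumptions, not the Component itself: `ensure_data_folder` is a hypothetical helper, and the temporary directory merely stands in for the workflow location that KNIME would supply.

```python
import os
import tempfile


def ensure_data_folder(workflow_dir):
    """Create <workflow_dir>/data if needed and return its absolute path
    with a trailing separator that matches the operating system."""
    data_dir = os.path.join(os.path.abspath(workflow_dir), "data")
    os.makedirs(data_dir, exist_ok=True)
    return data_dir + os.sep


# A temporary directory stands in for the workflow folder in this sketch
with tempfile.TemporaryDirectory() as workflow_dir:
    data_path = ensure_data_folder(workflow_dir)
    print("separator on this system:", os.sep)
    print("data folder:", data_path)
```

Building the path with `os.path.join` and `os.sep` avoids hard-coding / or \, which is the same idea as letting the Component pick the separator for you.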
Your own Java always ready
The “Extract System Properties” node gives you an even wider array of context information, such as user names and details about your installation and operating system, which you can later use as Flow Variables. Since KNIME ships with its own Java runtime, you can also use it with other software such as H2O.ai under Python by setting the JAVA_HOME environment variable to the value of the “java.home” property in KNIME:
The use for H2O.ai in Python would then look like this:
# provide a software like h2o with a java path using KNIME's own java version
# variable java.home from KNIME
# https://hub.knime.com/-/spaces/-/latest/~SGv1Cosah8BXabfa/
import os
# the paths may vary depending on your version and operating system
# os.environ["JAVA_HOME"] = "C:\\Users\\x123456789\\software\\knime_4.6.1\\plugins\\org.knime.binary.jre.win32.x86_64_17.0.3.20220621\\jre"
os.environ["JAVA_HOME"] = "/Applications/KNIME 4.6.0.app/Contents/Eclipse/plugins/org.knime.binary.jre.macosx.x86_64_17.0.5.20221116/jre/Contents/Home"
print("setenv JAVA_HOME", os.environ["JAVA_HOME"])
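Because the plugin folder name changes with every KNIME version, it can help to check the path before relying on it. The following is a minimal sketch, assuming a standard Java layout with the executable under `bin`; the helper `set_java_home` is hypothetical and not part of KNIME or H2O.

```python
import os


def set_java_home(java_home):
    """Set JAVA_HOME only if the folder contains a java executable.

    Returns True if the variable was set, False otherwise.
    """
    executable = "java.exe" if os.name == "nt" else "java"
    if not os.path.isfile(os.path.join(java_home, "bin", executable)):
        return False
    os.environ["JAVA_HOME"] = java_home
    return True


# Placeholder path: replace it with the java.home value from your own KNIME installation
if not set_java_home("/path/to/knime/jre"):
    print("No java executable found, JAVA_HOME left unchanged")
```

This way a stale path from an older KNIME installation is noticed right away instead of surfacing later as a start-up error in the other tool.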
For more on KNIME and Python, see “KNIME and Python — Setting up and managing Conda environments”. To then use Python with machine learning, see “KNIME, XGBoost and Optuna for Hyper Parameter Optimization” and “Hyperparameter optimization for LightGBM — wrapped in KNIME nodes”.
If you want to read more KNIME Snippets — here is the collection:
In case you enjoyed this story you can follow me on Medium (https://medium.com/@mlxl) or on the KNIME Hub (https://hub.knime.com/mlauber71) or KNIME Forum (https://forum.knime.com/u/mlauber71/summary).