GETTING STARTED | ETL | KNIME ANALYTICS PLATFORM

KNIME — Column Expressions

Roberto Cadili
Low Code for Data Science
4 min read · Jun 18, 2021


By Willem Lottering

As first published in LinkedIn Pulse

Column Expressions node.

KNIME Analytics Platform is a powerful tool for data scientists, but it is just as powerful for data engineers who need to create processes for ETL, ELT, and data blending requirements. I recently worked on a workflow where one of the requirements was to edit data based on specific specifications. Sounds simple, but the trick was that the workflow would be repeated for different clients with different databases, each having their own requirements.

Some of the things that needed to be done during the transformation process were rounding values in some columns and setting values in columns based on values in other columns, just to name a few.

One way to solve this is to create a workflow that splits the logic depending on who the client is and which database it is, and then builds the logic for each combination. The challenge I found with this approach is that if a client’s requirements change, you must find the logic for that specific client and database and update the workflow.

Two alternatives are:

· If you have a KNIME Server, create sub workflows with the logic for each client/database combination and call the sub workflow from the main workflow.

· Use the “Column Expressions” node with a bit of configuration stored in a file.

I decided to try the last option. Using a “Column Expressions” node and a config file for storing the column expression settings, I was able to extract the logic out of the KNIME workflow and could therefore change the requirements outside of KNIME without having to update the workflow. This makes it easier to test your logic in a test environment and then simply deploy the config file to a production environment, without having to physically change the workflow.

I’m not going to explain how the “Column Expressions” node works or how flow variables work, as that would make the article too technical and too long.

Here is a cut-down version of what I have done. The workflow only processes one file, but it should be good enough to explain the idea.

The task of this workflow is to:

1. Load a data file

2. Load the configuration settings

3. Loop through the settings

a. Each setting will be applied and stored in a temp file

4. After the loop, the modified data will be read from the temp file

I’ve configured three different requirements that need to be applied to the data.

· Check if the “Rep” column contains a specific text (“Jones” in this example); if so, calculate a 10% discount on the “Unit Cost” column and store the result in a new column.

· Check if the “Rep” column contains a specific text (“Jones” in this example); if so, calculate a 10% discount on the “Total” column and store the result in a new column.

· Check if the “Rep” column contains a specific text (“Jones” in this example); if so, update the column by appending “ — Updated” to the text value.

Since the “Column Expressions” node uses JavaScript, the configuration file contains the JavaScript expression needed, the name of the column to be added or replaced, and a couple of other values needed by the node.
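For illustration, the three requirements could be written roughly like this in the node’s JavaScript-based syntax. In the Column Expressions node, the `column()` accessor reads a cell in the current row; here it is stubbed out with made-up sample values so the expressions can be tried outside KNIME:

```javascript
// Stub for the node's column() accessor so the expressions run standalone;
// the row values below are invented sample data for illustration only.
const row = { "Rep": "Jones", "Unit Cost": 1.99, "Total": 189.05 };
function column(name) { return row[name]; }

// Requirement 1: 10% discount on the unit cost for rep "Jones"
const unitCostDiscount =
  column("Rep") === "Jones" ? column("Unit Cost") * 0.9 : column("Unit Cost");

// Requirement 2: 10% discount on the total for rep "Jones"
const totalDiscounted =
  column("Rep") === "Jones" ? column("Total") * 0.9 : column("Total");

// Requirement 3: append " — Updated" to the rep name for rep "Jones"
const repUpdated =
  column("Rep") === "Jones" ? column("Rep") + " — Updated" : column("Rep");
```

Inside the node, each of these expressions would be pasted in on its own, with the output column name and type set in the node configuration.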

The KNIME workflow also contains some logic that converts the data type specified in the config file to the data type format that the node needs. Again, I’m not going into too much detail here, but in short, the type double, for example, is converted to “org.knime.core.data.def.DoubleCell”.
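A minimal sketch of such a type mapping, assuming the config file stores simple type names. Only the double mapping is given in the article; the int and string entries are the analogous KNIME cell classes:

```javascript
// Map from the simple type names in the config file to the cell class names
// the Column Expressions node expects. The double entry is from the article;
// int and string are the corresponding classes in the same KNIME package.
const typeMap = {
  "double": "org.knime.core.data.def.DoubleCell",
  "int":    "org.knime.core.data.def.IntCell",
  "string": "org.knime.core.data.def.StringCell"
};

// toKnimeType is a hypothetical helper name, not part of KNIME itself.
function toKnimeType(configType) {
  return typeMap[configType.toLowerCase()];
}
```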

The data file looks like this:

Data overview.

The config file data looks like this:

Configuration file data.

The first row contains the settings to create a new column called “Unit Cost Discount” of type double: if the “Rep” is “Jones”, a discounted value will be calculated; otherwise the “Unit Cost” value will be stored.

The second row will also add a new column, called “Total Discounted” and calculate the discounted total value for the rep “Jones”.

The last row will add the text “ — Updated” at the end of the rep name if the “Rep” is “Jones”.
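To make the three rows concrete, a hypothetical config file in CSV form could look like this. The column names, layout, and exact expressions are my own guess at a plausible format, not the actual file from the example workflow:

```
column_name,column_type,replace,expression
Unit Cost Discount,double,false,"column(""Rep"") === ""Jones"" ? column(""Unit Cost"") * 0.9 : column(""Unit Cost"")"
Total Discounted,double,false,"column(""Rep"") === ""Jones"" ? column(""Total"") * 0.9 : column(""Total"")"
Rep,string,true,"column(""Rep"") === ""Jones"" ? column(""Rep"") + "" — Updated"" : column(""Rep"")"
```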

And the KNIME workflow looks like this:

Workflow overview in KNIME.

The workflow loads the data and saves it into a temp file. Then the configuration file is loaded. The workflow loops over the configuration settings: each iteration loads the temp file, makes the changes based on the current configuration setting, saves the temp file, and then loops again until all the configuration settings have been executed. At the end, the temp file is read and the data is ready for further processing.
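The loop can be sketched outside KNIME like this: an in-memory variable stands in for the temp file, and the hypothetical `applyExpression()` stands in for the work the Column Expressions node does in one iteration. The data and settings below are invented sample values:

```javascript
// One iteration of the loop: apply a single configuration setting to every
// row, adding or replacing the configured column.
function applyExpression(rows, setting) {
  return rows.map(row => ({ ...row, [setting.columnName]: setting.expression(row) }));
}

// Made-up sample data and two of the article's three settings.
let tempFile = [
  { "Rep": "Jones", "Unit Cost": 1.99 },
  { "Rep": "Smith", "Unit Cost": 2.50 }
];
const settings = [
  { columnName: "Unit Cost Discount",
    expression: row => row["Rep"] === "Jones" ? row["Unit Cost"] * 0.9 : row["Unit Cost"] },
  { columnName: "Rep",
    expression: row => row["Rep"] === "Jones" ? row["Rep"] + " — Updated" : row["Rep"] }
];

// The loop: each iteration "reads" the temp file, applies one setting,
// and "writes" the result back, until all settings have been executed.
for (const setting of settings) {
  tempFile = applyExpression(tempFile, setting);
}
```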

The end result will look like this:

Result overview.

My example workflow can be found on the KNIME Hub by following this link: https://kni.me/s/p2kBIEHYogMHsr8l

To be able to run the workflow, use KNIME to download the whole folder containing the workflow and the “Data” folder.


Roberto Cadili
Low Code for Data Science

Data scientist at KNIME, NLP enthusiast, and history lover. Editor for Low Code for Data Science.