GETTING STARTED | LOOPING | KNIME ANALYTICS PLATFORM

Dynamically Output to Multiple Files in KNIME

Looping is all you need!

Bob Peers
Low Code for Data Science

As first published on Creative Data

The Problem

How do you write multiple files with dynamic contents and unique names in KNIME without hardcoding filters or building multiple branches? Loops, you need loops.

Coming from Alteryx, which doesn’t have loops (you have to use macros to achieve the same effect), it took me a while to understand KNIME loops. But once mastered, they give you lots of flexibility and are much easier to understand and debug than macros, since they follow the normal design rules and are visible at all times on the canvas.

An Example Dataset

Say you have a dataset as shown below and you want to output the data to separate files with one Collection No_ per file. Additionally, the rows in this dataset change, so next time there will be new values in Collection No_ but you still want the export to work.

Dataset to process by Collection No_.

How can you do this using KNIME?

The Loop Solution

The solution is the Group Loop Start node, which creates a loop that runs once per Collection No_ group. This means that on each run we only see the records for one Collection No_ inside the loop. On the next iteration we see the next Collection No_, and so on until the full dataset is processed.
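
If you think in code, the grouping behaviour can be sketched in a few lines of Python. This is only an analogy, not what KNIME runs internally; the sample rows and the `iterate_groups` helper are made up for illustration.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical sample rows; the values are invented for this sketch.
rows = [
    {"Collection No_": "C001", "Item": "Shirt"},
    {"Collection No_": "C002", "Item": "Hat"},
    {"Collection No_": "C001", "Item": "Jacket"},
]

def iterate_groups(rows, key):
    """Yield (group value, rows in that group), one pair per iteration."""
    # groupby only merges adjacent rows, so sort by the key first.
    ordered = sorted(rows, key=itemgetter(key))
    for value, group in groupby(ordered, key=itemgetter(key)):
        yield value, list(group)

# Each pass of this loop sees the rows of exactly one Collection No_.
for value, group in iterate_groups(rows, "Collection No_"):
    print(value, [row["Item"] for row in group])
```

Whatever you put inside the `for` body plays the role of the nodes between the loop start and the loop end.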

In the loop we do the following steps:

  1. Use Group Loop Start set on Collection No_.
  2. Get a single row with Collection No_ using a GroupBy node (to be used for the output file name).
  3. Create a dynamic string to use as the destination path for the file, built from the Collection No_ value with a String Manipulation node.
  4. Convert the string to a Path using String to Path.
  5. Make the Path column into a variable using Table Column to Variable.
  6. Feed the loop data into a CSV Writer node and use the Path variable to change the filename dynamically.
  7. Connect the Loop End to the CSV Writer and Group Loop Start.
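
For comparison, the same steps can be sketched as a small Python function. It is a rough equivalent under stated assumptions, not a replacement for the workflow: the `write_groups` name, the sample rows and the temporary output folder are all hypothetical, and the file name is simply the group value plus `.csv`.

```python
import csv
import tempfile
from collections import defaultdict
from pathlib import Path

def write_groups(rows, key, out_dir):
    """Write one CSV per distinct value of `key`; return the paths written."""
    groups = defaultdict(list)
    for row in rows:                     # step 1: split the rows by group
        groups[row[key]].append(row)

    written = []
    for value, group in groups.items():
        name = f"{value}.csv"            # steps 2-3: group value -> file name string
        path = Path(out_dir) / name      # step 4: string -> path
        # steps 5-6: hand the path to the writer and write this group's rows
        with path.open("w", newline="", encoding="utf-8") as f:
            writer = csv.DictWriter(f, fieldnames=list(group[0].keys()))
            writer.writeheader()
            writer.writerows(group)
        written.append(path)
    return written

# Hypothetical sample rows; the values are invented for this sketch.
sample = [
    {"Collection No_": "C001", "Item": "Shirt"},
    {"Collection No_": "C002", "Item": "Hat"},
    {"Collection No_": "C001", "Item": "Jacket"},
]

with tempfile.TemporaryDirectory() as tmp:
    paths = write_groups(sample, "Collection No_", tmp)
    names = sorted(p.name for p in paths)
    first_file = paths[0].read_text(encoding="utf-8").splitlines()

print(names)
print(first_file)
```

Because the file name is derived from the data, new Collection No_ values in the next run produce new files without any change to the code, which is the same property the loop gives you in KNIME.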

The final workflow looks like this:

KNIME workflow with dynamic output.

In plain English the workflow does this at each iteration:

  • Uses the Group Loop Start node to iterate over only one group of records at once.
  • Extracts the grouping item to be used as the file name. This step is optional: if the file name doesn’t need to contain the group value, you could build it from a dynamic timestamp instead.
  • Writes the records to a .csv file using the dynamic name fed in via the flow variable.
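
As a rough illustration of the timestamp option, here is one way such a name could be built in Python. The `timestamped_name` helper, the `export` prefix and the iteration counter are assumptions for this sketch; the counter is there because several iterations can finish within the same second and would otherwise overwrite each other.

```python
from datetime import datetime

def timestamped_name(iteration, now=None, prefix="export"):
    """Build a file name from a prefix, a timestamp and the iteration number."""
    now = now or datetime.now()
    return f"{prefix}_{now:%Y%m%d_%H%M%S}_{iteration}.csv"

# A fixed timestamp keeps the example reproducible.
print(timestamped_name(0, datetime(2024, 1, 31, 9, 30, 0)))
# prints "export_20240131_093000_0.csv"
```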

You could of course add much more data manipulation between the loop start and end, but this gives you the basic template for dynamically outputting files. Easy, huh?
