GETTING STARTED | AUTOMATION | KNIME ANALYTICS PLATFORM

KNIME, Paths and Loops — Automate Everything

The (few) nodes that will set you on the right path…

Markus Lauber
Low Code for Data Science

--

The KNIME Analytics Platform is great for a lot of tasks, not least for automating work on your system. Loops are essential in every programming approach, and KNIME offers them too.

I want to demonstrate how you can automate and streamline a lot of your tasks by combining just a few nodes and flow variables.

“A happy yellow robot juggling balls in the style of Anselm Kiefer” (DALL·E).

If you want the official version, you can read the “KNIME Flow Control Guide”. These articles might also help: “Understanding KNIME Loops” and “What are flow variables?”

You will find all the examples shown in this article in a workflow you can download from the KNIME Community Hub.

One Loop to Rule them All - “Table Row to Variable”

KNIME has a lot of loops and they might all come in handy (one day), but for now I recommend focusing on just the “Table Row to Variable Loop Start” node.

You can turn every table of items into a loop: your (groups of) data, a list of files, the results of some calculations and so on. Each row becomes one iteration, and its columns become Flow Variables. KNIME gives you good control and an overview of what is going on.

The “Table Row to Variable Loop Start” node might be all you need to start your automation (https://hub.knime.com/knime/extensions/org.knime.features.base/latest/org.knime.base.node.meta.looper.variable.start.LoopStartVariable3NodeFactory/).
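
To make the idea concrete for readers who think in code, here is a minimal plain-Java analogy of what the loop start does. This is not KNIME API code: the class name, the `row` helper and the sample columns are hypothetical, and only the `currentIteration` variable name comes from KNIME itself.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Plain-Java analogy only; this is NOT KNIME API code.
final class RowToVariableLoopSketch {

    // Hypothetical helper: build one table row from alternating name/value pairs.
    static Map<String, String> row(String... pairs) {
        Map<String, String> r = new LinkedHashMap<>();
        for (int i = 0; i + 1 < pairs.length; i += 2) {
            r.put(pairs[i], pairs[i + 1]);
        }
        return r;
    }

    // One iteration per row: every column of the row becomes a "flow variable",
    // plus a zero-based currentIteration counter.
    static List<String> runLoop(List<Map<String, String>> table) {
        List<String> log = new ArrayList<>();
        for (int currentIteration = 0; currentIteration < table.size(); currentIteration++) {
            Map<String, String> flowVariables = new LinkedHashMap<>(table.get(currentIteration));
            flowVariables.put("currentIteration", String.valueOf(currentIteration));
            log.add(flowVariables.toString());
        }
        return log;
    }

    public static void main(String[] args) {
        List<Map<String, String>> table = new ArrayList<>();
        table.add(row("region", "north", "file", "north.csv"));
        table.add(row("region", "south", "file", "south.csv"));
        for (String line : runLoop(table)) {
            System.out.println(line);
        }
    }
}
```

The point of the sketch is only the shape: the table drives the loop, and the loop body sees one row at a time as named variables.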

Just two inputs — Your own data or a List of Files

You basically have two things to feed your loop with: a KNIME (data) table of the items you want to iterate over, or a list of (external) files, for example the ones on your hard drive.

Often a Group By Node can give you the list of items you want to iterate over. I also like this approach since it gives you control and you can see what you will be doing.

The loop input is a list of Files that will be converted into Flow Variables. Or it might be the result of a Group By node or just a list you happen to have (created) — https://hub.knime.com/-/spaces/-/~3v-KS_WWBchl_vIz/.

List Files and extract the Information

The List Files/Folders node is useful for checking your environment (that is, your files) outside KNIME. You can define filters and combine and cascade steps, like scanning for folders first and then for the files within them.

The result of the List Files node is used in the Loop Start — https://hub.knime.com/-/spaces/-/~3v-KS_WWBchl_vIz/.

You can always access meta information about your files and use it in your automation, for example to pick the latest file, sort the files by size, or exclude small/empty ones.
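
As a rough illustration of that selection logic, the sketch below picks the latest file and drops empty ones from an in-memory listing. It is plain Java rather than KNIME; the `FileInfo` class and its field names are hypothetical stand-ins for the columns a file listing might provide.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;

// Plain-Java analogy only; FileInfo is a hypothetical stand-in for one listing row.
final class FileSelectionSketch {

    static final class FileInfo {
        final String path;
        final long sizeBytes;
        final long lastModifiedMillis;

        FileInfo(String path, long sizeBytes, long lastModifiedMillis) {
            this.path = path;
            this.sizeBytes = sizeBytes;
            this.lastModifiedMillis = lastModifiedMillis;
        }
    }

    // Drop files smaller than minBytes, then sort the rest largest first.
    static List<FileInfo> nonEmptyBySize(List<FileInfo> files, long minBytes) {
        List<FileInfo> kept = new ArrayList<>();
        for (FileInfo f : files) {
            if (f.sizeBytes >= minBytes) {
                kept.add(f);
            }
        }
        kept.sort(Comparator.comparingLong((FileInfo f) -> f.sizeBytes).reversed());
        return kept;
    }

    // Return the most recently modified file, or null for an empty listing.
    static FileInfo latest(List<FileInfo> files) {
        FileInfo best = null;
        for (FileInfo f : files) {
            if (best == null || f.lastModifiedMillis > best.lastModifiedMillis) {
                best = f;
            }
        }
        return best;
    }

    public static void main(String[] args) {
        List<FileInfo> files = Arrays.asList(
            new FileInfo("a.csv", 0L, 300L),
            new FileInfo("b.csv", 2048L, 100L),
            new FileInfo("c.csv", 512L, 200L));
        System.out.println("latest: " + latest(files).path);
        for (FileInfo f : nonEmptyBySize(files, 1L)) {
            System.out.println(f.path + " (" + f.sizeBytes + " bytes)");
        }
    }
}
```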

Be creative with your paths

Creating new names and paths (as variables) for your files is essential for your automation, for example adding a current timestamp to the name, changing the extension or adding the number of the current iteration.

You can either enter the elements by hand or use Flow Variables to set them in the Create File/Folder Variables node:

  1. The Basic Folder Path where the file should be stored
  2. The name of the resulting Flow Variable that contains the File Name and Path (yes, this can also be a dynamic name)
  3. The Name of the File/Folder (without extension). This is where you will often use a Java Snippet beforehand to edit the name based on the current iteration of your loop or similar
  4. The Extension of the File. This can be the same one you extracted from the file you read. But if you want to, for example, turn .XLSX into .CSV, you can change that here
Create a new File (Path) Variable — https://hub.knime.com/-/spaces/-/~3v-KS_WWBchl_vIz/.
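
The four elements above map naturally onto a small path-building function. The following is a minimal plain-Java sketch, not what the Create File/Folder Variables node does internally; the naming pattern (name, zero-padded iteration, timestamp) and all identifiers are my own assumptions for illustration.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

// Plain-Java analogy only; the naming pattern is an assumption, not a KNIME default.
final class OutputPathSketch {

    // Combine base folder, file name, iteration number, timestamp and extension
    // into one path, e.g. output/sales_002_20240131_090500.csv
    static Path buildPath(String baseFolder, String name, int iteration,
                          LocalDateTime timestamp, String extension) {
        String stamp = timestamp.format(DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss"));
        String fileName = String.format("%s_%03d_%s.%s", name, iteration, stamp, extension);
        return Paths.get(baseFolder, fileName);
    }

    public static void main(String[] args) {
        // Turning an .xlsx input into a .csv output only means passing a different extension.
        System.out.println(buildPath("output", "sales", 2, LocalDateTime.now(), "csv"));
    }
}
```

Passing the timestamp in as a parameter (instead of reading the clock inside the function) keeps the naming logic easy to check with fixed values.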

With the Path to URI and the URL to File Path nodes, you can extract more information from a Path, like the ‘pure’ name of the file and the extension, along with the full path.

Extract more information from a Path with URI/URL.
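
For comparison, this is roughly what that extraction looks like in plain Java. It is a sketch under my own conventions (a leading dot as in ".gitignore" is not treated as an extension, and only the last dot separates the extension), not a description of how the KNIME nodes split a path.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Plain-Java analogy only; the splitting conventions are assumptions.
final class PathPartsSketch {

    // Returns { parent folder, file name without extension, extension without dot }.
    // Expects a path that ends in a file name (not a bare root such as "/").
    static String[] split(String path) {
        Path p = Paths.get(path);
        String fileName = p.getFileName().toString();
        int dot = fileName.lastIndexOf('.');
        boolean hasExtension = dot > 0; // a leading dot is not an extension
        String baseName = hasExtension ? fileName.substring(0, dot) : fileName;
        String extension = hasExtension ? fileName.substring(dot + 1) : "";
        String parent = p.getParent() == null ? "" : p.getParent().toString();
        return new String[] {parent, baseName, extension};
    }

    public static void main(String[] args) {
        String[] parts = split("data/input/report.final.xlsx");
        System.out.println("folder:    " + parts[0]);
        System.out.println("name:      " + parts[1]);
        System.out.println("extension: " + parts[2]);
    }
}
```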

If you want to iterate over some groups of data from your original file you can use the Flow Variables in a Row Filter to select a sub-group of values:

Use the Flow Variable from the Loop Start node to filter data for each iteration.
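
The combination of a group list and a per-iteration filter can be sketched in plain Java as follows. This is an analogy, not KNIME code: `distinctValues` plays the role of the Group By step that produces the loop input, `filterRows` plays the role of the Row Filter controlled by the Flow Variable, and the sample data is made up.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Plain-Java analogy only; method names and sample data are hypothetical.
final class GroupLoopSketch {

    // Like grouping by one column: the distinct values, in order of first appearance.
    static Set<String> distinctValues(List<String[]> rows, int column) {
        Set<String> values = new LinkedHashSet<>();
        for (String[] row : rows) {
            values.add(row[column]);
        }
        return values;
    }

    // Like a row filter driven by a flow variable: keep only the matching rows.
    static List<String[]> filterRows(List<String[]> rows, int column, String flowVariable) {
        List<String[]> kept = new ArrayList<>();
        for (String[] row : rows) {
            if (flowVariable.equals(row[column])) {
                kept.add(row);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<String[]> rows = new ArrayList<>();
        rows.add(new String[] {"north", "10"});
        rows.add(new String[] {"south", "20"});
        rows.add(new String[] {"north", "30"});
        // One iteration per group; each iteration only sees its own rows.
        for (String region : distinctValues(rows, 0)) {
            System.out.println(region + ": " + filterRows(rows, 0, region).size() + " row(s)");
        }
    }
}
```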

You also should choose your Loop End Point

Every loop has an end point where either the results are collected (in a table) or the loop simply ends after finishing its tasks. There are more options, but let us start with these two.

Collect the Results — Loop End

The Loop result is a table you can then use further in KNIME.
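
In code terms, collecting results at the loop end amounts to concatenating what each iteration produced into one table. The plain-Java sketch below shows that idea; tagging each row with its iteration number and the comma-separated row format are simplifications of mine, not a description of the node's output.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Plain-Java analogy only; rows are simplified to comma-separated strings.
final class LoopEndSketch {

    // Concatenate the rows of all iterations and tag each row with its iteration number.
    static List<String> collect(List<List<String>> rowsPerIteration) {
        List<String> collected = new ArrayList<>();
        for (int iteration = 0; iteration < rowsPerIteration.size(); iteration++) {
            for (String row : rowsPerIteration.get(iteration)) {
                collected.add(row + ",iteration=" + iteration);
            }
        }
        return collected;
    }

    public static void main(String[] args) {
        List<List<String>> rowsPerIteration = new ArrayList<>();
        rowsPerIteration.add(Arrays.asList("north,40"));
        rowsPerIteration.add(Arrays.asList("south,20", "south,5"));
        for (String row : collect(rowsPerIteration)) {
            System.out.println(row);
        }
    }
}
```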

Just Finish the Job — Variable Loop End

The other alternative is to collect the Flow Variables — which you can then use again to do new things…

With the Variable Loop End node you can collect the Flow Variables used.

Start the Loop with an Individual Touch — First one is different

Sometimes you want to do something special in the first iteration of your loop, like resetting a counter or removing/creating a file. You can do this with a switch, which has the additional benefit of giving you another idea of how to structure your work through switches.

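If it is the first iteration of the loop (0), the CASE Switch activates its top port (0) (https://hub.knime.com/-/spaces/-/~3v-KS_WWBchl_vIz/). The Java Snippet that computes the port index from the currentIteration Flow Variable looks like this:

// Check for the first iteration (0): return 0 for the top port, 1 otherwise
if ($${IcurrentIteration}$$ == null) {
    // No iteration counter available: treat it like the first iteration
    return 0;
} else if ($${IcurrentIteration}$$ == 0) {
    return 0;
} else {
    return 1;
}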
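
Outside of KNIME, the same decision can be written as an ordinary function, which makes it easy to see what each branch does. This is a self-contained plain-Java sketch: `activePort` mirrors the snippet's logic, while the two action strings are hypothetical placeholders for whatever you do in each branch.

```java
import java.util.ArrayList;
import java.util.List;

// Plain-Java analogy only; the action strings are hypothetical placeholders.
final class FirstIterationSwitchSketch {

    // Mirrors the snippet: port 0 on the first iteration (or when the counter is
    // unknown), port 1 on every later iteration.
    static int activePort(Integer currentIteration) {
        return (currentIteration == null || currentIteration == 0) ? 0 : 1;
    }

    // Run a loop and record which branch each iteration takes.
    static List<String> run(int iterations) {
        List<String> actions = new ArrayList<>();
        for (int i = 0; i < iterations; i++) {
            if (activePort(i) == 0) {
                actions.add("create a fresh output file");
            } else {
                actions.add("append to the output file");
            }
        }
        return actions;
    }

    public static void main(String[] args) {
        for (String action : run(3)) {
            System.out.println(action);
        }
    }
}
```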

I hope you enjoy starting to automate your tasks with the help of the KNIME Analytics Platform.

If you want to put your workflow on a server, share it with the team and work together, you should check out the KNIME Business Hub.

Keep working until the Job is done

In another example, you let a job run until it is finished, or at least try up to 10 times to get it done.
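
The underlying pattern is a bounded retry loop. Here is a minimal plain-Java sketch of it; how the example workflow builds this with KNIME nodes is not shown here, and the `runUntilDone` name, the `IntPredicate` job and the "succeeds on the third attempt" demo are all assumptions for illustration.

```java
import java.util.function.IntPredicate;

// Plain-Java analogy only; names and the demo job are hypothetical.
final class RetrySketch {

    // Run the job until it reports success or maxAttempts is reached.
    // Returns the number of attempts used, or -1 if every attempt failed.
    static int runUntilDone(IntPredicate job, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (job.test(attempt)) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Hypothetical job that only succeeds on its third attempt.
        int used = runUntilDone(attempt -> attempt >= 3, 10);
        System.out.println(used == -1 ? "gave up after 10 attempts" : "done after " + used + " attempt(s)");
    }
}
```

Capping the number of attempts is the important design choice: without it, a job that can never succeed would keep the loop running forever.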

In case you enjoyed this story you can follow me on Medium (https://medium.com/@mlxl) or on the KNIME Hub (https://hub.knime.com/mlauber71) or KNIME Forum (https://forum.knime.com/u/mlauber71/summary).


Markus Lauber
Low Code for Data Science

Senior Data Scientist working with KNIME, Python, R and Big Data Systems in the telco industry