GETTING STARTED | AUTOMATION | KNIME ANALYTICS PLATFORM
KNIME, Paths and Loops — Automate Everything
The (few) nodes that will set you on the right path…
The KNIME Analytics Platform is great for a lot of tasks, not least automating work on your system. Loops are essential to almost every programming approach, and KNIME is no exception.
I want to demonstrate how combining just a few nodes and flow variables lets you automate and streamline a lot of your tasks.
If you want the official version, you can read the “KNIME Flow Control Guide”. The articles “Understanding KNIME Loops” and “What are flow variables?” might also help.
You will find all the examples shown in this article in a workflow you can download from the KNIME Community Hub.
One Loop to Rule them All - “Table Row to Variable”
KNIME has a lot of loops and they might all come in handy (one day), but for now I recommend focusing on just the “Table Row to Variable Loop Start” node.
You can turn every table of items into a loop: your (groups of) data, a list of files, the results of some calculations and so on. KNIME will give you good control and an overview of what is going on.
Just two inputs — Your own data or a List of Files
There are basically two things you can feed your loop with: a KNIME (data) table of the items you want to iterate over, or a list of (external) files, for example on your hard drive.
Often a GroupBy node can give you the list of items you want to iterate over. I also like this approach since it gives you control and lets you see in advance what you will be doing.
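To make the idea a bit more tangible, here is a minimal sketch in plain Java (outside KNIME) of what such a loop does conceptually: derive the distinct items from one column, then run one iteration per item. The class and method names are my own illustration and not part of any KNIME API.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

class RowLoopSketch {
    // Distinct values of one column in first-seen order,
    // roughly what grouping by that column would give you.
    static List<String> distinctValues(List<Map<String, String>> table, String column) {
        LinkedHashSet<String> seen = new LinkedHashSet<>();
        for (Map<String, String> row : table) {
            seen.add(row.get(column));
        }
        return new ArrayList<>(seen);
    }

    // One iteration per item; 'item' plays the role of the flow variable
    // and 'currentIteration' counts from 0.
    static List<String> runLoop(List<String> items) {
        List<String> log = new ArrayList<>();
        int currentIteration = 0;
        for (String item : items) {
            log.add(currentIteration + ": " + item);
            currentIteration++;
        }
        return log;
    }

    public static void main(String[] args) {
        List<Map<String, String>> table = List.of(
            Map.of("region", "North", "sales", "10"),
            Map.of("region", "South", "sales", "7"),
            Map.of("region", "North", "sales", "3"));
        System.out.println(runLoop(distinctValues(table, "region")));
    }
}
```

Seeing the list of items before the loop starts is the point here: you know exactly what the loop will work through.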
List Files and extract the Information
The List Files/Folders node is useful for checking your environment (that is, your files) outside KNIME. You can define filters and combine or cascade steps, such as scanning for folders first and then for the files within them.
You can always access meta information about your files and use it in your automation, for example to pick the latest file, sort files by size, or exclude small or empty ones.
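If you are curious what this looks like outside KNIME, the following is a rough plain-Java sketch of the same idea: list files with their size and modification time, drop the empty ones, sort by size and pick the latest. The `FileInfo` holder and all method names are hypothetical and only serve this example.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.Optional;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class FileListSketch {
    // Hypothetical holder for the meta information of one listed file.
    static final class FileInfo {
        final String path;
        final long sizeBytes;
        final long modifiedMillis;

        FileInfo(String path, long sizeBytes, long modifiedMillis) {
            this.path = path;
            this.sizeBytes = sizeBytes;
            this.modifiedMillis = modifiedMillis;
        }
    }

    // Collect meta information for all regular files below 'root' (sub-folders included).
    static List<FileInfo> scan(Path root) {
        try (Stream<Path> paths = Files.walk(root)) {
            List<Path> files = paths.filter(Files::isRegularFile).collect(Collectors.toList());
            List<FileInfo> result = new ArrayList<>();
            for (Path p : files) {
                result.add(new FileInfo(p.toString(), Files.size(p),
                    Files.getLastModifiedTime(p).toMillis()));
            }
            return result;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Drop empty files and sort the rest by size, largest first.
    static List<FileInfo> nonEmptyBySize(List<FileInfo> files) {
        return files.stream()
            .filter(f -> f.sizeBytes > 0)
            .sorted((a, b) -> Long.compare(b.sizeBytes, a.sizeBytes))
            .collect(Collectors.toList());
    }

    // The most recently modified file, if the list is not empty.
    static Optional<FileInfo> latest(List<FileInfo> files) {
        return files.stream().max(Comparator.comparingLong(f -> f.modifiedMillis));
    }

    public static void main(String[] args) {
        List<FileInfo> files = nonEmptyBySize(scan(Paths.get(".")));
        files.forEach(f -> System.out.println(f.sizeBytes + " bytes  " + f.path));
        latest(files).ifPresent(f -> System.out.println("latest: " + f.path));
    }
}
```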
Be creative with your paths
Creating new names and paths (variables) for your files is essential for your automation, for example adding a current timestamp to the name, changing the extension, or adding the number of the current iteration.
You can either enter the elements by hand or use Flow Variables to set them in the Create File/Folder Variables node:
- The Basic Folder Path where the file should be stored
- The name of the resulting Flow Variable that contains the File Name and Path (yes, this can also be a dynamic name)
- The Name of the File/Folder (without extension). This is where you will often use a Java Snippet beforehand to build a name based on the current iteration of your loop or similar
- The Extension of the File. This can be the same one you extracted from the file you read. But if you want to, for example, turn .XLSX into .CSV, you can change that here
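The four elements above can be sketched in plain Java like this. It is only an illustration of how such a path is assembled, not what the node does internally; the naming pattern (name, timestamp, iteration) and the method name `build` are assumptions of mine.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.time.LocalDateTime;
import java.time.format.DateTimeFormatter;

class OutputPathSketch {
    // Timestamp format used in the file name (an arbitrary choice for this example).
    private static final DateTimeFormatter STAMP = DateTimeFormatter.ofPattern("yyyyMMdd_HHmmss");

    // Result: baseFolder/name_timestamp_iteration.extension
    static Path build(String baseFolder, String name, LocalDateTime now,
                      int iteration, String extension) {
        String fileName = name + "_" + STAMP.format(now) + "_" + iteration + "." + extension;
        return Paths.get(baseFolder).resolve(fileName);
    }

    public static void main(String[] args) {
        // Iteration 0 of a loop writing CSV files into the folder "output".
        System.out.println(build("output", "sales", LocalDateTime.now(), 0, "csv"));
    }
}
```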
With the Path to URI and the URL to File Path nodes you can extract more information from a path, such as the ‘pure’ name of the file and the extension along with the full path.
If you want to iterate over some groups of data from your original file, you can use the Flow Variables in a Row Filter to select a sub-group of values.
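As a rough plain-Java counterpart, this is how the ‘pure’ name and the extension can be split off a file name and how an extension can be swapped. The helper names are hypothetical, and the rule that a leading dot (as in ".gitignore") does not start an extension is my own assumption.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

class FileNameParts {
    // File name without its last extension; a leading dot is not treated as an extension.
    static String baseName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(0, dot) : fileName;
    }

    // Last extension without the dot, or "" if there is none.
    static String extension(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot > 0 ? fileName.substring(dot + 1) : "";
    }

    // Same folder and base name, but with a new extension (for example XLSX to csv).
    static Path withExtension(Path file, String newExtension) {
        String name = file.getFileName().toString();
        return file.resolveSibling(baseName(name) + "." + newExtension);
    }

    public static void main(String[] args) {
        Path source = Paths.get("data", "report.XLSX");
        System.out.println(baseName(source.getFileName().toString()));
        System.out.println(extension(source.getFileName().toString()));
        System.out.println(withExtension(source, "csv"));
    }
}
```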
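In plain Java terms, filtering by the current flow variable boils down to something like the sketch below. The table is modelled as a list of maps purely for illustration; the column name "region" and the method name are made up for this example.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

class RowFilterSketch {
    // Keep only the rows whose 'column' equals the value carried by the current loop variable.
    static List<Map<String, String>> filter(List<Map<String, String>> table,
                                            String column, String currentValue) {
        return table.stream()
            .filter(row -> currentValue.equals(row.get(column)))
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> table = List.of(
            Map.of("region", "North", "sales", "10"),
            Map.of("region", "South", "sales", "7"),
            Map.of("region", "North", "sales", "3"));
        // Each pass of the loop works on one sub-group only.
        for (String region : List.of("North", "South")) {
            System.out.println(region + ": " + filter(table, "region", region).size() + " rows");
        }
    }
}
```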
You also should choose your Loop End Point
Every loop has an end point, where it either collects the results (in a table) or simply ends after finishing its tasks. There are more options, but let us start with these two.
Collect the Results — “Loop End”
Just Finish the Job — “Variable Loop End”
The other alternative is to collect the Flow Variables — which you can then use again to do new things…
Start the Loop with an Individual Touch — First one is different
Sometimes you want to do something special in the first iteration of your loop, like resetting a counter or removing or creating a file. You can do this with a switch, with the additional benefit of getting another idea of how to structure your work through switches.
// Return 0 for the first iteration (index 0) or when the variable is not set, otherwise 1
if ($${IcurrentIteration}$$ == null || $${IcurrentIteration}$$ == 0) {
    return 0;
} else {
    return 1;
}
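A typical use of this pattern is to start a fresh output file in the first iteration and append to it afterwards. The following plain-Java sketch shows that idea outside KNIME; it is my own illustration under that assumption, and the class and method names are hypothetical.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.List;

class FirstIterationSketch {
    // Iteration 0 creates or resets the file; every later iteration appends to it.
    static StandardOpenOption[] optionsFor(int currentIteration) {
        if (currentIteration == 0) {
            return new StandardOpenOption[] {
                StandardOpenOption.CREATE, StandardOpenOption.TRUNCATE_EXISTING,
                StandardOpenOption.WRITE};
        }
        return new StandardOpenOption[] {StandardOpenOption.CREATE, StandardOpenOption.APPEND};
    }

    // Write one line per iteration, treating the first iteration differently.
    static void writeAll(Path target, List<String> lines) {
        try {
            for (int i = 0; i < lines.size(); i++) {
                Files.write(target, List.of(lines.get(i)), StandardCharsets.UTF_8, optionsFor(i));
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Runs the loop twice on a temporary file to show that a second run starts clean.
    static List<String> demo() {
        try {
            Path tmp = Files.createTempFile("loop", ".txt");
            writeAll(tmp, List.of("old"));
            writeAll(tmp, List.of("a", "b"));
            List<String> content = Files.readAllLines(tmp, StandardCharsets.UTF_8);
            Files.delete(tmp);
            return content;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```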
I hope you enjoy starting to automate your tasks with the help of the KNIME Analytics Platform.
If you want to put your workflow on a server, share it with the team, and work together, you should check out the KNIME Business Hub.
Keep working until the Job is done
In another example, you let a job run until it is finished (or at least retry it up to 10 times).
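The logic behind such a loop is small. Here is a hedged plain-Java sketch of it: call a job until it reports success or a maximum number of attempts is used up. The method name and the return convention (-1 for "never succeeded") are assumptions for this illustration, not taken from the workflow.

```java
import java.util.function.BooleanSupplier;

class RetrySketch {
    // Runs 'job' until it returns true or 'maxAttempts' is reached.
    // Returns the number of attempts used, or -1 if the job never succeeded.
    static int runUntilDone(BooleanSupplier job, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (job.getAsBoolean()) {
                return attempt;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // A stand-in job that only succeeds on its third call.
        int used = runUntilDone(() -> ++calls[0] >= 3, 10);
        System.out.println("attempts used: " + used);
    }
}
```

In a real workflow you would usually also wait a little between attempts, so the job has a chance to succeed the next time.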
Hands-on Examples
You can check these examples to see more loops in action:
- CSV files from subfolders to Excel Sheets — treat the first iteration differently
- Stop a loop if the column sum is greater than 50% of my flow variable (showing a conditional end to a loop)
- Excel — Loop over Sheets and turn them into individual Files
- Use Excel cell addresses to extract values and turn them into columns
- [PATH] Create Path Variables — explained
- [PATH] Extract Paths used in File Writers into Flow Variable
And more Workflows about Meta Information and file handling:
- Extract meta information from Excel file (with the help of Python)
- even more fun with URI, URL and File Paths — work with the new Path variable format to list and move/copy files
- This workflow demonstrates how to use the new Excel Reader and path variables and read Excel sheets
- Loop thru same Excel sheet names from various sources and collect them back
- This workflow demonstrates how to use the new Excel Reader and path variables and read Excel sheets and skip a step when a file is missing (record which files have been processed)
There is a collection of smaller articles to learn more about KNIME
In case you enjoyed this story you can follow me on Medium (https://medium.com/@mlxl) or on the KNIME Hub (https://hub.knime.com/mlauber71) or KNIME Forum (https://forum.knime.com/u/mlauber71/summary).