DATA STORIES | KNIME & PYTHON | KNIME ANALYTICS PLATFORM

KNIME L1 & L2 Experience from a Python Coder

A reflection on learning KNIME from a KNIME beginner

Kevin Sun
Low Code for Data Science


Photo by Element5 Digital on Unsplash.

I am a Python programmer and am always open to exploring new methodologies to expand my technological skill set and improve efficiency. Recently, I encountered KNIME Analytics Platform, a free, open-source no-code/low-code tool for data science, and had a great experience taking the self-paced L1 & L2 courses, finishing both in two weeks for free (you just need to log in to enroll).

After passing the corresponding certification exams, I received an L1 badge and an L2 badge that I could share on LinkedIn. Here is my reflection on learning KNIME for anyone who wishes to start their journey towards codeless data analytics.

My Background

I’ll start with a little about myself. I hold a bachelor’s degree in Business Administration with a concentration in Information Systems, a blend of business and technology. After graduation, I worked as a data analyst. I am currently pursuing a master’s degree in data science and I primarily code in Python.

However, at university, when we were looking for practicum opportunities, one of our options was KNIME, an end-to-end no-code/low-code analytics platform for creating data science solutions. This intrigued me because I sometimes need to build a report quickly, and the time to write the same boilerplate code over and over is a luxury I don't always have. After learning about KNIME, I was surprised to see how much it had to offer.

First Impression

My first impression of KNIME can be summarized in two words: accessible (yet) deep. KNIME provides a fast way to modify and analyze data, which makes it accessible. Yet it can also deliver professional outcomes, with rich functionality for users at every level.

For beginners in data science, KNIME doesn't require any coding knowledge. Users can create a workflow just by dragging and dropping nodes and connecting them, and the results, whether data blending challenges, ETL processes, visualizations, or dashboards, look professional, with a data flow that is easy to follow.

For advanced users, KNIME offers powerful parameterization and workflow control options, error-handling capabilities, methods for REST APIs consumption, a wide range of ML models for classification and prediction, and much more. There are thousands of native nodes and example workflows, which allow users to build anything from the ground up. Being an open-source platform, users can share their workflows freely on the KNIME Community Hub — so you never really have to start from scratch.

L1 & L2 Courses

The first step in learning KNIME is to take the L1 and L2 courses. There are two paths: the Data Science path and the Data Wrangler path. The Data Science path emphasizes machine learning models, so that was the one I chose. I finished L1 and L2 in two weeks, so they were not very time-consuming. The courses are well-structured, which let me digest a large amount of information efficiently: I first studied the material and videos, then took a short five-question quiz, and finished with hands-on exercises in KNIME. After this process, I understood both the concepts and their implementations in KNIME. The courses are beginner-friendly because every data science concept is explained before its implementation in KNIME is taught.

The KNIME L1 course covers basic proficiency: data access, data cleaning, data selection, visualization, an introduction to machine learning, and more. The course follows the stages of a classic data science project, so I was able to create a KNIME workflow for a data science project right after finishing L1. If you want to learn how to use KNIME and build simple workflows, the free L1 course offers sufficient knowledge.

Figure 1. A simple example workflow for a linear regression model.

The KNIME L2 courses cover advanced proficiency, including Date&Time processing, flow variables, loops, if-else statements, parameter optimization, and more. The L2 concepts build on the L1 course, deepening users' understanding of data science and the implementation of complex workflows. If you are an expert in data science and want to be able to create a workflow for any use case, I would recommend taking the L2 course as well.

Figure 2. An example of the String Configuration node passing string flow variables to the Rule Engine nodes.

KNIME Nodes and Python

In this section, I will show some simple comparisons between KNIME nodes and Python, and then explore how Python works within KNIME Analytics Platform.

Data access

The most popular way to read data in Python is “pd.read_csv()” from the pandas package. There are also “pd.read_excel()”, “pd.read_json()”, and “pd.read_html()” for files with other extensions. These functions let users pass arguments that control how the input data is parsed.
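
For example, a typical pandas read might look like the following sketch (the tiny in-memory CSV is made up for illustration):

```python
import io

import pandas as pd

# A small in-memory CSV stands in for a file on disk (made-up data).
csv_data = io.StringIO("date,amount\n2023-01-05,100\n2023-01-06,250\n")

# sep, header, and encoding mirror the options you would otherwise
# set in a reader's configuration dialog.
df = pd.read_csv(csv_data, sep=",", header=0, encoding="utf-8")
print(df.shape)  # (2, 2)
```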

KNIME can read CSV data from the CSV Reader node. Users use the node’s Configuration dialog to specify file location, column/row delimiter, header, row ID, encoding, etc. In addition, users can change columns’ datatype or skip certain rows for advanced adjustments — all without any coding.

Figure 3. Configuration dialog of the CSV Reader node.

Other data access nodes work the same way for files with different extensions, such as Excel Reader, JSON Reader, File Reader, and Image Reader, or for accessing data from different databases, be it Snowflake, SQLite, Oracle, Hive, or MongoDB, just to mention a few.

In fact, users can simply drag and drop a file into the KNIME workspace, and KNIME Analytics Platform will automatically read it with the correct format.

Figure 4. Drag and drop files to KNIME workflows.

Note. You can learn more about data blending from more than 50 sources in the free “Will They Blend?” book.

Date&Time

Every time I encounter the Date&Time data type it gives me a small heart attack, because I often forget how to manipulate it and have to google the syntax. In contrast, KNIME offers many nodes to process, manipulate, and extract Date&Time values, which I could understand intuitively without much memorization.

For instance, we can convert strings to Date&Time using the String to Date&Time node, which generates Date&Time values from the input strings. In its configuration dialog, we can select a format that matches the strings from the drop-down menu or enter the format manually. If we want to be lazy, we can click “Guess data type and format” and KNIME will figure out the format for us.

Fig 5. Configuration dialog of the String to Date&Time node.
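
For comparison, the pandas counterpart of this node is “pd.to_datetime()”, used either with an explicit format string or letting pandas infer the format, much like clicking “Guess data type and format” (the sample dates are made up):

```python
import pandas as pd

dates = pd.Series(["2023-01-05", "2023-02-17", "2023-03-09"])

# Explicit format, analogous to picking a pattern in the dialog.
parsed = pd.to_datetime(dates, format="%Y-%m-%d")

# Let pandas infer the format, similar to "Guess data type and format".
guessed = pd.to_datetime(dates)

print(parsed.equals(guessed))  # True
```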

If the dataset already contains Date&Time, we can use the Extract Date&Time Fields node to extract the timestamp or a specific day, month, or day of the week.
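
In pandas, the same extraction goes through the “.dt” accessor; here is a quick sketch with made-up timestamps:

```python
import pandas as pd

ts = pd.to_datetime(pd.Series(["2023-01-05 14:30:00", "2023-06-21 09:15:00"]))

# Pull individual fields out of the timestamps, much as the
# Extract Date&Time Fields node would.
fields = pd.DataFrame({
    "year": ts.dt.year,
    "month": ts.dt.month,
    "day_of_week": ts.dt.day_name(),
    "hour": ts.dt.hour,
})
print(fields["day_of_week"].tolist())  # ['Thursday', 'Wednesday']
```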

Python Script

And while KNIME Analytics Platform can work without Python, sometimes you just want (or need) to code. You can execute Python scripts using the nodes of the KNIME Python Scripting extension, with which KNIME users can write their own scripts for personal use or to share with others.

For example, I used a Python Script node to write a Fibonacci function (Fig. 6). The output of this node can then be passed on to the next node, or to another Python Script node, to manipulate the incoming data table.

Figure 6. Configuration dialog with Python code in the Python Source node.
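
A Fibonacci script like the one in Figure 6 might look roughly like this sketch; the commented-out handover lines follow the KNIME Python scripting API, while the rest is plain Python:

```python
import pandas as pd

def fibonacci(n):
    """Return the first n Fibonacci numbers as a list."""
    seq = [0, 1]
    while len(seq) < n:
        seq.append(seq[-1] + seq[-2])
    return seq[:n]

# Collect the sequence into a data table for the node's output port.
df = pd.DataFrame({"fib": fibonacci(10)})

# Inside the node, the table is handed to the output roughly like so:
# import knime.scripting.io as knio
# knio.output_tables[0] = knio.Table.from_pandas(df)
print(df["fib"].tolist())  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```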

In addition, I can even create a simple Neural Network and pass it to the next node for modularization.

Conda Environment Propagation Node

Setting up virtual environments is necessary for programming languages to run code properly with the required packages and versions. In Python, we would typically save each project's virtual environment in a “.yaml” file, to ensure each project runs properly without affecting the others. It is a lot simpler in KNIME Analytics Platform: we can use the Conda Environment Propagation node to select the required environment with the packages we need and propagate it to downstream nodes as a flow variable. For instance, I used PyTorch in the previous Python script example, which doesn't come with the base Python environment. I can use the Conda Environment Propagation node to select “torch” as part of the virtual environment, so other users will be able to run my workflow without any issues.
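
For reference, the “.yaml” file a Python project would carry for the same purpose might look like this minimal, illustrative sketch (the environment name and pinned versions are assumptions, not taken from the workflow):

```yaml
name: knime-pytorch
channels:
  - pytorch
  - conda-forge
dependencies:
  - python=3.9
  - pandas
  - pytorch
```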

Figure 7. Configuration dialog of Conda Environment Propagation node with selected torch package.

For Python developers: starting with KNIME Analytics Platform 4.6, KNIME released the KNIME Python Extension Development (Labs), which allows KNIME nodes to be written completely in Python. Python programmers can take a pure-code approach to developing Python nodes and share them within teams or organizations, keeping the same look and feel as any other KNIME node. You can learn how to set up and develop Python nodes in the blog post "4 Steps for your Python Team to Develop KNIME Nodes".

There are endless possibilities with KNIME. Coming up next, I want to build a reusable Graph Neural Network by using PyTorch. Stay tuned!

Takeaway

KNIME Analytics Platform provides an easy way to build workflows and create data science projects. It is extremely helpful for avoiding repetitive coding, automating processes, and creating reusable solutions. If you want to dive into data science, or are looking for a different approach to data analysis, I suggest taking the KNIME L1 and L2 courses to try it out. If you ever need help with your own projects, you can check out the KNIME Forum or the KNIME Community Hub, where KNIME users ask questions and share workflows, respectively.

Thanks for reading and let me know about your experience with KNIME!
