GETTING STARTED | MACHINE LEARNING | KNIME ANALYTICS PLATFORM
KNIME — Machine Learning and Artificial Intelligence — A Collection
TL;DR: Where to find resources and workflow examples for various machine-learning tasks you can do with KNIME (and some Python)
KNIME offers various methods for machine learning and artificial intelligence. This article collects an overview with links to workflows you can use to get started (some by me, others by the community). There are of course many other ML resources out there. If you are looking for more code-oriented content, I would recommend https://machinelearningmastery.com/, and you can always run Python code (or R/RStudio) within KNIME :-)
I would also encourage you to take some courses on knime.learnupon.com to expand your knowledge, including about machine learning. This article is not meant to replace a solid understanding of data science; it is meant to have resources ready once you know what you want to do. Some links on how to learn KNIME and data science in general are at the end of the text.
If you want to start learning KNIME in a systematic way, I have this article:
“Learn KNIME — a short Collection”
Most of the linked resources have further links to follow. I would also like to point you to my collection on the KNIME Hub of several ML discussions from the KNIME Forum.
In this article I venture into some more ‘philosophical’ aspects about how and why ML and AI projects may fail or succeed. And what this might have to do with the traditional German engineering culture of ‘Spaltmaß’:
Data Preparation
Preparing your data is obviously very important, and all of KNIME’s nodes and functions can be employed for it (for example, replacing missing values).
An overview of the classic methods can be found here as a video and on the KNIME Hub:
- Data Preprocessing for Machine Learning Models — Part I
- Data Preprocessing for Machine Learning Models — Part II
In addition I have presented one quick way to speed up your preparation with the use of vtreat to automate the task. Please explore the relevant article and the Space on the KNIME hub:
- “Data preparation for Machine Learning with KNIME and the Python “vtreat” package” (https://medium.com/p/efcaf58fa783)
- KNIME Space on the Hub for data preparation (https://hub.knime.com/-/spaces/-/latest/~gWtVGr0pE-sAlmjC/)
Another way to go is dimension reduction with this KNIME Component and article:
Another topic in machine learning is the handling of categorical (string) data. You can read more about it here:
You can also build a do-it-yourself target encoding using H2 or SQLite, storing the encoding rules so that they can be applied later with SQL code.
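The idea of a do-it-yourself target encoding with stored rules can be sketched in a few lines of Python using the built-in sqlite3 module. This is only an illustration under assumptions: the table names, the column name color, the sample rows and the smoothing weight of 2 are all made up, and the workflows linked in this article may organise the steps differently.

```python
import sqlite3

# Hypothetical training data: (category, binary target).
train_rows = [
    ("red", 1), ("red", 0), ("red", 1),
    ("blue", 0), ("blue", 0),
    ("green", 1),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE train (color TEXT, target INTEGER)")
conn.executemany("INSERT INTO train VALUES (?, ?)", train_rows)

# Store the encoding rules: one smoothed target mean per category.
# The smoothing weight of 2 is an arbitrary, illustrative choice.
conn.execute("""
    CREATE TABLE color_rules AS
    SELECT color,
           (SUM(target) + 2.0 * (SELECT AVG(target) FROM train))
               / (COUNT(*) + 2.0) AS color_te
    FROM train
    GROUP BY color
""")

global_mean = conn.execute("SELECT AVG(target) FROM train").fetchone()[0]

# Apply the stored rules to new data with plain SQL.
# Categories never seen in training fall back to the global mean.
conn.execute("CREATE TABLE score (id INTEGER, color TEXT)")
conn.executemany(
    "INSERT INTO score VALUES (?, ?)",
    [(1, "red"), (2, "blue"), (3, "purple")],
)
encoded = conn.execute("""
    SELECT s.id, s.color, COALESCE(r.color_te, ?) AS color_te
    FROM score AS s
    LEFT JOIN color_rules AS r ON r.color = s.color
    ORDER BY s.id
""", (global_mean,)).fetchall()

for row in encoded:
    print(row)
```

Because the rules live in their own table, they are computed once on the training data and then only joined onto new data, which keeps the target of the new data out of the encoding.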
Classic Machine Learning
For a general idea about machine-learning there is the KNIME Learning Hub and self-paced courses like:
- L1-DS KNIME Analytics Platform for Data Scientists — Resources: Basics Hub Space
- L2-DS KNIME Analytics Platform for Data Scientists — Resources: Advanced Hub Space
Beginners Space “Machine Learning” in KNIME.
Binary Classification
The most common of all ML questions: a true/false or 0/1 setup. A thing will happen or it will not. The early history of Kaggle seems to be nearly completely devoted to that. I present some collections of workflows that have worked well over the years to give you a quick start. They will not absolve you from thinking about your data and target.
- H2O.ai AutoML in KNIME for classification problems: examples on the Hub
- Hyperparameter optimization for LightGBM — wrapped in KNIME nodes (https://medium.com/p/ddb7ae1d7e2)
- KNIME, XGBoost and Optuna for Hyper Parameter Optimization (https://medium.com/p/dcf0efdc8ddf)
- “Sparkling Predictions and Encoded Labels”: Developing and Deploying Predictive Models on a Big Data Cluster with KNIME, Spark and H2O.ai (YouTube video in German, charts in English)
If you deal with highly imbalanced data (you only have very few positive targets), you might want to consult one of these (short version: maybe use AUCPR, the area under the precision-recall curve, as your metric):
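To make the AUCPR suggestion more concrete, here is a minimal pure-Python sketch that computes it as average precision. The function name and the ten example cases are made up for illustration, and tied scores get no special treatment; in practice you would rather use the scorer of your ML library.

```python
def average_precision(y_true, y_score):
    """Area under the precision-recall curve, computed as average precision."""
    # Rank all cases from the highest to the lowest predicted score.
    ranked = sorted(zip(y_score, y_true), key=lambda pair: pair[0], reverse=True)
    n_pos = sum(y_true)
    if n_pos == 0:
        raise ValueError("need at least one positive case")
    hits = 0
    total = 0.0
    for rank, (_, label) in enumerate(ranked, start=1):
        if label == 1:
            hits += 1
            total += hits / rank  # precision at the rank of this positive
    return total / n_pos


# Hypothetical imbalanced example: 2 positives among 10 cases.
y_true = [1, 0, 0, 1, 0, 0, 0, 0, 0, 0]
y_score = [0.9, 0.8, 0.7, 0.6, 0.5, 0.4, 0.3, 0.2, 0.1, 0.05]

aucpr = average_precision(y_true, y_score)
print(aucpr)
```

In this example a model that always predicts 0 would reach 80% accuracy without finding a single target. AUCPR only rewards ranking the rare positives near the top, and a random ranking scores roughly the share of positives (here 0.2).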
Multiclass Predictions
Not that common and more complex to handle but you can try to make predictions about multiple classes at once:
- LogLoss as Measure for multi-class tasks being discussed (thread 1 and thread 2)
- Score UCI Wine Quality Dataset — multiple Targets (multiclass)
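Since LogLoss is mentioned as a measure for multiclass tasks, a small sketch of how it is calculated may help. The function name, the class labels and the probabilities below are invented for illustration; the clipping constant eps is a common safeguard against log(0), not something taken from the linked threads.

```python
import math


def multiclass_log_loss(y_true, probabilities, eps=1e-15):
    """Mean negative log of the probability assigned to the true class."""
    total = 0.0
    for label, probs in zip(y_true, probabilities):
        # Clip so that a predicted probability of exactly 0 or 1 stays finite.
        p = min(max(probs[label], eps), 1 - eps)
        total -= math.log(p)
    return total / len(y_true)


# Hypothetical predictions for three cases and three classes.
y_true = ["a", "b", "c"]
probabilities = [
    {"a": 0.7, "b": 0.2, "c": 0.1},
    {"a": 0.3, "b": 0.5, "c": 0.2},
    {"a": 0.5, "b": 0.3, "c": 0.2},  # confident in the wrong class
]

loss = multiclass_log_loss(y_true, probabilities)

# Reference point: always predicting 1/3 for every class.
uniform = [{"a": 1 / 3, "b": 1 / 3, "c": 1 / 3}] * 3
baseline = multiclass_log_loss(y_true, uniform)
print(loss, baseline)
```

Lower is better. Comparing against the uniform baseline shows whether the predicted probabilities carry any information beyond guessing, and confident wrong predictions are punished heavily.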
Regression Predictions
Directly predict a numeric outcome.
- H2O.ai AutoML in KNIME for regression problems, Example on the KNIME Hub
- Score Kaggle House Prices: Advanced Regression Techniques — prepare data with vtreat — use H2O.ai nodes and other models — measure results with RMSE
… and since we are at it I also drop the link to my Github Pages here (think Jupyter Notebooks). Maybe more on this in another article:
- Collection of Jupyter notebooks to solve CLASSIFICATION tasks
- Collection of Jupyter notebooks to solve REGRESSION tasks
Parameter Optimization
Already in the binary section there were links to two examples where (hyper-)parameters are optimized when building a model. A lot of this can now be automated, as long as you are careful not to overfit your model.
But you still might want to have a few links about this with regard to KNIME:
Here are some more resources, making use of two prominent packages to optimize such parameters:
- BINARY: use KNIME / Python and LightGBM to build a model — Hyperparameter tuning with BayesSearchCV and Optuna — also preparing data with vtreat
- KNIME has a verified component called “Parameter Optimization (Table)” that you can see in various examples on the KNIME Hub for Classification and Regression
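The basic loop behind such optimizations can be sketched without any extra packages. The example below is a plain random search, not BayesSearchCV, Optuna or the KNIME component: it tunes the k of a tiny k-nearest-neighbour classifier on synthetic data and scores every candidate on a separate validation set. All names, the data and the search range are assumptions made for illustration.

```python
import random

rng = random.Random(0)


def make_data(n):
    """Synthetic 1-D data: the label is 1 when x > 0.5, with 15% label noise."""
    data = []
    for _ in range(n):
        x = rng.random()
        y = int(x > 0.5)
        if rng.random() < 0.15:
            y = 1 - y
        data.append((x, y))
    return data


train, valid = make_data(200), make_data(100)


def knn_predict(x, k):
    """Majority vote among the k training points closest to x."""
    neighbours = sorted(train, key=lambda row: abs(row[0] - x))[:k]
    return int(sum(y for _, y in neighbours) * 2 > k)


def accuracy(data, k):
    return sum(knn_predict(x, k) == y for x, y in data) / len(data)


# Random search: sample candidates and keep the best validation score.
best_k, best_score = None, -1.0
for _ in range(15):
    k = rng.randrange(1, 40, 2)  # odd values avoid tied votes
    score = accuracy(valid, k)
    if score > best_score:
        best_k, best_score = k, score

print("best k:", best_k, "validation accuracy:", best_score)
print("k=1 on its own training data:", accuracy(train, 1))
```

With k=1 the model simply memorises its training data and looks perfect there, which is why the candidates are compared on data the model has not seen. Packages like Optuna replace the random sampling with a smarter strategy, but the need for a clean validation setup stays the same.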
Evaluate your Models
Measures of Machine Learning (how to interpret your model). Also included in the above examples obviously.
Feature Importance in ML Models (what features are driving your model)
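One widely used way to measure which features drive a model is permutation importance: shuffle one feature and see how much the score drops. The sketch below is an assumption on my side about a suitable illustration, not necessarily the method used in the linked article; the "model" is a hard-coded stand-in and the data is random.

```python
import random

rng = random.Random(1)

# Hypothetical data: feature 0 determines the label, feature 1 is pure noise.
rows = [(rng.random(), rng.random()) for _ in range(500)]
labels = [int(x0 > 0.5) for x0, _ in rows]


def model(row):
    """Stand-in for a trained model; it simply thresholds feature 0."""
    return int(row[0] > 0.5)


def accuracy(data):
    return sum(model(r) == y for r, y in zip(data, labels)) / len(labels)


def permutation_importance(feature_index):
    """Drop in accuracy after shuffling one feature column."""
    column = [r[feature_index] for r in rows]
    rng.shuffle(column)
    shuffled = [
        tuple(c if i == feature_index else v for i, v in enumerate(r))
        for r, c in zip(rows, column)
    ]
    return accuracy(rows) - accuracy(shuffled)


importances = {i: permutation_importance(i) for i in range(2)}
print(importances)
```

Shuffling the noise feature changes nothing, so its importance is zero, while shuffling the informative feature pushes the accuracy towards coin-flip level. The same recipe works with any model and any metric, as long as you evaluate on data that was not used for training.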
Auto-Machine Learning
KNIME also offers a take on automated machine-learning which is doing quite well and will give you an initial idea where you stand with your data. It will not absolve you from thinking about your task and preparing the data.
KNIME has an integration of H2O.ai AutoML as well as its own components to automate such tasks:
- “AutoML Regression and Classification Examples”
- “Guided Automation” (also see the additional links at this workflow)
Clustering
Unsupervised detection of patterns and groups
Rule Induction
You can also create a list of rules that one can still read and interpret to try and solve problems. They will need data preparation and some computing power.
- Weka and Yacaree rules
- KNIME Hub: Rule Induction with Weka Rule Nodes and Yacaree Associator
- “Rule Induction with Weka M5Rules (with numeric target)”
Shopping Basket / Next Best Product
A variation is an analysis where you try to find the next best product to offer to a customer.
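As a rough sketch of the idea, the next best product can be derived from how often items were bought together. The baskets below are invented, and the simple confidence measure (share of baskets with product A that also contain product B) is only one of several measures an association rule learner would report.

```python
from collections import Counter
from itertools import permutations

# Hypothetical shopping baskets.
baskets = [
    {"bread", "butter"},
    {"bread", "butter", "jam"},
    {"bread", "jam"},
    {"butter", "milk"},
    {"bread", "butter", "milk"},
]

item_counts = Counter(item for basket in baskets for item in basket)
# Count ordered pairs so that (a, b) and (b, a) can both be looked up.
pair_counts = Counter(
    pair for basket in baskets for pair in permutations(sorted(basket), 2)
)


def next_best_product(owned):
    """Return the item with the highest confidence(owned -> item), or None."""
    candidates = {
        other: pair_counts[(owned, other)] / item_counts[owned]
        for other in item_counts
        if other != owned and pair_counts[(owned, other)] > 0
    }
    if not candidates:
        return None
    # Sorting first makes ties resolve alphabetically and reproducibly.
    return max(sorted(candidates), key=candidates.get)


print(next_best_product("bread"))
print(next_best_product("jam"))
```

A product that never appears in any basket yields None, which is a reminder that such rules can only recommend along patterns that already exist in the data.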
Deep Learning
Deep learning used to be all the hype and is still going strong as a way to solve problems. You might want to familiarize yourself with the specifics. KNIME can also help you with that.
Before you dive right into it with KNIME, Keras and TensorFlow: there are some challenges in setting that up, which I have tried to address.
- A really quick way to get an idea of whether deep learning suits your problem would be to employ the H2O.ai AutoML node and just set it to train various DL models
- Codeless Deep Learning with KNIME: Build, train, and deploy various deep neural network architectures using KNIME Analytics Platform
- VIDEO: A Friendly Introduction to Codeless Deep Learning (https://www.youtube.com/watch?v=onfU-YWDpgk)
- KNIME: A Friendly Introduction to [Deep] Neural Networks
- Codeless Deep Learning for Sequential Data (https://www.youtube.com/watch?v=VlMZ8YKKabM)
Time Series (and some Deep Learning)
Time series are a special case, and some problems might best be formulated as time series. There is of course Facebook Prophet, but KNIME also has some solutions to offer:
- “Codeless Time Series Analysis with KNIME”: A practical guide to implementing forecasting models for time series analysis applications
- Time Series Analysis Workshop (https://youtu.be/3rRQIbDChvM), video by the authors
- Time Series Analysis with KNIME, an introduction
- Building a Time Series Analysis Application
- Tutorial: Introduction to Time Series Analysis (https://kni.me/w/wGDabhYz-46QgfZF)
- Multivariate Time Series Analysis with an RNN: Training (https://kni.me/w/B45XEOAuWeQBzO9b)
Text Analysis
Some links to typical tasks involving texts. There is more with KNIME.
- Text Mining Use Cases plus Deep-Dive into Techniques
- Extract Tables from PDF files with the help of KNIME and R “tabulizer”
- More extracting tables from PDF with Tika Parser and
- Python Package “Camelot” for extracting tables from PDF
- KNIME Hub: Document Classification: Model Training and Deployment
String Similarity
Often you want to match strings or items by their similarity. KNIME can also help you with that.
- address deduplication, string similarity and fingerprinting (a collection)
- Similarity without a ground truth (automatically group texts)
- String Deduplication without Ground Truth — KNIME Forum (75366)
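To give an idea of what matching strings by similarity looks like in code, here is a small sketch with difflib from the Python standard library. The addresses and the threshold of 0.75 are made up; the linked workflows use their own distance measures and settings.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a, b):
    """Similarity between 0 and 1 after basic normalisation."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


# Hypothetical address list with one near-duplicate.
addresses = ["12 Main Street", "12 Main St.", "7 Oak Avenue"]

THRESHOLD = 0.75  # illustrative cut-off, needs tuning on real data

matches = [
    (a, b, round(similarity(a, b), 2))
    for a, b in combinations(addresses, 2)
    if similarity(a, b) >= THRESHOLD
]
print(matches)
```

Comparing every pair quickly becomes expensive on larger tables, so in real projects you would typically first group candidates, for example by postal code or a fingerprint of the string, and only compare within those groups.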
Automatic Topic Detection
This should go into a separate article, but I adapted a sample workflow to use several modules to automate the detection of topics from texts. This was right before the LLM hype took off.
Explainable Models and AI
In the end you might also want to know what drove your model to its decision. KNIME also has some advanced methods there — we already had the feature importance. These examples take it up a notch:
- Learn XAI based on Latest KNIME Verified XAI Components (https://www.knime.com/blog/learn-xai)
- How Banks Can Use Explainable AI in Credit Scoring (https://www.knime.com/blog/banks-use-xai-transparent-credit-scoring)
- Debug and Inspect your Black Box Model with XAI View (https://www.knime.com/blog/explainable-ai-black-box-model-xai-view)
- Learning (and teaching) Explainable AI (XAI) with KNIME — Video — (https://www.youtube.com/watch?v=M8wQhAylY_w)
- Explain Stroke Prediction Models with LIME in KNIME (https://forum.knime.com/t/explain-stroke-prediction-models-with-lime-in-knime/48777?u=mlauber71)
Large Language Models
In case you wonder where the LLMs are: there is a lot going on with KNIME and these new AI models. They merit their own articles, two of which I add here for completeness. There are more links to further resources there:
Books to look into to learn more …
Collection of Videos guiding you to the use of KNIME Version 4 and 5 in general (and some machine learning)
You might also want some links to instruction videos in general. I came across them while researching this article, so I present them here as well:
KNIME Version 5
- L1 — Data Literacy with KNIME Analytics Platform: Basics (KNIME Version 5.x)
- L2-DE Data Engineering with KNIME Analytics Platform (KNIME Version 5.x)
KNIME Version 4
- L1 for KNIME Version 4
- L2 for KNIME Version 4 (also covering the basics of Machine-Learning)
In case you enjoyed this story you can follow me on Medium (https://medium.com/@mlxl) or on the KNIME Hub (https://hub.knime.com/mlauber71) or KNIME Forum (https://forum.knime.com/u/mlauber71/summary).