GETTING STARTED | PYTHON & CONDA | KNIME ANALYTICS PLATFORM

KNIME and Python — Setting up and managing Conda environments

TL;DR: Install Python/Conda through Miniforge, find and use specific YAML files to manage an environment, point KNIME to your Conda installation and use KNIME’s Conda Environment Propagation node.

Markus Lauber
Low Code for Data Science


KNIME loves Python (and vice versa)

KNIME and Python are powerful tools that work great together. Setting up the low-code software KNIME to use Python can sometimes be a bit of a challenge. So this story is here to guide you through the process and give you confidence in using the tools. The examples are all on the KNIME Community Hub so you can test everything yourself.

This guide is aimed at the moderately ambitious Python user (KNIME is supposed to be low-code after all). If you are an expert and know what you are doing you will be able to help yourself, or you might discover a reason why I mostly stick with the approach described in this story. If you have little previous experience it might cost you a few hours to fully grasp all this, but it can greatly enhance your abilities to use advanced software. So let’s start :-)

Bundled Python Environment in KNIME

First: from v4.6 on, KNIME has an integrated (“bundled”) Python environment that regularly gets updated with new packages, like OpenPyxl to edit Excel files. So if you just want to use basic Python functions or examples you are probably fine with just installing this extension.

You can also choose to follow the official KNIME and Python documentation, and you are encouraged to read it anyway. There is also a community Python extension for KNIME, which you should not confuse with the official one (the official one is recommended).

Why combine Miniforge, Conda and Conda-Forge

If you want to use KNIME and Python properly I would suggest first setting up a Python environment through Miniforge and using the Conda package manager to manage your environments. There are lots of examples about Python and environments out there. I will show you what has worked best for me over the last years. Feel free to adapt.

Conda is a package and environment manager, and Miniforge is a minimal installer for it. Conda-forge is a community-maintained collection (channel) of packages. If you read the licenses, this combination should be free to use in most environments and might offer a fairly stable combination of packages and dependencies, so you might opt to use that. KNIME currently uses Python packages from different repositories. I found that sometimes more difficult to manage, so I stick with the basic combination. If you need special packages that are not available on conda-forge you can additionally install them through pip (we will come to that in a moment).

Python package dependencies and KNIME

If you use Python more often you know that managing packages and dependencies is a constant challenge. Python and KNIME together are no exception. To work with the KNIME Python nodes (like the standard Python Script node) the platform sometimes requires specific package versions (not necessarily the very latest ones) that are defined in the settings and in special YML/YAML files (we will come to that). This will be even more important if you want to use deep learning software like Keras and TensorFlow together with KNIME (there will be another story about that).

Getting Conda up and running and introducing it to KNIME

First order of business is to get a basic Conda environment up and running and to introduce KNIME to it. Do not worry about packages and dependencies just yet.

You should start by installing Miniforge. This story is based on MacOSX examples while also keeping the Windows operating system in mind (for Linux please adapt accordingly).

Just as a quick reminder: on MacOSX you will find your command prompt under “Terminal” in your Launchpad. On Windows, search for CMD to open the command prompt window.

MacOSX (also Apple silicon)

Installing this under MacOSX can sometimes be a slight challenge and I cannot cover everything that can happen, but try to follow the instructions and maybe try a few times.

  1. XCode — Apple developer tools (https://developer.apple.com/xcode/)
    (and a working shell to accept commands).
  2. Homebrew (https://brew.sh) also the FAQ — https://docs.brew.sh/FAQ
    remember to “brew update” your installation.
  3. Miniforge (https://github.com/conda-forge/miniforge)
    “brew install miniforge”.

Windows

Download the installer from the Miniforge website and (well) install it.

Your first Conda prompt

After installing and starting the terminal, your prompt might look something like this on your Mac (you can change the default). This is where you will start your journey into Conda and environment management. Don’t worry, in the end it boils down to a few commands :-)

(base) my_user@MacBook ~ %

But let’s take it step by step.
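If you want to double-check that the conda command is really reachable before moving on, a few lines of Python will do. This is just a minimal sketch using the standard library; the helper name find_tool is my own invention and not part of conda or KNIME.

```python
import shutil


def find_tool(name):
    """Return the full path of a command-line tool, or None if it is not on PATH."""
    return shutil.which(name)


# The result depends entirely on your own installation.
conda_path = find_tool("conda")
if conda_path is None:
    print("conda was not found on PATH; check your Miniforge installation")
else:
    print("conda found at:", conda_path)
```

If nothing is found, opening a fresh terminal after the installation is usually the first thing to try.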

KNIME has special settings and packages for Python in YAML

KNIME has codified the combination of Python packages it would install through its settings in a YAML file that can also serve as a basis for a decent Python environment. You will see that installing, using, changing, deleting and re-installing environments is a pretty common practice with Python. Depending on their tasks (and operating systems) people might have several dozen environments, and there is no shame in that. Just try to keep track. Conda environment management, YAML and KNIME will help you on the way.

KNIME loves YAML

So what are KNIME’s settings? They give a basic script in the Python guide and they include several YAML files within the installation. So these are the ones to start with.

I have created a KNIME workflow to locate and extract these YAML files from the depths of the installation (you can also find them manually on Windows and MacOSX), or you can use the one provided below.

KNIME Workflow to find and extract Python YAML configuration files (KNIME Community Hub: https://kni.me/w/SGv1Cosah8BXabfa).

The examples are in KNIME 4. I have a version of the YAML-Extraction Workflow in KNIME 5 (https://hub.knime.com/-/spaces/-/~wHFwjmZiPS9qzl_D/)

KNIME workflows are just folders on your machine that you can access via the explorer or finder and the YAML files will be located beneath the workflow in its /data/ folder:

YAML files in a /data/ subfolder extracted from the KNIME installation.
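If you would rather list the extracted files with a script than click through folders, something like the following might do. It is only a sketch: the function name find_yaml_files is made up, and the demo uses a throw-away folder standing in for a real workflow directory with a /data/ subfolder.

```python
import tempfile
from pathlib import Path


def find_yaml_files(workflow_dir):
    """Return all .yml/.yaml files below <workflow_dir>/data, sorted by file name."""
    data_dir = Path(workflow_dir) / "data"
    if not data_dir.is_dir():
        return []
    matches = [
        path
        for path in data_dir.rglob("*")
        if path.is_file() and path.suffix.lower() in {".yml", ".yaml"}
    ]
    return sorted(matches, key=lambda path: path.name)


# Demo with a temporary folder instead of a real KNIME workflow.
with tempfile.TemporaryDirectory() as tmp:
    data = Path(tmp) / "data"
    data.mkdir()
    (data / "py39_knime.yml").write_text("name: py39_knime\n")
    (data / "notes.txt").write_text("not a YAML file\n")
    found_names = [path.name for path in find_yaml_files(tmp)]

print(found_names)
```

On a real machine you would pass the path of the workflow folder inside your KNIME workspace instead of the temporary folder.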

As you can see there are configurations for a lot of scenarios, like ‘standard’ KNIME & Python operations or settings for deep learning, and also for different Python versions. The ‘official’ Python might be some versions ahead (3.10+), but in order to keep compatibility (I assume) KNIME currently sticks to 3.9 as the latest version (that might of course move up in the future). I will go with the latest version and the standard settings initially. Newer versions might work, but if in doubt revert to the standard.

So here is a py39_knime.yml file, edited to use the new KNIME meta package for Python, “knime-python-base”:

# file: py39_knime.yml with some modifications
# edit FEB-2023 - new meta package https://anaconda.org/knime
# THX Carsten Haubold (https://hub.knime.com/carstenhaubold) for this
name: py39_knime # Name of the created environment
channels: # Repositories to search for packages
  # - defaults # edit: removed to just use conda-forge
  # - anaconda # edit: removed to just use conda-forge
  - conda-forge
  - knime # conda search knime-python-base -c knime --info
dependencies: # List of packages that should be installed
  - python=3.9 # Python
  - knime-python-base # dependencies of KNIME - Python integration
  # - knime-python-scripting # to also build Python packages for KNIME
  - cairo # SVG support
  - pillow # Image inputs/outputs
  - matplotlib # Plotting
  - IPython # Notebook support
  - nbformat # Notebook support
  - scipy # Notebook support
  - jpype1 # A Python to Java bridge
  - jupyter # Jupyter Notebook
  - pip # Python installer
  - pip: # install additional packages via pip
    # - vtreat # additional packages that are not available on conda-forge

As you can see there are several ‘restrictions’ entered, like Python 3.9 or a minimum version of packages. This is the (dark) art of finding the right combination of modern and also stable packages. Commands used by KNIME Python nodes might change, so they stick to a known (combination of) versions. As of February 2023 KNIME has made our lives easier by bringing together a lot of these settings into a meta package (https://anaconda.org/knime).

Also here I have removed two channels in order to stick to conda-forge and avoid compatibility and possible licensing problems. You might want to save your YAML file somewhere on your machine where you keep important configurations.
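To quickly check which channels are actually active in such a file (and which ones are only commented out), a small line-based scan might be enough. This is a rough sketch that assumes the simple layout shown above; the function name active_channels and the shortened sample text are my own, and a real YAML parser would be more robust for anything fancier.

```python
SAMPLE_YAML = """\
name: py39_knime # Name of the created environment
channels: # Repositories to search for packages
# - defaults # edit: removed to just use conda-forge
- conda-forge
- knime # conda search knime-python-base -c knime --info
dependencies:
- python=3.9
"""


def active_channels(yaml_text):
    """Return the entries listed under 'channels:', skipping commented-out lines."""
    channels = []
    in_channels = False
    for raw_line in yaml_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.endswith(":") and not line.startswith("-"):
            # A new top-level key starts; remember whether it is the channel list.
            in_channels = line == "channels:"
            continue
        if in_channels and line.startswith("- "):
            channels.append(line[2:].strip())
    return channels


print(active_channels(SAMPLE_YAML))
```

Because every line is stripped before it is inspected, the scan does not care whether the list items are indented or not.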

You can now proceed to create your first KNIME Python environment. Just to be clear: the actual conda command is everything after the “%” :-)

(base) my_user@MacBook ~ % conda env create -f="/Users/knime/py39_knime.yml"

This is the same thing that KNIME would do if you used the dialog in the preferences. But sometimes this fails, and handling everything on your own will give you more control, as we will see in a moment. When the installation is finished you will get a message that the new environment called “py39_knime” has been created. You are now ready to activate it, which is necessary to use this specific environment.

(base) my_user@MacBook ~ % conda activate py39_knime
(py39_knime) my_user@MacBook ~ %

You can see the prompt has changed. Now your environment is ready to be used and expanded. If you work in the Terminal always make sure to check which environment is activated or explicitly state the name you want to work with (more on that further down).
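If you are ever unsure which Python you are actually running, you can also ask Python itself. The sketch below relies on the CONDA_DEFAULT_ENV variable that conda sets when you activate an environment; the function name describe_interpreter is just something I made up for illustration.

```python
import os
import sys


def describe_interpreter(environ=None):
    """Collect the interpreter path, its version and the active conda environment."""
    environ = os.environ if environ is None else environ
    return {
        "executable": sys.executable,
        "python_version": ".".join(str(n) for n in sys.version_info[:3]),
        # conda sets CONDA_DEFAULT_ENV on activation; it is absent otherwise.
        "conda_env": environ.get("CONDA_DEFAULT_ENV", "(none)"),
    }


for key, value in describe_interpreter().items():
    print(f"{key}: {value}")
```

Run from a terminal where py39_knime is active, the conda_env entry should show that name.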

Last thing you might want to do is to make sure you know the location of your newly installed conda environment:

(py39_knime) my_user@MacBook ~ % conda info --all | grep -i python

sys.executable: /Users/my_user/opt/miniconda3/bin/python
conda location: /Users/my_user/opt/miniconda3/lib/python3.9/site-packages/conda
CONDA_PYTHON_EXE: /Users/my_user/opt/miniconda3/bin/python

You are interested in just the base installation path (here the miniconda3 folder; with Miniforge the folder name may differ).
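Conda can also print its information as JSON (conda info --json), which is easier to read from a script than grepping text. The following sketch assumes that this output contains an "envs" list of environment folders; the function name env_path and the sample paths are made up for illustration.

```python
import json
from pathlib import PurePosixPath, PureWindowsPath

# Invented sample in the shape I would expect from `conda info --json`.
SAMPLE_INFO = json.dumps(
    {
        "root_prefix": "/Users/my_user/miniforge3",
        "envs": [
            "/Users/my_user/miniforge3",
            "/Users/my_user/miniforge3/envs/py39_knime",
        ],
    }
)


def env_path(info_json, env_name):
    """Return the folder of the environment called env_name, or None."""
    info = json.loads(info_json)
    for path in info.get("envs", []):
        # Pick the path flavour by separator so Windows paths work everywhere.
        pure = PureWindowsPath(path) if "\\" in path else PurePosixPath(path)
        if pure.name == env_name:
            return path
    return None


print(env_path(SAMPLE_INFO, "py39_knime"))
```

In practice you would feed the function the real output of the conda command instead of the sample text.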

The conda settings in KNIME settings

Now it is time to make conda known to KNIME in the settings/preferences. This is a small but decisive step since it will offer you all kinds of options. You can now take advantage of all of KNIME’s support for Python but also continue to manage environments through conda and the terminal. As always with KNIME you do not need to choose, you can have it all :-)

You start with setting the path to your (general) Python installation like this:

Set basic conda path in KNIME’s preferences (no need to worry about environments here). Your exact path might be different

Once you have done this you can then set the specific py39_knime environment as standard for Python nodes:

Conda preferences for KNIME’s Python nodes.

This is now just the default setting. You can also choose individual settings for each Python node (we will come to that in a minute). Once there are no error messages and you see the Python version number here, you are ready to use Python in KNIME nodes. As you might have noticed you can also opt to use the bundled version of Python here, so you do not have to worry about all the YAML stuff. But where is the fun in that :-)?

Basic management of your Python conda environment

Once you have established your settings you might want to expand your Python environment to include more packages as you go along. One option is to just install packages via pip or conda and use them. With pip, just make sure you have your environment py39_knime active! With conda you can name the environment explicitly, for example telling conda to install xgboost from conda-forge into your new environment:

conda install -n py39_knime -c conda-forge xgboost

It also makes sense to check once in a while whether your environment is up to date and let conda make the decisions (you will be prompted and can see what is planned):

conda update -n py39_knime --all

Hint: sometimes conda itself might get confused about all the dependencies, so you can try to update it first (“conda update conda”). Sometimes it can also make sense to run the update more than once. And if everything fails you can always delete an environment and start again (https://conda.io/projects/conda/en/latest/user-guide/tasks/manage-environments.html). Just accept it as a fact of life that open-source software involves more work on the user’s side.

Update your Python environment through shell commands

An advanced option to manage additional packages is to include new packages in your YAML file and tell conda to update your environment. By doing this you can keep track of what you install and also reproduce it if you want to give other people your preferred Python installation. This also might come in handy if you exchange Python environment settings between operating systems (Mac, Windows, ...). The basic YAML settings will (mostly) be the same, while the details of the packages used might differ between operating systems (we will come to KNIME’s Conda Environment Propagation support in a moment). So a ‘clean’ YAML file is a real benefit. Once you get the hang of it you can easily use project-specific settings, but we are getting ahead of ourselves.

So you might want to edit your YAML file according to your needs by adding packages and restrictions (please note this is a collection of packages that I have been using with KNIME lately; your collection might look quite different):

# edit FEB-2023 - new meta package https://anaconda.org/knime
# THX Carsten Haubold (https://hub.knime.com/carstenhaubold) for this
name: py39_knime # Name of the created environment
channels: # Repositories to search for packages
  # - defaults # edit: removed to just use conda-forge
  # - anaconda # edit: removed to just use conda-forge
  - conda-forge
  - knime # conda search knime-python-base -c knime --info
dependencies: # List of packages that should be installed
  - python=3.9 # Python
  - knime-python-base # dependencies of KNIME - Python integration
  # - knime-python-scripting # to also build Python packages for KNIME
  - cairo # SVG support
  - pillow # Image inputs/outputs
  - matplotlib # Plotting
  - IPython # Notebook support
  - nbformat # Notebook support
  - scipy # Notebook support
  - jpype1 # A Python to Java bridge
  - jupyter # Jupyter Notebook
  - fastparquet # alternative parquet format
  - openpyxl # Excel Editing support for Python
  # conda install -c conda-forge liac-arff
  - liac-arff # read and write ARFF files
  - pandas-profiling # Data report for exploration
  - arrow # advanced date and time formats
  - sweetviz # Report overview over data
  # Advanced Machine Learning
  - xgboost
  - lightgbm
  - featuretools # feature generator
  - miceforest # Missing value replacement
  - pip
  - pip:
    - missingno # Missing Value analysis
    - autoxgb # auto XGB
    - vtreat # automatic feature generation
    - h2o>=3.38 # H2O.ai Machine Learning Platform

You can now tell conda to update your environment according to your wishes. That might take a moment.

conda env update -n py39_knime --file="/Users/knime/py39_knime.yml"

You might opt to tell conda to remove any packages that you have removed from the YAML file by adding “--prune” to the command.

conda env update -n py39_knime --file="/Users/knime/py39_knime.yml" --prune

Well yes, you might have to go back and forth a bit until you reach a stable setting. Maybe that is the way of a dynamic open source project or collection of projects.

Now after the update you will, for example, have the popular Jupyter notebook on your system and you can use it in your browser. I use notebooks to test some concepts that I later put into KNIME Python nodes. So I like to have that option in my packages.
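Before running an update with --prune it can help to see which packages you dropped from your YAML file since the last version. The sketch below only compares two lists of package specs by name; it does not know about dependencies, and conda itself decides what really gets removed. The function names and the sample lists are invented for illustration.

```python
import re


def package_name(spec):
    """Strip version constraints, e.g. 'h2o>=3.38' -> 'h2o', 'python=3.9' -> 'python'."""
    return re.split(r"[=<>!~ ]", spec.strip(), maxsplit=1)[0].lower()


def dropped_packages(old_specs, new_specs):
    """Return the names that appear in the old list but not in the new one."""
    new_names = {package_name(spec) for spec in new_specs}
    old_names = {package_name(spec) for spec in old_specs}
    return sorted(old_names - new_names)


old = ["python=3.9", "xgboost", "sweetviz", "h2o>=3.38"]
new = ["python=3.9", "xgboost", "h2o>=3.38"]
print(dropped_packages(old, new))
```

Names are compared in lower case, so IPython and ipython count as the same package.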

conda activate py39_knime
jupyter notebook

You can now also check the version of a single Python package in your environment, which sometimes can come in quite handy.

(base) my_user@MacBook ~ % conda activate py39_knime
(py39_knime) my_user@MacBook ~ % conda list -n py39_knime -f h2o --json
[
{
"base_url": "https://conda.anaconda.org/pypi",
"build_number": 0,
"build_string": "pypi_0",
"channel": "pypi",
"dist_name": "h2o-3.38.0.3-pypi_0",
"name": "h2o",
"platform": "pypi",
"version": "3.38.0.1"
}
]

This would now for example give you the option to force the installation of the (as of now) latest version of the H2O package:

conda install -n py39_knime -c h2oai h2o=3.38.0.2

Note. Here I have used a special channel h2oai. Normally in this story you would use conda-forge (conda install -n py39_knime -c conda-forge pyarrow).
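The JSON output of conda list is also handy if you want a script to check whether a package satisfies a minimum version such as h2o>=3.38. Here is a minimal sketch under the assumption that versions are plain dotted numbers; the function names are made up, and the sample is a shortened copy of the listing above. For more complicated version strings a dedicated library such as packaging would be the safer choice.

```python
import json

# Shortened sample in the shape of the `conda list ... --json` output above.
SAMPLE_LIST = '[{"name": "h2o", "channel": "pypi", "version": "3.38.0.1"}]'


def installed_version(conda_list_json, package):
    """Return the version string of a package from `conda list --json` output."""
    for entry in json.loads(conda_list_json):
        if entry.get("name") == package:
            return entry.get("version")
    return None


def version_tuple(version):
    """Turn '3.38.0.1' into (3, 38, 0, 1); non-numeric parts are skipped."""
    return tuple(int(part) for part in version.split(".") if part.isdigit())


def meets_minimum(version, minimum):
    """Compare two dotted versions numerically, part by part."""
    return version_tuple(version) >= version_tuple(minimum)


found = installed_version(SAMPLE_LIST, "h2o")
print(found, meets_minimum(found, "3.38"))
```

Comparing tuples of numbers avoids the classic trap of comparing version strings alphabetically, where "3.9" would sort after "3.38".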

KNIME and Conda Environment Propagation node

So now that you have your own Conda Environment and are able to manage it via the shell you can package this with the help of KNIME’s Conda Environment Propagation node:

KNIME’s Conda Environment Propagation node.

The first time you open the Conda Environment Propagation node it will ‘capture’ your environment with the (exact) packages and versions. Please note these will be specific to your operating system, which is why I sometimes choose to provide two (or more) such nodes, for example one Python environment for your MacOSX system and one for your Windows system. You will still be able to steer both with the same YAML file (mostly), but the propagation will be different.

In the Python Script node you now can set which environment to use (if you just have basic needs like numpy or pandas you can just stick to the bundled version). Each node might have its own Python environment.

Configure KNIME Python Script node with individual conda environment — KNIME 4 (https://hub.knime.com/-/spaces/-/~SGv1Cosah8BXabfa/current-state/)
Configure KNIME Python Script node with individual conda environment — KNIME 5 (https://hub.knime.com/-/spaces/-/~wHFwjmZiPS9qzl_D/current-state/)

Export your environment into another YAML file

You can also export your existing conda environment into a (new) YAML file. A plain export would contain all packages including dependencies and would therefore not be that nice to use and maintain. With the --from-history option only the packages you explicitly requested are exported, so at least you would have your packages:

conda env export --from-history | grep -v "prefix" > "/Users/knime/py39_knime_exported.yml"

(on Windows the syntax would use findstr)

conda env export --from-history | findstr /V "prefix" > py39_knime_exported.yaml
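If you do not want to remember two different filter commands for two operating systems, the same clean-up can be done in a few lines of Python. This is a sketch only: the function name drop_prefix_line and the sample export text are invented, and unlike the grep and findstr variants it only removes lines that start with "prefix:" rather than every line containing the word.

```python
SAMPLE_EXPORT = """\
name: py39_knime
channels:
  - conda-forge
dependencies:
  - python=3.9
prefix: /Users/my_user/miniforge3/envs/py39_knime
"""


def drop_prefix_line(exported_yaml):
    """Remove the machine-specific 'prefix:' line from an exported environment."""
    kept = [
        line
        for line in exported_yaml.splitlines()
        if not line.startswith("prefix:")
    ]
    return "\n".join(kept) + "\n"


print(drop_prefix_line(SAMPLE_EXPORT))
```

The prefix line contains the absolute path on your own machine, which is why you would not want it in a file you share with others.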

So now you have full control over your Python needs. The time spent to master this will greatly enhance your Python abilities. This might also come in handy even if you use Python without KNIME.

This story might get updated as new information appears. The usual disclaimers apply.

Next up are stories about KNIME, Python and Deep Learning and also about managing your R installation (cf. this and this on the KNIME forum) with the help of conda.

— — —

Thanks to D. Gutmann for hints and support!

BTW 1: KNIME also works well with Jupyter notebooks and individual Python modules (think .py files and functions). You can also write complete nodes purely in Python if you want.

BTW 2: Yes there are alternatives to conda like Mamba. If you like them you can also use them of course.

BTW 3: Yes, KNIME is constantly working on the Python integration. Specifically date and time and other formats, the use of indexes and the like. So stay tuned and follow the forum. :-)

BTW 4: If you enjoyed this story you can also follow me in the KNIME forum and on the KNIME Community Hub with many more examples (https://hub.knime.com/mlauber71).

BTW 5: If you want to see what you can do with even the bundled KNIME and Python integration with regards to graphics you can watch this video and try the workflow (https://kni.me/w/nbfX818PlGRUflhK):

Advanced Visualisations with Python and KNIME also building a Data App

https://youtu.be/mG2SZiKG9zo?t=2140

Some material for the legal stuff. Ask your experts if in doubt.

“Conda is an open source package management system and environment management system ….” (https://conda.io/en/latest/index.html)

Sustaining our stewardship of the open-source data science community
(Peter Wang, CEO and co-founder of Anaconda, Inc.)

Handling Anaconda without getting Constricted (https://florianwilhelm.info/2021/09/Handling_Anaconda_without_getting_constricted/)

Package Distribution and the anaconda.com Terms of Service (https://conda-forge.org/blog/posts/2020-11-20-anaconda-tos/)

Anaconda is not free for commercial use (anymore) — alternatives?(reddit.com/r/Python/comments…b2x&context=3)

Markus Lauber
Low Code for Data Science

Senior Data Scientist working with KNIME, Python, R and Big Data Systems in the telco industry