
KNIME and R — installation across operating systems — some remarks

TL;DR: KNIME and R/RStudio work well together, and a few hints, such as using Conda, can make their cooperation even better.

Markus Lauber
Low Code for Data Science
11 min read · Feb 16, 2023


KNIME loves R

Just show me the KNIME & R workflows:

There is an official guide to installing KNIME and R, and you should read it. It covers most issues; if everything works, you are fine and can stop reading.

This article adds a few explanations and remarks and deals with some quirks that might (or might not) arise along the way.

People sometimes confuse the various R integrations in KNIME. There are basically three of them:

  1. The official KNIME/R integration, which is what you should use most of the time unless there is a specific reason not to.
  2. An R community integration, which requires Rserve to run in the background on its own (we will come to that package in a moment).
  3. Windows binaries of R, for when you do not want to deal with all the installations and just want to use some basic R. As the name suggests, this only works on Windows, and you would not be able to update it, so maybe stay away …

Please do not get confused about these integrations. If in doubt, check for or simply install the official one, called “KNIME Interactive R Statistics Integration”.

“KNIME Interactive R Statistics Integration”.

Also you might want to install an IDE like RStudio to help you control your R installation (and maybe do other stuff with R).

Install R and Rserve

The most important thing besides installing R is to make sure KNIME and R can ‘talk’ to each other. The Rserve package makes that possible. You definitely have to install this package; there were some challenges with that in the past, as you can see in the KNIME forum. It has to be version 1.8-6 or later. There is no way around this!
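If you are not sure whether a suitable Rserve is already there, a quick check from R or RStudio might look like the following. This is only a minimal sketch: has_min_version is a hypothetical helper name of my own, and the minimum version is the one mentioned above.

```r
# Hypothetical helper: TRUE if 'pkg' is installed in at least 'min_version'
has_min_version <- function(pkg, min_version) {
  # system.file() returns "" when the package is not installed
  if (!nzchar(system.file(package = pkg))) {
    return(FALSE)
  }
  utils::packageVersion(pkg) >= package_version(min_version)
}

# FALSE means: Rserve is missing or older than 1.8-6
has_min_version("Rserve", "1.8-6")
```

If this returns FALSE, continue with the installation steps that follow.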

If you haven’t already succeeded in installing the package with just calling these lines within R or RStudio …

install.packages("Rserve")
install.packages("Cairo")
# https://forum.knime.com/t/r-snippet-error/21577/13?u=mlauber71

… you can try the following commands and install available binaries.

Remember. This is only necessary if the initial installation was not successful:

# for Windows - if simple installation is not successful
install.packages("Rserve", repos = "https://cran.r-project.org", type="win.binary")

# for macOS on Intel as well as Apple Silicon (M1, M2) - if simple installation is not successful
install.packages("Rserve", repos = "https://cran.r-project.org", type="mac.binary")

# for macOS you can tell the program which package and specific version to use
install.packages("https://cran.r-project.org/bin/macosx/contrib/4.2/Rserve_1.8-11.tgz", repos = NULL, type ="mac.binary")
# you obviously might have to change the version if newer ones are available

On Windows you can also use installr and give the Rserve package as a zip file if the other methods do not work:

# install if necessary
# install.packages("installr")

# for Windows - if simple and 'binary' installation is not successful
library(installr)
install.packages.zip("https://cran.r-project.org/bin/windows/contrib/4.2/Rserve_1.8-11.zip")
# you might have to update the version and file obviously if they evolve
# make sure to check your main R version like 4.2, 4.3
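To avoid guessing the "4.2" part of such a download link, you can derive it from the running R session. The following is a minimal sketch: r_series and rserve_zip_url are hypothetical helper names, the URL pattern is simply the one used above, and the Rserve version "1.8-11" is a placeholder that you may have to update.

```r
# Hypothetical helper: "major.minor" series of an R version, e.g. "4.2"
r_series <- function(version = getRversion()) {
  parts <- unlist(version)  # e.g. c(4, 2, 3) for R 4.2.3
  paste(parts[1:2], collapse = ".")
}

# Hypothetical helper: build the Windows binary URL for a given Rserve version
rserve_zip_url <- function(rserve_version, series = r_series()) {
  sprintf(
    "https://cran.r-project.org/bin/windows/contrib/%s/Rserve_%s.zip",
    series, rserve_version
  )
}

# "1.8-11" is a placeholder; check CRAN for the file that actually exists
print(rserve_zip_url("1.8-11"))
```

The resulting string could then be passed to install.packages.zip() as shown above, provided that file exists on CRAN for your R version.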

Also on Windows it makes sense to install Rtools so you can compile some packages where this might be necessary. It might take some back and forth to install all this. The most important thing to remember for Rtools is to keep the paths used clean and short!

On macOS, the installation of R might initially pose some challenges (https://cran.r-project.org/bin/macosx/). Important to remember: you might also have to install some tools like XQuartz, Xcode and others (follow the instructions on the site). The initial R installation might be as simple as:

KNIME also provides a script to help with the installation of the necessary packages if the installation of R, Rserve and Cairo does not work initially. Please check out the official guide about this:

Telling KNIME where to find R

Once you have succeeded in installing R, you have to tell KNIME where to find it in its preferences. If you press apply and no error message shows up, you are good to go.

KNIME preferences showing the path to the R installation.
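If you do not know where your R lives, you can ask R itself. This is a small sketch using base R only; I assume here that the R home directory is what the preferences expect, so please compare with the official guide.

```r
# Directory of the running R installation (the "R home")
r_home <- R.home()

# Directory that holds the R executables of this installation
r_bin_dir <- R.home("bin")

cat("R home:   ", r_home, "\n")
cat("R bin dir:", r_bin_dir, "\n")
cat("R version:", R.version.string, "\n")
```

Run this in the same R or RStudio installation you want KNIME to use, since several R versions can coexist on one machine.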

You can now, for example, check whether the installation worked and get some information about your R and KNIME environment with this workflow:

Check version of R packages in KNIME (https://hub.knime.com/-/spaces/-/latest/~annQfxz-Mrtn2BNm/).
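The idea behind such a check can be sketched in a few lines of R. This is not the code of the linked workflow, just a minimal illustration; package_versions is a hypothetical helper name.

```r
# Hypothetical helper: versions of the given packages, as far as installed
package_versions <- function(pkgs) {
  installed <- utils::installed.packages()[, c("Package", "Version"), drop = FALSE]
  found <- installed[installed[, "Package"] %in% pkgs, , drop = FALSE]
  data.frame(
    package = unname(found[, "Package"]),
    version = unname(found[, "Version"]),
    stringsAsFactors = FALSE
  )
}

# Packages that are not installed simply do not show up in the result
versions <- package_versions(c("Rserve", "Cairo", "stats"))
print(versions)

# Inside a KNIME R Snippet you would hand the table back to KNIME with:
# knime.out <- versions
```

A missing row for Rserve or Cairo is a hint that the installation steps above still need to be completed.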

If you want to explore more workflows to see what you can do with KNIME and R you can go to the KNIME Community Hub:

When you are comfortable with your own R/RStudio setup, you will be just fine and can skip the environment propagation stuff (which is cool, though).

Using the Conda Environment Propagation for R

You can also use Conda to install not only Python but also a version of R alongside it, and thereby store the configuration with all your R packages in one environment for easier use, deployment and sharing with others.

If you want to read more about how to handle KNIME and Conda make sure to check my other Medium article: KNIME and Python — Setting up and managing Conda environments.

Here is a typical YAML file to set up R with Conda. I used this in another KNIME workflow where you can check basic facts about your R and packages from within KNIME (also Flow Variables and graphics):
https://hub.knime.com/-/spaces/-/latest/~119aEydIH0oCdht8/

# KNIME and Python — Setting up and managing Conda environments
# General guide for both Windows and Apple Silicon users
# Refer to the following resources for additional setup:
# KNIME Python Integration Guide:
# https://docs.knime.com/latest/python_installation_guide/index.html
# Blog post on setting up Conda environments with KNIME:
# https://medium.com/p/2ac217792539

# Creating or updating the environment:
# Windows: conda env create -f="path\to\knime_r_environment.yml"
# MacOS: conda env create -f="/path/to/knime_r_environment_apple_silicon.yml"
# Remove environment: conda env remove --name knime_r_environment
# Activate environment: conda activate knime_r_environment
# Update environment: conda update -n knime_r_environment --all

# Note: On MacOS with Apple Silicon, manual installation of RServe may be required:
# Start R from: /opt/homebrew/Caskroom/miniforge/base/envs/knime_r_environment/lib/R/
# Run in R: install.packages("Rserve", repos = "https://cran.r-project.org", type="mac.binary")

# Example: Update Conda itself in the base environment:
# conda update -n base -c conda-forge conda
# This command updates Conda (and the dependencies it needs) in the base environment, not every package in it.

# Example: Update the knime_r_environment using this YAML file:
# For MacOS (Apple Silicon):
# conda env update --name knime_r_environment --file /path/to/knime_r_environment_apple_silicon.yml --prune
# For Windows:
# conda env update --name knime_r_environment --file path\to\knime_r_environment.yml --prune

# Explanation:
# --name: Specifies the name of the environment to update (knime_r_environment).
# --file: Specifies the YAML file containing the environment configuration.
# --prune: Removes dependencies that are no longer required by the environment as specified in the YAML file.

# Example: Update the knime_r_environment to the latest versions of all packages:
# conda update -n knime_r_environment --all
# This command updates all packages in the knime_r_environment to their latest versions.

name: knime_r_environment # Name of the environment
channels: # Repositories to search for packages
- conda-forge
- knime # Includes KNIME-specific packages

dependencies: # List of packages to be installed
# Python and KNIME dependencies
- python # Version =3.9
- knime-python-base # Core KNIME Python integration
# - knime-python-scripting # Uncomment if additional Python scripting features are required

# Visualization and plotting
- cairo # SVG support
- pillow # Image handling
- matplotlib # Plotting library
- IPython # Interactive computing
- nbformat # Notebook support
- scipy # Scientific computing
- jpype1
- jupyter # Jupyter Notebook support

# Core R packages and integration
- r-base # Version >=4.1.3, core R package
- r-rserve # Version >=1.8_7, RServe for R-KNIME communication
- r-essentials # Essential R packages, including dplyr, ggplot2, etc.

# Visualization in R
- r-cairo # High-quality plot generation
- r-ggplot2 # Grammar of graphics plotting system

# Session management and utilities
- r-sessioninfo # Information about the R session
- r-foreign # Read and write data files in various formats
- r-readr # Read rectangular data, like CSV files
- r-readxl # Read Excel files
- r-readods # Read and write ODS files
- r-arrow # Integration with Apache Arrow

# Additional R packages for data manipulation and analysis
- r-dplyr # Data manipulation
- r-fbasics # Financial engineering functions
- r-mlmetrics # Metrics for evaluating ML models
- r-stringr # String operations in R
# - r-broman # Miscellaneous R functions (uncomment if needed)

# Reporting and interactive graphics
- r-knitr # Dynamic report generation
- r-plotly # Interactive plotting
- r-lattice # Data visualization
- r-pastecs # Time series and descriptive statistics
# - r-rsqlite # SQLite database integration (uncomment if needed)

# Machine Learning in R
# - r-mlr3verse # MLR3 ecosystem for ML (uncomment if needed)
- r-xgboost # Gradient Boosting
- r-h2o # Scalable ML platform
# - r-gbm # Generalized Boosted Regression Models (uncomment if needed)
- r-caret # Streamlining ML model creation
- r-pls # Partial Least Squares and Principal Component Regression
- r-randomforest # Random forest for classification and regression
- r-mgcv # Mixed GAM computation
- r-nlme # Nonlinear mixed-effects models
# - r-pmml # Predictive Model Markup Language (uncomment if needed)

# Support Vector Machines (SVM)
# - r-e1071 # SVM and clustering (uncomment if needed)

# Lasso and Ridge Regressions
- r-glmnet # Regularized regression models
- r-foreach # Parallel computing support
- r-lars # Least Angle Regression

# Dimensionality reduction
- r-rtsne # T-SNE for large datasets
- r-tsne # T-distributed Stochastic Neighbor Embedding

# Additional Python dependencies
- pip
# - pip:
#   - vtreat # Data preparation for ML (uncomment both lines if needed)

You can adapt the list and update your environment as described in the Conda article mentioned above.

Only packages that are hosted on the conda-forge channel (or another Conda channel) can be installed this way. Their names start with “r-” followed by the original R package name in lowercase. You can check if the package is there for your environment like this:

conda search conda-forge::r-cairo
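As a rule of thumb, the Conda name can be derived from the R name as sketched below. conda_name is a hypothetical helper of my own, and not every CRAN package exists on conda-forge, so the conda search above remains the real check.

```r
# Hypothetical helper: guess the conda-forge name of an R package
conda_name <- function(r_package) {
  paste0("r-", tolower(r_package))
}

# Names as they are used in the YAML file above
print(conda_name(c("Cairo", "randomForest", "Rserve")))
```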

In the R Snippet (or other R nodes), you then select the conda environment with the R installation you have created under the “Advanced” settings:

Configure KNIME R Snippet with conda environment (https://medium.com/p/2ac217792539).

For your convenience I have created a KNIME Component that activates a basic conda environment with a current R version and useful packages. It determines your operating system (macOS / Apple Silicon or Windows, sorry Linux) and activates the matching version of Conda Environment Propagation (I use these nodes on my machines):

Some additional commands that might be useful

Sometimes you might encounter additional problems when installing R and RStudio — you would then go on to ask Google or maybe ChatGPT how to proceed. I have added some common commands that might help you (if everything does work you can skip this).

You might have to set a Proxy server if you are behind a firewall. You can do this permanently or just within the session:

Sys.setenv(http_proxy  = "http://proxy.mycompany.com:8080/")
Sys.setenv(https_proxy = "https://proxy.mycompany.com:8080/")
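If you only need the proxy for a single step, you could wrap that step so the previous settings are restored afterwards. This is a minimal sketch under my own assumptions: with_proxy is a hypothetical helper and the proxy address is the placeholder from above.

```r
# Hypothetical helper: run 'code' with the proxy set, then restore old values
with_proxy <- function(proxy_url, code) {
  vars <- c("http_proxy", "https_proxy")
  old <- Sys.getenv(vars, unset = NA)  # NA marks variables that were not set
  on.exit({
    for (nm in names(old)) {
      if (is.na(old[[nm]])) {
        Sys.unsetenv(nm)
      } else {
        do.call(Sys.setenv, as.list(old[nm]))
      }
    }
  }, add = TRUE)
  Sys.setenv(http_proxy = proxy_url, https_proxy = proxy_url)
  force(code)  # 'code' is evaluated lazily, i.e. only now, with the proxy set
}

# Example: show the proxy as seen inside the wrapped call
print(with_proxy("http://proxy.mycompany.com:8080/", Sys.getenv("http_proxy")))

# A real use might be:
# with_proxy("http://proxy.mycompany.com:8080/", install.packages("Rserve"))
```

For a permanent setting, the same variables can instead go into the .Renviron file, which is described further below.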

Two additional R packages come in handy on Windows once you have installed Rtools. The first, pkgbuild, helps you find your Rtools:

install.packages("pkgbuild")  # https://r-lib.github.io/pkgbuild/

pkgbuild::rtools_path()
pkgbuild::find_rtools()

The second, usethis, lets you open the .Renviron file (yes, with a dot in front), where on Windows you can add the path to the Rtools directory:

install.packages("usethis")   # https://usethis.r-lib.org/

# https://forum.knime.com/t/integrating-r-with-knime/17291/6?u=mlauber71
usethis::edit_r_environ()
# enter the path to your Rtools on Windows
# PATH="C:\Rtools\bin;${PATH}"
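After restarting R, you can check whether the PATH change had an effect. The sketch below uses base R only; tool_on_path is a hypothetical helper, and I assume that make is one of the tools your Rtools installation provides.

```r
# Hypothetical helper: TRUE if an executable with this name is found on PATH
tool_on_path <- function(tool) {
  unname(nzchar(Sys.which(tool)))
}

# Should be TRUE on Windows once Rtools is on the PATH
print(tool_on_path("make"))

# Show where it was found (empty string if not found)
print(Sys.which("make"))
```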

KNIME, R, Conda Environment Propagation and Apple Silicon (M1, M2 chips)

If you have set up your R environment via Conda on an Apple Silicon machine (with the M1 and M2 chips), you might find that it is not initially possible to install the Rserve package (this might change over time of course). In this case, you will have to find the path of your Conda installation and your environment (called py39_knime_r in this example).

If you followed the instructions from the Conda article, you can get information about the path from your Terminal (shell):

brew info miniforge

# /opt/homebrew/Caskroom/miniforge/base

The resulting path might look something like this:

/opt/homebrew/Caskroom/miniforge/base

The R version would then sit in a sub-folder where your environment is (“py39_knime_r”). You can proceed to start the basic R and get to the prompt:

/opt/homebrew/Caskroom/miniforge/base/envs/py39_knime_r/lib/R/

Once you have started the basic R, you can install the necessary Rserve package like this:

install.packages("Rserve", repos = "https://cran.r-project.org", type="mac.binary")

With q() you can quit the R prompt again. Now your R should also work via the Conda Environment Propagation.
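If your environment has a different name, the R folder can be put together from the two pieces shown above. This is a minimal sketch; conda_r_home is a hypothetical helper, and the folder layout is simply the one from this example, which may differ on your machine.

```r
# Hypothetical helper: R folder inside a Conda environment
conda_r_home <- function(conda_base, env_name) {
  file.path(conda_base, "envs", env_name, "lib", "R")
}

r_dir <- conda_r_home("/opt/homebrew/Caskroom/miniforge/base", "py39_knime_r")
print(r_dir)

# TRUE only if the environment really exists on this machine
print(dir.exists(r_dir))
```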

KNIME / R Community Nodes — starting Rserve

If you absolutely must (I once did it for classification and regression ML tasks), you can use the KNIME R Community nodes. In order for them to work, you will have to start the Rserve(r) in your R (or RStudio) before using it in KNIME:

# You must start Rserve outside of KNIME (e.g. in RStudio)
# this is necessary for the Community nodes of R
# https://forum.knime.com/t/r-source-community-node-connection-error/13087/2?u=mlauber71

library(Rserve)
library(RSclient)

# start the Rserve(r)
Rserve(port = 6311, debug = FALSE, args = "--vanilla")

# connect to the running server (the connection is needed to shut it down later)
rsc <- RSconnect(port = 6311)

# shutdown the server *after* you used it
RSshutdown(rsc)

Make sure you have the community extension installed (if you want to use it):

You also will have to configure the R-Scripting settings in the KNIME preferences — and the port should match the one you are using when you start the Rserve(r):

Configure the R Scripting Community Extension under KNIME 4
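Before switching to KNIME, you might want to verify that something is actually listening on that port. The sketch below uses base R sockets only; port_open is a hypothetical helper, and it merely tests whether the port accepts a connection, not whether it is really Rserve that answers.

```r
# Hypothetical helper: TRUE if a TCP connection to host:port can be opened
port_open <- function(host = "localhost", port = 6311, timeout = 2) {
  con <- tryCatch(
    suppressWarnings(
      socketConnection(host, port, open = "r+", blocking = TRUE, timeout = timeout)
    ),
    error = function(e) NULL  # connection refused, timeout, ...
  )
  if (is.null(con)) {
    return(FALSE)
  }
  close(con)
  TRUE
}

# Use the same port here as in Rserve(port = ...) and in the KNIME preferences
print(port_open("localhost", 6311))
```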

If you encounter problems you can try and ask for help on the KNIME Forum. And you can check out my collection of some older (and more obscure) debates and solutions about KNIME and R.

“A meta collection and article about R and KNIME”
(https://hub.knime.com/-/spaces/-/latest/~tj5tS_6gYvqOSPlk/)

Update R

If you want to keep your R clean and updated here are two ways to do it. Best to do it from RStudio or the R GUI itself:

3 Methods to update R on RStudio (for Windows & Mac)
https://www.linkedin.com/pulse/3-methods-update-r-rstudio-windows-mac-woratana-ngarmtrakulchol/

Updating R from R (on Windows) — using the {installr} package
https://www.r-statistics.com/2013/03/updating-r-from-r-on-windows-using-the-installr-package/

If you enjoyed this article you can follow me on Medium (https://medium.com/@mlxl) and on the KNIME forum (https://forum.knime.com/u/mlauber71/summary) and hub (https://hub.knime.com/mlauber71).

Markus Lauber
Low Code for Data Science

Senior Data Scientist working with KNIME, Python, R and Big Data Systems in the telco industry