Community Components – Winter 2022 Collection

Sharing and reusing bundled functionality with everybody

Paolo Tamagnini
Low Code for Data Science

Photo by Hannah Busing on Unsplash.

Note that these components are built by and for the community. They have not been officially verified by the KNIME team.

Components — for sharing and reusing bundled functionality — were introduced in 2019. Since then, the community has been very active, learning to build, generalize, and encapsulate workflows as components and make them available to others by sharing them on the KNIME Hub. To date, over 300 components have been uploaded by the community! With the KNIME open source user base growing steadily, more and more components are going to be shared: by the community, for the community.

We are impressed by what the community has built. That’s why today we want to feature twelve community contributions that we find useful or fun. Some of them were built by well-known KNIME users we’ve highlighted before in our Contributors of the Month program. Others come from users who are new to building components. Give them a warm welcome!

Log into your KNIME Hub account and try out the community components: drag and drop them from the KNIME Hub to your Workflow Editor and adapt them to your own requirements. Show your appreciation for a community component by dropping a like!

KNIME open source community components
Fig. 1. Drag and drop a component from the KNIME Hub to your Workflow Editor and show appreciation by dropping a like.

1. Translator by Armin Ghassemi Rudd

Armin (armingrudd) created a component that automatically queries the Google Translate web service with the Webpage Retriever node. Although this is not the authenticated API access that Google offers, it can still be used to automate manual work. Time to translate your documents with a simple drag, drop, connect, set up, and execute! Drop a like for the Translator component or its example workflow on KNIME Hub.

Fig. 2. The Translator component automatically queries the Google Translate web service with the Webpage Retriever node.

2. Drivetime and Distance Query by Tosin Adekanye

Tosin (sntrada) made the TomTom API available via a component that automatically gets trip distances and durations, given a start and end point in GPS coordinates and a selected vehicle. Provide your TomTom API key and the geolocations as latitude and longitude values. Read more on the Drivetime and Distance Query component page or in its example workflow. Thank you, Tosin!

Fig. 3. The Drivetime and Distance Query component fetches trip distances and durations automatically from the TomTom API.

3. Row Count by Brian Bates

Brian (takbb) encapsulated a workflow into a component that creates a flow variable with a custom name showing the number of rows in the input table. Its functionality differs only slightly from the Extract Table Dimension node, but judging by its many downloads the component is quite popular. Find the Row Count component and others like it on Brian’s KNIME Hub space!

Fig. 4. The Row Count component creates a flow variable with a custom name, showing the number of rows in the input table.

4. Autofeat Generator and Apply by Ashok Harnal

Ashok (ashokharnal) bundled Python scripts into two components to run the ‘autofeat’ package. When you drag and drop the component and execute it, the Conda Environment Propagation node makes sure that all the required Python packages are installed automatically. The component takes your data and generates new columns based on the ones you provided. This automated feature engineering comes in handy when blindly optimizing the performance of simpler machine learning algorithms such as logistic or linear regression. The Autofeat Generator component also outputs a Python Object, which the Autofeat Apply component uses to apply the same transformations to new data. Check out the example workflow on the KNIME Hub to see in detail how both components work.

Fig. 5. The Autofeat Generator and Autofeat Apply components ensure that all the required Python packages are automatically installed with the Conda Environment Propagation node.

5. Microsoft Graph API by John Denham

John (johndenham//TardisPilot//knimetips), besides creating many KNIME accounts, has also built many components to query the Microsoft Graph API. This API grants you access to Microsoft Cloud service resources. John created six components that work together to manage Azure Active Directory (AD) groups, memberships, and rosters, allowing you to manage Azure user information from a single KNIME workflow. Questions? Reach out to John on the KNIME Forum by starting a discussion at the bottom of the KNIME Hub workflow page!

Fig. 6. The Microsoft Graph API components are a set of components that work together to manage Azure user information from a single KNIME workflow.

6. PCA with R by Francesco Tuscolano

To perform principal component analysis (PCA), Francesco (francescots) has embedded an R script via the KNIME Interactive R Statistics Integration. KNIME already offers native nodes for PCA, as well as Spark PCA and H2O PCA integrations. We like this component because it shows how you can use the Conda Environment Propagation node for R dependencies, and not just Python! The first time the Principal Components Analysis with R component executes on your system, it automatically installs R and the necessary libraries in a Conda environment: ‘psych’ and ‘GPArotation’, which the analysis needs, as well as ‘rserve’, which the KNIME integration with R needs. After the first execution, the component skips any installation and directly executes the R script. Isn’t this neat? Take a look at the example workflow.

Fig. 7. The Principal Components Analysis with R component shows you how you can use the Conda Environment Propagation node for R dependencies and not just Python.

7. Global Thresholdings by Laurent Thomas

Laurent (l.thomas) released a component to process images via up to 15 different global thresholding techniques. Global thresholding determines a single intensity threshold for the whole image to separate foreground from background. Inside Laurent’s component, the Global Thresholder node runs in a loop, each iteration using different settings taken from the component’s configuration. The component can be used, for example, to compare the outputs of the different techniques and judge which one works best for a particular set of images. Drop a like on the component’s KNIME Hub page and download the example workflow.

Fig. 8. The Global Thresholder (multi) component processes images via up to 15 different global thresholding techniques.
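
To make the idea of comparing global thresholding techniques concrete, here is a minimal Python sketch that is independent of KNIME. It compares two well-known techniques, a plain mean threshold and Otsu's method, on a flat list of 8-bit grey values. The function names and the toy pixel values are assumptions made for illustration; the sketch does not reproduce the internals of Laurent's component or of the Global Thresholder node.

```python
def mean_threshold(pixels):
    """Use the average grey value as the global threshold."""
    return sum(pixels) / len(pixels)


def otsu_threshold(pixels, levels=256):
    """Otsu's method: pick the threshold that maximizes between-class variance.

    Expects integer grey values in the range 0..levels-1.
    """
    histogram = [0] * levels
    for p in pixels:
        histogram[p] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_threshold, best_variance = 0, -1.0
    weight_bg, sum_bg = 0, 0
    for t in range(levels):
        weight_bg += histogram[t]
        if weight_bg == 0:
            continue  # no background pixels yet
        weight_fg = total - weight_bg
        if weight_fg == 0:
            break  # every pixel is already in the background class
        sum_bg += t * histogram[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        variance = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if variance > best_variance:
            best_threshold, best_variance = t, variance
    return best_threshold


def binarize(pixels, threshold):
    """Mark pixels above the threshold as foreground (1), the rest as background (0)."""
    return [1 if p > threshold else 0 for p in pixels]


# Toy "image": four dark and four bright pixels (illustrative values only).
pixels = [10, 12, 11, 13, 200, 210, 205, 220]
for name, threshold in [("mean", mean_threshold(pixels)),
                        ("otsu", otsu_threshold(pixels))]:
    print(name, threshold, binarize(pixels, threshold))
```

On real images the techniques often disagree, which is exactly why running several of them side by side and inspecting the results can be useful.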

8. Variance Inflation Factor (VIF) by Andrea De Mauro

Andrea (AdM) built a component to detect correlation between columns, more precisely multicollinearity, so that redundant information can be discarded before training a model. The Variance Inflation Factor (VIF) component simply returns the VIF values, while the Variance Inflation Factor (VIF) Filter component automatically filters out redundant columns based on a custom threshold. Any feedback for Andrea? Reach out to him on the KNIME Forum thread he created!

Fig. 9. The Variance Inflation Factor (VIF) components detect multicollinearity between columns to discard redundant information before training a model.
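
For readers who want to see what a VIF is numerically, the following Python sketch may help. It relies on the standard definition VIF_j = 1 / (1 - R_j^2), where R_j^2 comes from regressing column j on all other columns, and on the fact that these values equal the diagonal of the inverse correlation matrix. The function names, the iterative "drop the worst column" strategy, and the default threshold of 5 are assumptions for illustration, not a description of how Andrea's components work internally.

```python
from statistics import mean


def correlation_matrix(columns):
    """Pearson correlation matrix for a list of equally long numeric columns."""
    centered = [[x - mean(col) for x in col] for col in columns]

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    norms = [dot(c, c) ** 0.5 for c in centered]
    return [[dot(a, b) / (na * nb) for b, nb in zip(centered, norms)]
            for a, na in zip(centered, norms)]


def invert(matrix):
    """Invert a square matrix with Gauss-Jordan elimination and partial pivoting."""
    n = len(matrix)
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("singular matrix: columns are perfectly collinear")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [v - factor * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def variance_inflation_factors(columns):
    """VIF of each column: the diagonal of the inverse correlation matrix."""
    inverse = invert(correlation_matrix(columns))
    return [inverse[i][i] for i in range(len(columns))]


def vif_filter(named_columns, threshold=5.0):
    """Drop the column with the highest VIF until all VIFs are <= threshold.

    Hypothetical strategy for this sketch; the threshold is a rule of thumb.
    """
    kept = dict(named_columns)
    while len(kept) > 1:
        names = list(kept)
        scores = variance_inflation_factors([kept[n] for n in names])
        worst = max(range(len(names)), key=lambda i: scores[i])
        if scores[worst] <= threshold:
            break
        del kept[names[worst]]
    return list(kept)


# Columns "a" and "b" are nearly identical, so one of them is redundant.
table = {
    "a": [1, 2, 3, 4, 5],
    "b": [1.1, 2.0, 3.1, 3.9, 5.0],
    "c": [2, 1, 4, 3, 5],
}
print(variance_inflation_factors(list(table.values())))
print(vif_filter(table))
```

Removing one column at a time and recomputing matters because dropping a single redundant column usually lowers the VIF of its near-duplicates as well.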

9. Standardized Coefficients by Daniele Tonini

Daniele (Jaqen79) offers the community a component to compute the Standardized Coefficients of any Linear Regression model. First train a model with the Linear Regression Learner node, then connect it to Daniele’s component and input the training data and the raw coefficients (screenshot below). Finally, in the component configuration, select the numerical target and execute. The Standardized Coefficients component returns a new value for each input feature; these values measure each feature’s relative impact on the model output and sum up to 1. Isn’t that much more intuitive than the raw coefficient values from the learner output port? Try the component using its example workflow on KNIME Hub.

Fig. 10. The Standardized Coefficients component returns a new value for each input feature, measuring its relative impact on the model output.
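
As a rough illustration of the idea, the Python sketch below uses the textbook definition of a standardized coefficient, beta_j = b_j * sd(x_j) / sd(y), and then rescales the absolute values so that they sum to 1. That final rescaling is an assumption about how values that "sum up to 1" could be obtained; the function names, the raw coefficients, and the data are likewise hypothetical and are not taken from Daniele's component.

```python
from statistics import stdev


def standardized_coefficients(raw_coefficients, features, target):
    """Textbook standardization: beta_j = b_j * sd(x_j) / sd(y)."""
    sd_target = stdev(target)
    return {name: b * stdev(features[name]) / sd_target
            for name, b in raw_coefficients.items()}


def relative_impact(std_coefficients):
    """Rescale absolute standardized coefficients so they sum to 1.

    Assumption for this sketch, not a confirmed detail of the component.
    """
    total = sum(abs(v) for v in std_coefficients.values())
    return {name: abs(v) / total for name, v in std_coefficients.items()}


# Hypothetical training data and raw coefficients from a fitted linear model.
features = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [10, 30, 20, 50, 40],
}
target = [3, 7, 8, 13, 14]
raw = {"x1": 2.0, "x2": 0.1}

betas = standardized_coefficients(raw, features, target)
print(relative_impact(betas))
```

The raw coefficient of x1 is 20 times larger than that of x2, yet x2 varies on a scale ten times wider, so after standardization the two features differ far less than the raw values suggest.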

10. Cell Segmentation by Jan Eglinger

Jan (imagejan) proposes a component for the segmentation of cell bodies, membranes, and nuclei from microscopy images. Put simply, the component takes pictures of cells as input and automatically divides their pixels into different areas. The Cell Segmentation with Cellpose component loads the Python package ‘cellpose’ via a Conda Environment Propagation node and executes it to perform image segmentation: pre-trained neural networks are loaded and applied to your microscopy images given a few parameters. Below you can see an animation of the Cell Segmentation with Neural Networks example workflow on the KNIME Hub. Before trying out the workflow, make sure you manually install the KNIME Image Processing - Python Extensions!

Fig. 11. The Cell Segmentation with Cellpose component segments cell bodies, membranes, and nuclei from microscopy images.

11. Google Analytics Query by Gavin Attard

Gavin (Gavin_Attard) enhanced the Google Analytics Query node for marketing analytics use cases by encapsulating it into a component. The Google Analytics Query component adds retry on failure and pagination, and it improves sampling accuracy. The three enhancements are implemented with nested loop nodes, such as Recursive Loop Start and Table Row To Variable Loop Start, as well as Try and Catch nodes. Impressive! Find more information about the component in an article by Gavin and look up his example workflow.

Fig. 12. The Google Analytics Query component adds retry on failure and pagination, and improves sampling accuracy, compared with the Google Analytics Query node.
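
The retry-on-failure and pagination patterns are not specific to KNIME. The Python sketch below shows the general logic under stated assumptions: `fetch_page` is a hypothetical stand-in for any paged query, and `make_flaky_source` simulates a service that fails temporarily. Nothing here calls the real Google Analytics API, and the sampling-accuracy improvement is not covered.

```python
import time


def fetch_with_retry(fetch_page, offset, page_size, max_attempts=3, wait_seconds=0.0):
    """Call fetch_page(offset, page_size), retrying when it raises RuntimeError."""
    for attempt in range(1, max_attempts + 1):
        try:
            return fetch_page(offset, page_size)
        except RuntimeError:
            if attempt == max_attempts:
                raise  # give up after the last attempt
            time.sleep(wait_seconds)


def fetch_all_rows(fetch_page, page_size, max_attempts=3):
    """Request page after page until a page comes back shorter than page_size."""
    rows, offset = [], 0
    while True:
        page = fetch_with_retry(fetch_page, offset, page_size, max_attempts)
        rows.extend(page)
        if len(page) < page_size:
            return rows
        offset += page_size


def make_flaky_source(data, fail_on_calls):
    """Hypothetical paged data source that fails on the given call numbers."""
    calls = {"count": 0}

    def fetch_page(offset, page_size):
        calls["count"] += 1
        if calls["count"] in fail_on_calls:
            raise RuntimeError("simulated temporary failure")
        return data[offset:offset + page_size]

    return fetch_page


# 25 rows served in pages of 10; the second request fails once and is retried.
source = make_flaky_source(list(range(25)), fail_on_calls={2})
all_rows = fetch_all_rows(source, page_size=10)
print(len(all_rows))
```

In a workflow, the Try and Catch nodes play the role of the `try`/`except` block, while the loop nodes play the role of the `while` loop.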

12. Factor Analysis of Mixed Data (FAMD) by Fabien Couprie

Fabien (Fabien_Couprie) implemented a statistical method called Factor Analysis of Mixed Data (FAMD) for automated feature engineering on both numerical and categorical columns (that is why it’s called “mixed”). The FAMD component encodes the categorical columns via one-hot encoding, appropriately normalizes both the numerical and the one-hot encoded categorical columns, and then applies the PCA Compute node. Finally, the component outputs the post-processed PCA results with new coordinate values for both input columns and rows. Any questions? Contact Fabien on the KNIME Forum!

Fig. 13. The FAMD component, for automated feature engineering, encodes both categorical and numerical columns into vectors of coordinates.
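
The encoding and normalization steps can be sketched in a few lines of Python. The version below follows the scaling commonly described for FAMD: numerical columns are standardized, and each one-hot indicator column is divided by the square root of its category's proportion and then centered. It covers the preprocessing only and leaves out the PCA step. The function names and the toy table are assumptions for illustration, and the exact normalization in Fabien's component may differ.

```python
from math import sqrt
from statistics import mean, pstdev


def standardize(values):
    """Center a numerical column and scale it to unit (population) standard deviation."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def scaled_one_hot(values):
    """One-hot encode a categorical column with FAMD-style scaling.

    Each indicator is divided by sqrt(category proportion) and then centered.
    """
    n = len(values)
    columns = {}
    for category in sorted(set(values)):
        proportion = values.count(category) / n
        indicator = [(1.0 if v == category else 0.0) / sqrt(proportion)
                     for v in values]
        centre = mean(indicator)
        columns[category] = [x - centre for x in indicator]
    return columns


def famd_preprocess(numerical, categorical):
    """Build the matrix (as a dict of columns) that a PCA step would consume."""
    matrix = {}
    for name, values in numerical.items():
        matrix[name] = standardize(values)
    for name, values in categorical.items():
        for category, column in scaled_one_hot(values).items():
            matrix[f"{name}={category}"] = column
    return matrix


# Toy table with one numerical and one categorical column (illustrative values).
prepared = famd_preprocess(
    numerical={"age": [23, 35, 47, 51]},
    categorical={"color": ["red", "blue", "red", "green"]},
)
for name, column in prepared.items():
    print(name, [round(x, 3) for x in column])
```

Dividing by the square root of the proportion gives rare categories more weight per row, so that a categorical column as a whole does not dominate or vanish next to the standardized numerical columns.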

A Swarm of New Community Components On KNIME Hub

We’ve selected a range of components to show different tools and use cases that have been spontaneously built by the KNIME community. This is of course only a small selection — there are many more community components for you to explore on the KNIME Hub.

We also highlight community components in the Community Component Highlights section of the KNIME Verified Components web page. Note that this type of highlight is more formal as these components have undergone a special review based on specific criteria: performance, impact, and stats (number of downloads and likes).

What are you waiting for? Start building and sharing your components on the KNIME Hub!

As previously published on the KNIME Blog: https://www.knime.com/blog/open-source-community-components

Paolo Tamagnini
Data scientist at KNIME specialized in guided analytics applications