Testflows in KNIME Analytics Platform
Testing workflows and components in KNIME Analytics Platform
Introduction
Regardless of the tool or language being used, it’s generally easier to build new content than it is to maintain and modify existing content. The more complex and feature-rich our solutions are, the harder it becomes to make a change without unintended consequences for our end users.
So, how do we guarantee that all facets of a KNIME workflow, component, or workflow segment continue to work as expected after we’ve made a change? Automated tests!
In this article we’ll discuss two methods for building effective KNIME workflow tests (a.k.a. “testflows”): the KNIME Testing Framework and the Workflow Executor node. We’ll also discuss components and how they relate to automated testing and good development practices in KNIME.
What is a workflow test (or “testflow”)?
A testflow is a KNIME workflow that provides controlled inputs to a component, workflow segment, or entire workflow and asserts that the outputs align with our expectations. If the output deviates from expectations, the testflow raises an alert.
As data scientists, we could think of our testflows as hypotheses that need to either be proven or disproven. Here are a few sample hypotheses (more commonly referred to as “test cases”):
- If an empty dataset is provided to my component, my component will return an error with a message indicating that an empty dataset was provided.
- If a dataset that is missing required columns is provided to my component, my component will return an error along with information about which field(s) are missing or invalid.
- If I provide a valid dataset to my component, I expect it to return ____.
The important thing to understand is that sometimes we want to test that things work (a.k.a. testing the “happy path”), and other times we want to ensure that things break in a way that we expect when invalid inputs are provided or other error conditions occur. This allows us to provide an intentional and intuitive experience for anyone using our content, and it also helps ensure that any future changes continue to pass all of the same tests.
The concept of testflows aligns very closely with the concept of automated testing in languages such as Python. I wrote a separate article on the subject of unit testing in Python which is available here for anyone curious.
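To make the analogy concrete, here is a minimal sketch of the three sample test cases written as Python unit tests. The `total_amount` function and its `id`/`amount` schema are hypothetical stand-ins for a component and are not taken from any KNIME workflow.

```python
import unittest

REQUIRED_COLUMNS = {"id", "amount"}  # hypothetical schema, for illustration only


def total_amount(rows):
    """Hypothetical 'component': sum the 'amount' column of a list of dict rows."""
    if not rows:
        raise ValueError("empty dataset provided")
    missing = REQUIRED_COLUMNS - set(rows[0])
    if missing:
        raise ValueError("missing required columns: " + ", ".join(sorted(missing)))
    return sum(row["amount"] for row in rows)


class TotalAmountTests(unittest.TestCase):
    def test_empty_dataset_reports_clear_error(self):
        # Test case 1: an empty dataset must fail with a descriptive message.
        with self.assertRaisesRegex(ValueError, "empty dataset"):
            total_amount([])

    def test_missing_columns_are_named(self):
        # Test case 2: the error must say which column is missing.
        with self.assertRaisesRegex(ValueError, "amount"):
            total_amount([{"id": 1}])

    def test_valid_dataset_returns_total(self):
        # Test case 3: the happy path returns the expected value.
        rows = [{"id": 1, "amount": 2.5}, {"id": 2, "amount": 4.0}]
        self.assertEqual(total_amount(rows), 6.5)
```

Saved to a file, these tests can be run with `python -m unittest`. Two of the three tests pass only when the component fails in the expected way, which is the same idea a testflow applies to KNIME components.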
Why create testflows?
The purpose of a test workflow is to detect issues with workflows and components as early as possible. This could be during the development process, or well after a workflow has been deployed to production.
Issues could mean…
- A workflow or component fails to execute for unexpected reasons
- A workflow or component produces wrong output
- A workflow or component doesn’t show the expected error messages
Reasons for these issues could include…
- Invalid inputs
- Invalid workflow configuration
- Changes to node inputs/outputs during an update
- Changes in external services (e.g. REST API integrations)
- Changes to a nested component
Helpful nodes for testing the output of a node or component
Before diving into the two primary methods for creating testflows in KNIME, let’s look at a few nodes that will come in handy for both techniques.
These nodes allow us to validate the structure of data in various formats and also determine if two datasets are identical. This is useful for determining if the outputs of our components, workflows, and workflow segments align with expectations.
There are plenty of other nodes useful for creating testflows; these are just a few of the most commonly used ones!
- Table Difference Checker: determines if there’s any difference between two tables, and provides flexibility for ignoring certain factors
- Table Validator: validates the schema of a table, and provides flexibility for handling extra columns
- JSON Schema Validator: validates the schema of a JSON object, which is useful for testing workflows hosted as REST APIs
- File Difference Checker: determines if two files are identical
- There are many other difference checker nodes as well
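For readers who think in code, the two most common checks (table comparison and schema validation) can be sketched in plain Python. This is only a conceptual analogy and not how the KNIME nodes are implemented; the function names and the `ignore_columns` and `allow_extra_columns` options are hypothetical.

```python
def table_differences(actual, expected, ignore_columns=()):
    """Return human-readable differences between two tables (lists of dict rows)."""
    ignored = set(ignore_columns)
    problems = []
    if len(actual) != len(expected):
        problems.append(f"row count: expected {len(expected)}, got {len(actual)}")
    for index, (got, want) in enumerate(zip(actual, expected)):
        # Drop the columns we were asked to ignore before comparing the rows.
        got = {k: v for k, v in got.items() if k not in ignored}
        want = {k: v for k, v in want.items() if k not in ignored}
        if got != want:
            problems.append(f"row {index}: expected {want}, got {got}")
    return problems


def schema_problems(rows, schema, allow_extra_columns=True):
    """Check every row against a {column_name: python_type} schema."""
    problems = []
    for index, row in enumerate(rows):
        for column, expected_type in schema.items():
            if column not in row:
                problems.append(f"row {index}: missing column '{column}'")
            elif not isinstance(row[column], expected_type):
                problems.append(
                    f"row {index}: column '{column}' is not {expected_type.__name__}"
                )
        extra = set(row) - set(schema)
        if extra and not allow_extra_columns:
            problems.append(f"row {index}: unexpected columns {sorted(extra)}")
    return problems
```

An empty list from either function means the check passed. Ignoring a volatile column such as a timestamp is the kind of flexibility that keeps a comparison from failing for reasons unrelated to the logic under test.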
Now that we understand the purpose of testflows and have a better idea of what we’ll be testing for, let’s check out the two methodologies for creating testflows!
Integrated Deployment for ad-hoc testing
One simple and elegant way to test workflows (available since KNIME Analytics Platform 4.4.0) is to use the Workflow Executor node, which can take in a workflow (or workflow segment) and execute it against some new data. Tests can be performed on the output, as demonstrated below.
In this example, we’re generating a workflow segment using the Integrated Deployment extension and passing it into the Workflow Executor node so that we can execute it against a new dataset. The Table Difference Checker node determines whether the actual output matches the expected output. Because that node raises an error when the two don’t match, this is effectively a testflow.
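The same pattern can be expressed as a short Python analogy: capture the logic under test as a callable, run it on controlled input, and raise if the output deviates. The `uppercase_names` segment and the `execute_and_check` helper are hypothetical names used only for illustration; they do not correspond to any KNIME API.

```python
def uppercase_names(rows):
    """Hypothetical workflow segment under test."""
    return [{**row, "name": row["name"].upper()} for row in rows]


def execute_and_check(segment, test_input, expected_output):
    """Run the segment on controlled input and raise if the output deviates."""
    actual_output = segment(test_input)
    if actual_output != expected_output:
        raise AssertionError(f"expected {expected_output!r}, got {actual_output!r}")
    return actual_output


# Passes silently; a mismatch would raise and stop execution, like a failing node.
execute_and_check(
    uppercase_names,
    test_input=[{"name": "ada"}],
    expected_output=[{"name": "ADA"}],
)
```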
It’s worth mentioning that the Workflow Executor node is flexible enough to automatically adjust its input and output ports to match the requirements of the provided workflow or workflow segment!
If we have a workflow that has no inputs or outputs and simply want to test if the workflow can successfully execute (a.k.a. smoke testing), the example below would be perfectly valid!
The Integrated Deployment technique can be used with other nodes from the Integrated Deployment extension to build highly customizable test suites for workflows and components.
Here’s a slightly more complex example that demonstrates multiple tests being executed in the same workflow:
KNIME Testing Framework
The KNIME Testing Framework UI extension is the more feature-rich (but also more complex) method of creating testflows. It lets users create testflows that generate test reports. These test reports can be viewed inside of KNIME or integrated with a CI/CD pipeline.
It’s also possible to automate the execution of multiple testflows using the “testflow runner” inside the KNIME Testing Framework extension (see the Testflow Runner section below). Documentation for this can be found inside the plugin files of the KNIME Testing Framework extension.
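As a rough sketch of what CI/CD integration might look like, the snippet below lists the failing test cases in a JUnit-style XML report. It assumes the report uses the common `<testsuite>`/`<testcase>` layout with `<failure>` or `<error>` children; the exact files written by the KNIME Testing Framework should be checked against its own documentation. The sample report and the test names in it are made up.

```python
import xml.etree.ElementTree as ET


def failed_test_names(junit_xml_text):
    """Return the names of test cases that contain a <failure> or <error> element."""
    root = ET.fromstring(junit_xml_text.strip())
    failed = []
    for case in root.iter("testcase"):
        if case.find("failure") is not None or case.find("error") is not None:
            failed.append(case.get("name"))
    return failed


# Made-up report, used only to show the expected shape of the input.
SAMPLE_REPORT = """
<testsuite name="testflows" tests="2" failures="1">
  <testcase name="happy_path"/>
  <testcase name="empty_input"><failure message="unexpected output"/></testcase>
</testsuite>
"""

print(failed_test_names(SAMPLE_REPORT))
```

A pipeline step could exit with a non-zero status whenever the returned list is not empty, which is what most CI systems use to mark a build as failed.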
Example testflow using the KNIME Testing Framework UI extension
In the example above we can see two simple test cases, one expected to succeed and one expected to fail. The “test” itself is performed by the Table Validator node each time to determine if the output of the preceding node matches expectations. The Testflow Configuration node is configured accordingly so that the testflow only returns a failure if the actual results don’t match expected results.
Testflow Configuration node
The Testflow Configuration node is the primary utility for configuring testflows. It allows us to define which components and/or nodes in the workflow are expected to fail or succeed when the workflow executes.
As you can see below, there are a number of configuration options available.
The “Node settings” tab in the configuration allows us to specify which nodes are expected to succeed or fail as well as allowing us to specify required warning messages.
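Conceptually, this configuration is a mapping from nodes to expected outcomes. The Python sketch below illustrates the idea only; the data model, the node names, and the rule that unlisted nodes are expected to succeed are assumptions made for the example and do not describe how the Testflow Configuration node stores its settings.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class NodeResult:
    status: str                    # "success" or "failure"
    warning: Optional[str] = None  # warning message raised by the node, if any


def evaluate_testflow(expectations, results):
    """Compare observed node results with the configured expectations.

    `expectations` maps a node name to a dict with a 'status' key and an
    optional 'warning' key. Nodes without an entry are assumed to be
    expected to succeed (an assumption of this sketch).
    """
    problems = []
    for node, result in results.items():
        expected = expectations.get(node, {"status": "success"})
        if result.status != expected["status"]:
            problems.append(f"{node}: expected {expected['status']}, got {result.status}")
        required_warning = expected.get("warning")
        if required_warning is not None and result.warning != required_warning:
            problems.append(f"{node}: required warning {required_warning!r} not raised")
    return problems
```

With this model, a node that is expected to fail and does fail produces no problem, while the same node succeeding unexpectedly is reported. That inversion is what lets a testflow cover error paths as well as the happy path.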
Executing testflows
- Right-click on a workflow (or an entire workflow group) in the KNIME Explorer and select Run as workflow test.
- Click OK.
- The testflow will then begin executing and the results will be displayed in the JUnit view.
- By browsing these results it’s possible to audit any failing nodes or unexpected behavior as well as the failure trace.
Components (and how they relate to testflows)
Considering that KNIME is a visual programming tool, it stands to reason that programming principles would generally apply to KNIME workflows.
One programming concept that can be applied to KNIME would be the FIRST principle (which stands for Focused, Independent, Reusable, Small, and Testable). Here’s a relevant quote from Addy Osmani regarding FIRST:
In fact, the secret to efficiently building ‘large’ things is generally to avoid building them in the first place. Instead, compose your large thing out of smaller, more focused pieces. This makes it easier to see how the small thing fits within the broader scope of your large thing. — Addy Osmani (source)
With regard to KNIME, “smaller, more focused pieces” means components. Components, if designed properly, adhere to the FIRST principle:
- Components should be independently testable and should not rely on the workflows that use them in order to function properly. If a component can’t be reliably tested with a testflow, it likely isn’t adhering to the FIRST principle.
- Components should have a small set of documented inputs, data validation for the inputs, a single responsibility that the component will fulfill, and consistent outputs.
- Components should avoid dependencies on specific databases or other organization-specific resources and should instead be as independent and reusable as possible. One notable exception to this rule would be a component whose sole purpose is to connect to a database.
- Selecting half of the nodes in a workflow and creating a “component” out of them isn’t going to result in adherence to the FIRST principle. Doing so would likely produce a component that fails to validate its inputs and has more than a single responsibility.
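The same guidelines have a familiar shape in code. In the hedged Python sketch below, the only function that touches a database is the one whose sole purpose is to read from it, while the aggregation logic validates its input and can be tested with plain in-memory rows. The function names and the `orders` table are hypothetical.

```python
def load_orders(connection):
    """Sole responsibility: fetch rows. The only piece that touches a database."""
    cursor = connection.execute("SELECT customer, amount FROM orders")
    return [{"customer": c, "amount": a} for c, a in cursor.fetchall()]


def totals_by_customer(rows):
    """Sole responsibility: aggregate. Validates its input and needs no database."""
    totals = {}
    for row in rows:
        if "customer" not in row or "amount" not in row:
            raise ValueError("each row needs 'customer' and 'amount'")
        totals[row["customer"]] = totals.get(row["customer"], 0) + row["amount"]
    return totals
```

Because `totals_by_customer` accepts rows instead of opening its own connection, a test can pass it a handful of hand-written rows. This mirrors how a well-designed component can be exercised by a testflow without access to organization-specific resources.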
If teams focus on creating excellent components and adding test coverage to ensure the components remain consistent and functional, then the workflows which orchestrate those components become simple to understand and maintain.
Conclusion
There was a lot of material in this article (including some fairly advanced concepts in workflow development). Hopefully you learned something useful by reading it, and hopefully I’ve convinced you of the importance of testing your KNIME workflows and components. Please share your experiences with us on our Medium publication and on the KNIME Forum!
Example workflows: