This article outlines how data visualisation enables effective conversations between business stakeholders, data scientists and data engineers when solving complex Machine Learning (ML) workflows. We will introduce Kedro-Viz, an open-source data pipeline visualisation tool, exploring its functionality and detailing how QuantumBlack deployed Front-End Engineering to create our latest tool.
You have spent weeks trying to locate and access the data sources required for your ML use case, and are now beginning to preprocess these raw datasets. You now face the time-consuming prospect of converting column types, cleansing, transforming and wrangling the data before you can even begin to consider feature engineering.
Your business stakeholders have already queried why the prototype ML model was not completed in Week One. How can you explain how extensive the data engineering process is without showing your codebase?
Vicki Boykis faced a similar challenge when she posted a poll attempting to understand where time is allocated in data science projects.
Change the conversation
Presenting the data pipeline construction story is a common challenge faced by data teams around the world. This is not surprising, given how difficult it can be to communicate to non-expert stakeholders the reasoning behind selecting certain data sources and ML models.
We believe that magic happens when business stakeholders, data engineers and data scientists speak the same language. We developed Kedro-Viz to solve this problem.
Introduction to Kedro-Viz
Kedro-Viz displays data pipelines in an informative way, emphasising the connections between datasets and tasks. Kedro-Viz, available as a Python plugin, will show you how your Kedro pipeline is resolved. It provides a bird’s-eye view of complex workflows and can validate the work that has gone into producing the data preprocessing pipeline. In effect, Kedro-Viz offers a bridge of understanding to explain how different sections of your pipeline fit together. Additionally, it provides a variety of filters to help you explore relevant parts of your pipeline.
With Kedro-Viz, you can:
- Produce high-level and low-level data pipelines by toggling the visibility of sub-pipelines (using tags), individual datasets, tasks and parameters
- Show or hide dataset or task names
- Visualise your workflow with a light or dark theme (by popular request)
- Use it as a standalone HTML page when installed as a Kedro plugin
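To make the idea of a pipeline graph with tag-based filtering concrete, here is a minimal sketch in plain Python. It is not Kedro's or Kedro-Viz's API: the `Task` class, the helper functions and the task and dataset names are all hypothetical, chosen only to illustrate how tasks and datasets connect and how a tag can select a sub-pipeline.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    """A hypothetical stand-in for a pipeline task (not Kedro's own node class)."""
    name: str
    inputs: tuple
    outputs: tuple
    tags: frozenset = field(default_factory=frozenset)


def build_edges(tasks):
    """Return (source, target) pairs: dataset -> task for inputs, task -> dataset for outputs."""
    edges = []
    for task in tasks:
        edges.extend((dataset, task.name) for dataset in task.inputs)
        edges.extend((task.name, dataset) for dataset in task.outputs)
    return edges


def filter_by_tag(tasks, tag):
    """Keep only the tasks carrying the given tag, i.e. one sub-pipeline."""
    return [task for task in tasks if tag in task.tags]


# Illustrative names only, loosely modelled on a preprocessing-then-modelling flow.
tasks = [
    Task("preprocess_companies", ("companies",), ("clean_companies",), frozenset({"preprocessing"})),
    Task("preprocess_shuttles", ("shuttles",), ("clean_shuttles",), frozenset({"preprocessing"})),
    Task("train_model", ("clean_companies", "clean_shuttles"), ("regressor",), frozenset({"modelling"})),
]

full_graph = build_edges(tasks)
preprocessing_graph = build_edges(filter_by_tag(tasks, "preprocessing"))
```

Here `full_graph` corresponds to the low-level view of every dataset and task, while `preprocessing_graph` is the smaller view you would get by showing only one tagged sub-pipeline.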
The steps below continue from Kedro’s Guide To Getting Started. Before launching Kedro-Viz, you should have:
- Downloaded the Spaceflights GitHub repository
- Created and activated your virtual environment and installed Kedro
- Made your project directory available to Kedro-Viz
To install it:
pip install kedro-viz
From your terminal, run:
kedro viz
This command will run a server on http://127.0.0.1:4141 and open your visualisation in a browser, where you should see your pipeline displayed.
In conversation with Richard Westenra, creator of Kedro-Viz
- Paths often cross each other in sub-optimal ways
- It does not like circular nodes
- It is often slow to generate larger graphs
In future, I would like to find or build a new layout engine for our graph. That being said, Dagre works well enough for now.
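For readers unfamiliar with what a layout engine does, the sketch below shows one classical first step of a layered graph layout: assigning each node to a layer so that every edge points downwards. It is a simplified longest-path layering written in Python for illustration, not Dagre's implementation, and the function name `assign_layers` is hypothetical. Later steps, such as ordering nodes within a layer to reduce crossing paths, are where much of the difficulty and cost of larger graphs comes from.

```python
from collections import defaultdict, deque


def assign_layers(edges):
    """Longest-path layering: each node sits one layer below its deepest parent.

    Raises ValueError if the graph contains a cycle, since no layering exists.
    """
    children = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for source, target in edges:
        children[source].append(target)
        indegree[target] += 1
        nodes.update((source, target))

    # Nodes without incoming edges form the top layer.
    layer = {node: 0 for node in nodes if indegree[node] == 0}
    queue = deque(sorted(layer))
    processed = 0
    while queue:
        node = queue.popleft()
        processed += 1
        for child in children[node]:
            # A child must sit below every one of its parents.
            layer[child] = max(layer.get(child, 0), layer[node] + 1)
            indegree[child] -= 1
            if indegree[child] == 0:
                queue.append(child)

    if processed != len(nodes):
        raise ValueError("graph contains a cycle; a layered layout is undefined")
    return layer
```

For example, `assign_layers([("raw", "clean"), ("clean", "model"), ("raw", "model")])` places `raw` on layer 0, `clean` on layer 1 and `model` on layer 2.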
I use D3 for rendering and transitions on the chart itself. D3’s most attractive feature is the control and flexibility it offers, especially for entry/exit animations. However, it can prove difficult to write unit tests for, so in future we may consider deploying React for chart rendering. At present, React renders everything which is not part of the SVG chart itself. Most of the sidebar interface elements are imported from Kedro-UI, our reusable UI component library which was recently open-sourced.
The application state handling began simply and has become more complex over time. I avoided Redux for as long as possible to prevent adding unnecessary complexity. However, once we added sub-pipeline tags, the various filters became too much to handle. I had to completely refactor it to use Redux, Reselect and normalised data. Reselect selectors have been helpful in making the more complex logic much more explicit and removing a frequent source of bugs.
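The pattern described here, normalised data combined with memoised selectors, can be sketched in a few lines. The example below is written in Python purely for illustration; Kedro-Viz itself uses the JavaScript libraries named above, and the state shape, `create_selector` helper and node names are all assumptions rather than the project's actual code.

```python
def create_selector(input_selectors, combiner):
    """Return a selector that recomputes only when an input selector yields a new object."""
    cache = {"args": None, "result": None}

    def selector(state):
        args = tuple(select(state) for select in input_selectors)
        cached = cache["args"]
        changed = cached is None or any(a is not b for a, b in zip(args, cached))
        if changed:
            cache["result"] = combiner(*args)
            cache["args"] = args
        return cache["result"]

    return selector


# Normalised state: flat lookup tables keyed by ID instead of nested objects.
state = {
    "node_ids": ("preprocess", "train"),
    "node_tags": {"preprocess": ("preprocessing",), "train": ("modelling",)},
    "tag_enabled": {"preprocessing": True, "modelling": False},
}

combiner_calls = []


def visible_nodes(node_ids, node_tags, tag_enabled):
    """Derive the nodes to draw: those with at least one enabled tag."""
    combiner_calls.append(1)  # Track recomputations for demonstration purposes.
    return tuple(n for n in node_ids if any(tag_enabled[t] for t in node_tags[n]))


get_visible_nodes = create_selector(
    [lambda s: s["node_ids"], lambda s: s["node_tags"], lambda s: s["tag_enabled"]],
    visible_nodes,
)
```

Calling `get_visible_nodes(state)` repeatedly with an unchanged state reuses the cached result; replacing `tag_enabled` with a new dictionary, as a reducer would after a filter is toggled, triggers exactly one recomputation. Keeping derived data in selectors like this makes the filtering logic explicit and easy to test in isolation.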
I am immensely proud of the result and tremendously excited for future progress in Kedro-Viz. I believe these adaptations will power more creative features in the months to come, such as letting users configure the font size.
You can learn more about Kedro-Viz and Kedro-UI via the below links: