Demystifying Machine Learning Complexity Through Visualisation

Yetunde Dada, Product Manager, and Richard Westenra, Front-End Engineer

This article outlines how data visualisation enables effective conversations between business stakeholders, data scientists and data engineers when tackling complex Machine Learning (ML) workflows. We will introduce Kedro-Viz, an open-source data pipeline visualisation tool, explore its functionality and describe how QuantumBlack used front-end engineering to build our latest tool.

The Scenario

You have spent weeks trying to locate and access the data sources required for your ML use case, and are now beginning to preprocess these raw datasets. Ahead lies the time-consuming work of converting column types, cleansing, transforming and wrangling the data before you can even begin to consider feature engineering.

Your business stakeholders have already asked why the prototype ML model was not completed in Week One. How can you explain how extensive the data engineering process is without showing them your codebase?

Vicki Boykis faced a similar challenge when she posted a poll asking where time is spent in data science projects.

Change the conversation

Presenting the data pipeline construction story is a common challenge faced by data teams around the world. This is not surprising, given how difficult it can be to communicate to non-expert stakeholders the reasoning behind selecting certain data sources and ML models.

We believe that magic happens when business stakeholders, data engineers and data scientists speak the same language. We developed Kedro-Viz to solve this problem.

Introduction to Kedro-Viz

Kedro-Viz displays data pipelines in an informative way, emphasising the connections between datasets and tasks. Available as a Python plugin, it shows you how your Kedro pipeline is resolved. It provides a bird's-eye view of complex workflows and helps demonstrate the work that has gone into producing the data preprocessing pipeline. In effect, Kedro-Viz offers a bridge of understanding, explaining how the different sections of your pipeline fit together. It also provides a variety of filters to help you explore the relevant parts of your pipeline.

With Kedro-Viz, you can:

  • Produce high-level and low-level views of your data pipelines by toggling the visibility of sub-pipelines (using tags), individual datasets, tasks and parameters
  • Show or hide dataset or task names
  • Visualise your workflow with a light or dark theme (by popular request)
  • Use it as a standalone HTML page when installed as a Kedro plugin
  • Or import Kedro-Viz into a React web app as a JavaScript component
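To make the tag filtering concrete, here is a minimal Python sketch of the idea, assuming a simplified model in which a pipeline is a list of tasks that each read and write named datasets. The `Task` class and the `filter_by_tag` and `edges` functions are hypothetical illustrations, not Kedro or Kedro-Viz APIs; only the `params:` prefix for parameter inputs follows Kedro's own naming convention. Datasets and tasks both become graph nodes, and selecting a tag keeps a single sub-pipeline.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Task:
    """Hypothetical, simplified task: reads named datasets and writes others."""
    name: str
    inputs: tuple[str, ...]
    outputs: tuple[str, ...]
    tags: frozenset[str] = field(default_factory=frozenset)


def filter_by_tag(tasks: list[Task], tag: str) -> list[Task]:
    """Keep only the tasks carrying `tag`, i.e. a single sub-pipeline."""
    return [t for t in tasks if tag in t.tags]


def edges(tasks: list[Task], show_parameters: bool = True) -> list[tuple[str, str]]:
    """Return the (source, target) edges a graph view would draw.

    Datasets and tasks are both nodes: each input dataset points at the task
    that reads it, and each task points at the datasets it writes. Inputs
    prefixed "params:" are parameters and can be hidden.
    """
    result = []
    for t in tasks:
        for ds in t.inputs:
            if show_parameters or not ds.startswith("params:"):
                result.append((ds, t.name))
        for ds in t.outputs:
            result.append((t.name, ds))
    return result


# Illustrative pipeline; dataset and tag names are made up for this example.
tasks = [
    Task("preprocess_companies", ("companies",), ("preprocessed_companies",),
         frozenset({"data_engineering"})),
    Task("split_data", ("model_input_table", "params:test_size"), ("X_train", "X_test"),
         frozenset({"data_science"})),
]
print(edges(filter_by_tag(tasks, "data_science"), show_parameters=False))
# [('model_input_table', 'split_data'), ('split_data', 'X_train'), ('split_data', 'X_test')]
```

Hiding a parameter or a tag simply removes edges before layout, which is why toggling filters can turn a dense low-level graph into a readable high-level one.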

Using Kedro-Viz

The steps below continue on from Kedro's Guide To Getting Started.

Prerequisites

Before running Kedro-Viz, make sure you have:

  • Downloaded the Spaceflights GitHub repository
  • Created and activated your virtual environment and installed Kedro
  • Made your project directory available to Kedro-Viz

Installation

To install it:

pip install kedro-viz

Run

From the root of your project directory, run:

kedro viz

This command runs a server on http://127.0.0.1:4141 and opens your visualisation in a browser. You should see the following:

The Kedro-Viz pipeline for the Spaceflights tutorial

Developing Kedro-Viz

In conversation with Richard Westenra, creator of Kedro-Viz

Kedro-Viz began as an experimental prototype that I bootstrapped with Create React App. I tried a few different layouts and visual styles for the chart itself, and briefly experimented with a D3 force-directed network split into different layers. I then discovered Dagre, a JavaScript library for laying out directed acyclic graphs, which handles the positioning of graph nodes and edges. It is not perfect, however. Issues include:

  • Paths often cross each other in sub-optimal ways
  • It does not handle circular nodes well
  • It is often slow to lay out larger graphs

In future, I would like to find or build a new layout engine for our graph. That said, Dagre works well enough for now.
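Dagre belongs to the family of layered (Sugiyama-style) layout algorithms: nodes are first assigned to layers so that every edge points downwards, then reordered within each layer to reduce edge crossings, and finally given coordinates. The sketch below is not Dagre's implementation; it assumes the simplest ranking strategy, longest-path layering, and adds a brute-force count of the crossings between two adjacent layers for a given node order. The function names are hypothetical.

```python
from collections import defaultdict, deque


def longest_path_layers(edges: list[tuple[str, str]]) -> dict[str, int]:
    """Assign each node of a DAG to a layer so that every edge points downwards.

    A node's layer is the length of the longest path reaching it from a
    source node. Raises ValueError if the graph contains a cycle.
    """
    successors = defaultdict(list)
    indegree = defaultdict(int)
    nodes = set()
    for src, dst in edges:
        successors[src].append(dst)
        indegree[dst] += 1
        nodes.update((src, dst))

    # Kahn's topological sort: a node is dequeued only after all of its
    # predecessors, so its layer is final by then.
    layer = {n: 0 for n in nodes}
    queue = deque(n for n in nodes if indegree[n] == 0)
    visited = 0
    while queue:
        n = queue.popleft()
        visited += 1
        for m in successors[n]:
            layer[m] = max(layer[m], layer[n] + 1)
            indegree[m] -= 1
            if indegree[m] == 0:
                queue.append(m)
    if visited != len(nodes):
        raise ValueError("graph contains a cycle and cannot be layered")
    return layer


def count_crossings(upper: list[str], lower: list[str],
                    edges: list[tuple[str, str]]) -> int:
    """Count crossings between two adjacent layers, given left-to-right orders.

    Two edges cross when their endpoints appear in opposite relative order
    in the upper and lower layers. O(E^2), which is fine for a sketch.
    """
    upos = {n: i for i, n in enumerate(upper)}
    lpos = {n: i for i, n in enumerate(lower)}
    spans = [(upos[u], lpos[v]) for u, v in edges if u in upos and v in lpos]
    crossings = 0
    for i, (a1, b1) in enumerate(spans):
        for a2, b2 in spans[i + 1:]:
            if (a1 - a2) * (b1 - b2) < 0:
                crossings += 1
    return crossings


graph = [("companies", "preprocess"), ("shuttles", "preprocess"), ("preprocess", "model_input")]
print(sorted(longest_path_layers(graph).items()))
# [('companies', 0), ('model_input', 2), ('preprocess', 1), ('shuttles', 0)]
print(count_crossings(["a", "b"], ["d", "c"], [("a", "c"), ("b", "d")]))  # 1
```

Finding the node order with the fewest crossings is NP-hard even for two layers, so layout engines fall back on heuristics; that is one reason paths can still cross in sub-optimal ways.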

I use D3 for rendering and transitions on the chart itself. D3's most attractive feature is the control and flexibility it offers, especially for entry/exit animations. However, D3 code can be difficult to unit test, so in future we may consider using React for chart rendering. At present, React renders everything that is not part of the SVG chart itself. Most of the sidebar interface elements are imported from Kedro-UI, our reusable UI component library, which was recently open-sourced.
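D3's entry/exit animations rest on its data join: `selection.data()` with a key function matches incoming data to existing elements, and the unmatched items are exposed through `enter()` and `exit()` so each group can be animated differently. The Python function below is only an analogy for that bookkeeping; it is not D3's API, renders nothing, and `keyed_join` is a hypothetical name.

```python
from typing import Any, Callable, Hashable


def keyed_join(current_keys: list[Hashable], new_data: list[Any],
               key: Callable[[Any], Hashable]) -> tuple[list, list, list]:
    """Split a data update into enter, update and exit groups, D3-style.

    current_keys: keys of the elements already drawn, in display order.
    new_data: the incoming data items (if two share a key, the last one wins).
    key: maps a data item to a stable identifier, like D3's key function.
    """
    incoming = {key(d): d for d in new_data}
    drawn = set(current_keys)
    enter = [d for k, d in incoming.items() if k not in drawn]   # animate in
    update = [d for k, d in incoming.items() if k in drawn]      # move/restyle
    exit_ = [k for k in current_keys if k not in incoming]       # animate out
    return enter, update, exit_


enter, update, exit_ = keyed_join(
    ["companies", "shuttles"],
    [{"id": "shuttles"}, {"id": "reviews"}],
    key=lambda d: d["id"],
)
print(enter, update, exit_)  # [{'id': 'reviews'}] [{'id': 'shuttles'}] ['companies']
```

Keeping this kind of logic in plain functions, separate from DOM transitions, is one way to make chart code easier to unit test.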

Application state handling began simply and has become more complex over time. I held off on Redux for as long as possible to avoid adding unnecessary complexity. However, once we added sub-pipeline tags, the various filters became too much to handle, and I had to completely refactor the state to use Redux, Reselect and normalised data. Reselect selectors have made the more complex logic much more explicit and removed a frequent source of bugs.
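A minimal sketch of the two ideas, written in Python for illustration rather than JavaScript: the state is normalised (each node stored once by id, with tags referenced by name), and `create_selector` is a hypothetical analogue of Reselect's `createSelector` that reruns its combiner only when an input selector returns a different object, in the spirit of Reselect's classic reference-equality memoisation. The state shape, the visibility rule and all selector names are assumptions, not Kedro-Viz's actual store.

```python
from typing import Any, Callable


def create_selector(*input_selectors: Callable[[dict], Any],
                    combiner: Callable[..., Any]) -> Callable[[dict], Any]:
    """Hypothetical Python analogue of Reselect's createSelector.

    The combiner reruns only when some input selector returns a different
    object (compared by identity) from the previous call; otherwise the
    cached result is returned. Cache size is one.
    """
    last_inputs = None
    last_result = None

    def selector(state: dict) -> Any:
        nonlocal last_inputs, last_result
        inputs = tuple(select(state) for select in input_selectors)
        if last_inputs is None or any(a is not b for a, b in zip(inputs, last_inputs)):
            last_result = combiner(*inputs)
            last_inputs = inputs
        return last_result

    return selector


# Normalised state: each node stored once by id; tags referenced by name.
state = {
    "node_ids": ["n1", "n2", "n3"],
    "node_tags": {"n1": ["data_engineering"], "n2": ["data_science"], "n3": ["data_science"]},
    "tag_enabled": {"data_engineering": False, "data_science": True},
}


def get_node_ids(s: dict) -> list:
    return s["node_ids"]


def get_node_tags(s: dict) -> dict:
    return s["node_tags"]


def get_tag_enabled(s: dict) -> dict:
    return s["tag_enabled"]


# Assumed rule: a node is visible if at least one of its tags is enabled.
get_visible_node_ids = create_selector(
    get_node_ids, get_node_tags, get_tag_enabled,
    combiner=lambda ids, tags, enabled: [
        n for n in ids if any(enabled.get(t, False) for t in tags[n])
    ],
)

print(get_visible_node_ids(state))  # ['n2', 'n3']
```

Because Redux-style updates replace only the slices that change, toggling a tag produces a new `tag_enabled` object while `node_ids` and `node_tags` keep their identity, so the selector recomputes exactly when the filter result could have changed.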

Implementing Kedro-Viz as a module inside other applications for internal use was quite challenging. To avoid forking the codebase, which would have caused complications later, I modified the tooling configuration so that, in addition to building a static webpage, Kedro-Viz could be published as an npm package and imported into other JavaScript projects as a standalone React component. This was time-consuming: much of the CSS in Kedro-Viz and Kedro-UI had to be refactored so that it could be imported into an existing website without affecting global styles.
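One common way to stop a component's styles leaking into a host page is to namespace every selector under the component's root element. The text does not say exactly how the Kedro-Viz CSS was refactored, so the Python sketch below only illustrates that general technique; the `.kv-root` class and the `scope_selectors` function are hypothetical, and the comma splitting is deliberately naive (it would mishandle selectors such as `:is(a, b)`).

```python
# Selectors that would otherwise style the whole host page.
GLOBAL_SELECTORS = {"html", "body", ":root"}


def scope_selectors(selector_list: str, root: str = ".kv-root") -> str:
    """Namespace a comma-separated CSS selector list under a root class.

    Global selectors are redirected to the root element itself, and every
    other selector only matches inside it. Naive: splits on every comma.
    """
    scoped = []
    for selector in selector_list.split(","):
        selector = selector.strip()
        if selector in GLOBAL_SELECTORS:
            scoped.append(root)
        else:
            scoped.append(f"{root} {selector}")
    return ", ".join(scoped)


print(scope_selectors("body, .sidebar a"))  # .kv-root, .kv-root .sidebar a
```

In practice this rewriting is usually done by build tooling or by writing the styles scoped from the start, rather than by post-processing strings by hand.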

I am immensely proud of the result and tremendously excited about the future of Kedro-Viz. I believe these adaptations will enable more creative features in the months to come, such as letting users configure the font size.

QuantumBlack, AI by McKinsey

We are the AI arm of McKinsey & Company. We are a global community of technical & business experts, and we thrive on using AI to tackle complex problems.