Accelerating the path to production for AI

Kedro from QuantumBlack, AI by McKinsey

--

Organizations face significant challenges when scaling their AI efforts beyond experiments and proof-of-concept models. This article examines how Kedro, an open-source Python framework from the QuantumBlack Labs team for building reproducible, maintainable, and modular code, can help. To date, Kedro has close to 17 million downloads and 10,000 stars on GitHub, and it is used across a wide range of industries. In the past year alone, developers from more than 250 companies have worked with Kedro as super-users.

A user exploring an AI pipeline with Kedro-Viz, created by QuantumBlack Labs

QuantumBlack, AI by McKinsey helps organizations achieve accelerated, sustainable, and inclusive growth with AI. QuantumBlack Labs is the R&D and software development hub within QuantumBlack. We use our colleagues’ collective experience to develop suites of tools and assets that ensure AI/ML models reach production and achieve sustained impact. In the coming months, we will publish a series about the technology challenges behind digital and AI transformations, and the solutions required.

Projects using artificial intelligence and machine learning (AI/ML) are a key driver of business value. Companies have shifted from exploring what the technology can do to exploiting it at scale to gain market share. But the complexity involved in building, deploying, and managing AI/ML at scale should not be underestimated. In this article, we consider a key reason projects often fail to transition from prototype to production: technical debt.

Technical debt accrues when teams develop AI/ML models without following best practices. It can significantly hinder scaling a prototype into production.

Best practices promote scale

As our recent book, Rewired: The McKinsey Guide to Outcompeting in the Age of Digital and AI (Wiley, June 2023), describes, ‘technical debt is the “tax” companies pay on any development to redress technology issues’. Technical debt results in models that are inefficient and unreliable in production, hard to maintain, and difficult to integrate.

Adhering to best practices in AI/ML development avoids technical debt and ensures that models scale efficiently. Examples of these best practices include:

  • Optimized data management: for reproducibility
  • Decoupled configuration and code: for reusability (sketched after this list)
  • Testing across the codebase: for maintainability
  • Version control with CI/CD: for reliability
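
To make one of these practices concrete, here is a minimal, framework-agnostic sketch of decoupled configuration. The file name, keys, and data are hypothetical and this is not Kedro’s own API; the point is simply that re-running the code against new inputs means editing a small config file, not the code itself.

```python
# config.yml (hypothetical) might contain:
#   raw_orders: data/01_raw/orders_2024.csv
#   min_amount: 10.0
import yaml          # PyYAML
import pandas as pd

def load_config(path: str = "config.yml") -> dict:
    """Read paths and parameters from a config file instead of hard-coding them."""
    with open(path) as f:
        return yaml.safe_load(f)

def preprocess_orders(config: dict) -> pd.DataFrame:
    """The code refers only to config keys, so pointing it at new data
    means changing config.yml, not the code."""
    orders = pd.read_csv(config["raw_orders"])
    return orders[orders["amount"] >= config["min_amount"]]

if __name__ == "__main__":
    print(preprocess_orders(load_config()).head())
```

Kedro formalizes this same separation: datasets and parameters are declared once in its configuration files and referenced by name from the code.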

So why is it difficult for data practitioners to follow best practices when building prototype projects? The challenges can be broadly categorized into several areas:

  • Lack of clearly defined protocols: Many organizations lack the awareness, or their teams the capacity, to define the standards needed.
  • Resource constraints: The tools to ensure use of best practices may not be readily available.
  • Insufficient collaboration and mentorship: Senior staff are not available to onboard junior practitioners in the standards they should adopt.

Kedro

Kedro was created within QuantumBlack Labs to streamline the development of data and machine learning pipelines. It is an open-source Python framework designed to produce reproducible, maintainable, and modular data science code. In 2022, McKinsey donated Kedro to LF AI & Data, part of the Linux Foundation, where it is currently an incubating project.

When we introduced Kedro to the world as an open-source project in 2019, we described this scenario, which is probably familiar to many data scientists:

Suppose you are a data scientist working for a senior executive who makes key financial decisions for your company. She asks you to provide an ad-hoc analysis, and when you do, she thanks you for delivering useful insights. Great!

Three months later, the newly promoted executive, now your CEO, asks you to re-run the analysis for the next planning meeting…and you cannot. The code is broken because you’ve changed some cells of the notebook, and you cannot remember the exact environment you used at the time. All the file paths are hard-coded, so you have to laboriously check and change each one for the new data inputs. Not so great!

To avoid this kind of scenario, Kedro promotes well-established best practices and serves as a tutor, uniting teams around a shared set of standards for project structure, configuration, and data management.

  • Kedro encourages coding in Python scripts for easy versioning.
  • It uses a straightforward project structure for collaborative working with tooling options to document, redistribute, and deploy the code.
  • Through a hierarchy of nodes and pipelines, Kedro offers built-in modularity, enabling teams to break down complex data science workflows into manageable, shareable packages (a minimal sketch follows this list).
  • A new developer can pick up an existing Kedro project, or individual pipeline, and rapidly understand how to use and modify it.
  • Kedro’s companion package, Kedro-Viz, offers interactive project visualization to facilitate development and communication.
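
To make the node-and-pipeline idea above concrete, here is a minimal sketch of a Kedro pipeline definition. The function and dataset names are hypothetical, and in a real project the datasets named here would be declared in the project’s Data Catalog configuration.

```python
# pipeline.py -- a minimal Kedro pipeline sketch (hypothetical dataset and function names)
import pandas as pd
from kedro.pipeline import Pipeline, node, pipeline

def clean_orders(raw_orders: pd.DataFrame) -> pd.DataFrame:
    """A plain Python function; Kedro wires its inputs and outputs together by name."""
    return raw_orders.dropna()

def summarise_orders(clean_orders: pd.DataFrame) -> pd.DataFrame:
    """Aggregate cleaned orders into a per-customer summary."""
    return clean_orders.groupby("customer_id", as_index=False)["amount"].sum()

def create_pipeline(**kwargs) -> Pipeline:
    # Each node maps named inputs to named outputs; Kedro resolves the
    # execution order from these names and loads/saves the datasets
    # through the project's Data Catalog.
    return pipeline(
        [
            node(clean_orders, inputs="raw_orders", outputs="clean_orders", name="clean_orders_node"),
            node(summarise_orders, inputs="clean_orders", outputs="order_summary", name="summarise_orders_node"),
        ]
    )
```

Running the project with `kedro run` executes these nodes in dependency order, and Kedro-Viz can render the same pipeline as an interactive graph.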

Impact of Kedro

“Before Kedro, we had many notebooks in different versions in different files and directories. Everything was scattered…We are 18 times faster than our old process while supporting more marketplaces!”

Principal ML Engineer at Jungle Scout

In a recent case study, a Brazilian broker used Kedro to standardize how they deploy models in production, reducing deployment times from 2 weeks to 2 days. They currently use Kedro in 150 projects (90 of which are in production).

Does my team need Kedro?

Teams that are dealing with technical debt see their codebase become increasingly complex and fragile, with symptoms such as the following:

  • Frequent failures and performance issues
  • Code that is difficult to maintain and scale
  • Frustration over lengthy development cycles and reduced productivity
  • Ongoing costs for troubleshooting and maintenance
  • Limited time for innovation and new projects

Summary

At McKinsey, we have seen first-hand that moving AI solutions from idea to implementation can be challenging. We recognized that scaling an AI/ML project relies on being able to transition a prototype solution into production swiftly, without the significant re-engineering that lengthens time-to-value and adds engineering overhead.

Using coding best practices from the outset is a strategic move towards reducing technical debt and building sustainable, successful AI/ML initiatives.

  • Individuals joining a team that follows Kedro best practices are onboarded swiftly, learn and work independently, and are more productive.
  • Teams shift their focus towards collaboration, producing consistent and maintainable projects, and can reuse Kedro pipelines and skills in future development efforts.
  • Organizations benefit from increased productivity, cost savings, and satisfaction.

In our next article, we’ll describe Brix, another QB Labs tool to enable teams to reap the benefits of Kedro by discovering reusable pipeline assets that can be recombined into different use cases.

QuantumBlack Horizon is a family of enterprise AI products, including Kedro, Brix, Alloy, and Iguazio, that provides the foundations for organization-level AI adoption by addressing pain points like scaling. It’s a first-of-its-kind product suite that helps McKinsey clients discover, assemble, tailor, and orchestrate AI projects.

To learn more about what QuantumBlack Horizon can do for you, please email Yetunde Dada.

--

QuantumBlack, AI by McKinsey

We are the AI arm of McKinsey & Company. We are a global community of technical & business experts, and we thrive on using AI to tackle complex problems.