Kedro Meets Team Topologies: A Practical Guide to ML Platform Teams

Carlos Barreto
Apr 2, 2024


Creating a data platform that supports a wide variety of use cases and profiles of data consumers is a challenging task for any platform team. In this post I'll share how Kedro can help teams throughout this process, reducing the learning curve and enabling data science squads to extend and adapt it as needed.

A short overview of platform teams

A platform team is responsible for defining the patterns and practices that help consumers of the platform develop and deploy their solutions, reducing development time while preserving substantial autonomy and ownership over production deployments. Platform Teams are a well-known concept from the book Team Topologies (Matthew Skelton & Manuel Pais), alongside three others: Stream-Aligned Teams, Enabling Teams and Complicated-Subsystem Teams. Borrowing a definition from Evan Bottcher quoted in the book:

A digital platform is a foundation of self-service APIs, tools, services, knowledge and support which are arranged as a compelling internal product. Autonomous delivery teams can make use of the platform to deliver product features at a higher pace, with reduced coordination.

The creation of a platform, commonly built on top of another platform (e.g. the cloud or Kubernetes), is an important step for any company, whether it is a startup or not. Providing safe patterns to data consumers and ensuring a reliable execution environment enables teams to focus on creating and capturing value.

A common challenge when working with AI in general is: how can I deploy this data pipeline or ML model I created without breaking everything, or without opening a ticket that will take at least four weeks to be implemented (commonly by a team that isn't close to us and may not even know why we are building the solution)?

Deploying a model requires different layers to support it, from data ingestion to model traceability (the MLOps undercurrents). Put simply, we want to move our code from development (sandbox) to production without much friction or refactoring.

When I started my career as a Data Engineer, a common scenario was the Data Science team working on version 5 of a model while version 1 was running in production, because a developer had to translate a random forest algorithm from R to Java due to application restrictions in production. I didn't realize it at the time, but it's a classic MLOps anti-pattern: DS and Ops teams far apart, with no communication patterns.

Platform teams can offer common ways to monitor and deploy ML solutions, using approaches like containers, logging and data API contracts. Keeping those contracts easy for users to digest, which shortens onboarding and makes usage easier to understand, is achieved (not only) by keeping documentation updated with the latest changes and requirements. During platform definition it is important to balance the desire to build from scratch against adopting an open-source or proprietary tool.

Kedro as a piece of a ML Platform

There are many posts and materials about Kedro on the internet, like this one, this official tutorial, or even myself on YouTube. So I will not cover what Kedro is or isn't in this post. Instead, I want to focus on the practical side of the framework and how it can help ML platform teams, offering it as a potential tool to enable data science teams to achieve a fast flow of development and value capture.

Abstracting data integration and structure patterns

Accessing data storage is one of the challenges teams face. Kedro gives us an easy way to do it via the Data Catalog. Teams can manage a central catalog, with all available data, versioned on GitHub so all users can consume it. It is also possible to integrate it with a CI/CD process in order to validate paths, data freshness, data drift and more.
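
As an illustration, a centrally managed catalog could look like the sketch below. The dataset names, bucket and paths are hypothetical, but the `type`/`filepath` structure follows Kedro's `catalog.yml` conventions (depending on your `kedro-datasets` version, the class may be spelled `CSVDataSet` instead of `CSVDataset`):

```yaml
# conf/base/catalog.yml -- central catalog maintained by the platform team
# (dataset names, bucket and paths are hypothetical examples)
companies:
  type: pandas.CSVDataset
  filepath: s3://analytics-bucket/raw/companies.csv

model_input_table:
  type: pandas.ParquetDataset
  filepath: s3://analytics-bucket/primary/model_input_table.parquet
  versioned: true
```

A CI job can then lint this file on every pull request, checking that each `filepath` resolves and that naming conventions are respected.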

To speed up development, users can then have access to templates/starters to get running quickly. Based on team feedback, platform teams can provide Kedro starters, so teams have a common way to initialize the development of their AI solutions (while still being able to extend and adapt them as needed).
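
For instance, spinning up a project from a starter is a single CLI call. `spaceflights-pandas` is one of the official starters; the GitHub URL below is a hypothetical in-house starter a platform team might maintain:

```shell
# Start from an official Kedro starter
kedro new --starter=spaceflights-pandas

# Or from a hypothetical starter the platform team hosts on GitHub
kedro new --starter=https://github.com/my-org/my-org-ml-starter.git
```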

In more mature scenarios, catalogs can be created dynamically based on each user's data governance permissions and access controls. Users could, for example, have something similar to Spring Boot Initializr to start new projects: base configurations, custom packages to choose from, and an integration with the data governance platform that shows only the data they are allowed to use.
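
A minimal sketch of the dynamic-catalog idea, assuming a governance service that tells us which datasets a user may read. `ALL_DATASETS` and `USER_GRANTS` are made-up stand-ins for a real governance API:

```python
# Sketch: build per-user catalog entries from governance permissions.
# ALL_DATASETS and USER_GRANTS stand in for a real governance service.
ALL_DATASETS = {
    "companies": {"type": "pandas.CSVDataset",
                  "filepath": "s3://analytics-bucket/raw/companies.csv"},
    "salaries": {"type": "pandas.ParquetDataset",
                 "filepath": "s3://analytics-bucket/restricted/salaries.parquet"},
}
USER_GRANTS = {"alice": {"companies", "salaries"}, "bob": {"companies"}}


def build_catalog(user: str) -> dict:
    """Return only the catalog entries this user is allowed to consume."""
    allowed = USER_GRANTS.get(user, set())
    return {name: entry for name, entry in ALL_DATASETS.items()
            if name in allowed}
```

The resulting dictionary can be rendered to a per-user `catalog.yml` at project-creation time, so restricted datasets never even appear in the user's workspace.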

Spring Boot Initializr

Masking dependencies complexity

Part of a team's autonomy is deciding which tools and services to use (without putting the platform at risk). In ML this is commonly related to the packages needed to run a model, new versions of pandas with faster operations, or any other requirement. One way to encapsulate those dependencies, reducing errors in other services, is to manage them via containers. Kedro integrates easily with Docker via the Kedro-Docker plugin. Again, this can be part of CI/CD verification, to ensure users are following the base platform guidelines.
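
In practice, the plugin keeps the workflow to a few commands (shown here as a sketch; image naming and build details depend on your project setup):

```shell
# Containerize a Kedro project with the Kedro-Docker plugin
pip install kedro-docker
kedro docker build   # builds an image for the project
kedro docker run     # executes `kedro run` inside the container
```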

Keeping documentation up to date

When developing custom applications, one important and time-consuming task is writing documentation. GenAI is already here to help us with that.

Maintaining a semantic versioning contract, ensuring compatibility between deployed versions, and keeping all documentation updated is a heavy task. According to the Code Time Report, developers spend on average 40 minutes per day on tasks like documentation, code review and PR review.

Kedro offers up-to-date, easy-to-follow documentation, thanks to the community and the efforts of its maintainers. ML teams can focus on documenting what matters in their project and the decisions they made, instead of how a custom in-house framework works. This reduces the learning curve (Kedro uses common approaches like OOP, dependency injection, and the Facade and Observer patterns), making the base knowledge of the framework easier to find. Platform teams can also extend it and maintain custom versions, with their own datasets, hooks or any other integrations they create.

Adapting and extending

Kedro offers an easy-to-extend API, with Datasets and Hooks, making it easy for data teams to implement customized requirements outside the core platform definitions, adapting to their current needs without introducing heavy breaking changes. Sometimes those adaptations turn out to be helpful to others as well, and become part of the platform.
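
To illustrate the extension surface, here is a stripped-down custom dataset. In a real project it would subclass `kedro.io.AbstractDataset`; the sketch below keeps only the standard library so the load/save logic stands on its own:

```python
import json
from pathlib import Path


class JSONLinesDataset:
    """Sketch of a custom dataset: one JSON document per line.

    A real Kedro dataset would subclass kedro.io.AbstractDataset and
    plug into the catalog via its `type` field; this version only shows
    the load/save contract such a dataset implements.
    """

    def __init__(self, filepath: str):
        self._filepath = Path(filepath)

    def load(self) -> list[dict]:
        # Parse one JSON object per non-empty line
        with self._filepath.open() as f:
            return [json.loads(line) for line in f if line.strip()]

    def save(self, records: list[dict]) -> None:
        # Write each record as a single JSON line
        with self._filepath.open("w") as f:
            for record in records:
                f.write(json.dumps(record) + "\n")

    def _describe(self) -> dict:
        return {"filepath": str(self._filepath)}
```

Once wrapped in `AbstractDataset`, a dataset like this becomes available to every project in the platform through a single catalog entry, with no changes to pipeline code.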

Wrapping up

It is not a technical requirement, but creating a communication flow between the teams consuming the platform and the teams building it is really important. Constant feedback on what is and isn't working well helps develop better products and solutions.

As you can see, Kedro is a complete framework that can help ML platform teams develop ML products at scale.

Share in the comments section: which frameworks are you using inside your platform, and how are you organizing them? Any other frameworks or approaches that enable data teams to develop with efficiency and security?

Also posted on the Kedro blog.
