TDS Archive

An archive of data science, data analytics, data engineering, machine learning, and artificial intelligence writing from the former Towards Data Science Medium publication.

MLOps

Organizing a Machine Learning Monorepo with Pants

Streamline your ML workflow management

20 min read · Aug 18, 2023


Have you ever copy-pasted chunks of utility code between projects and ended up with multiple versions of the same code living in different repositories? Or perhaps you had to open pull requests against dozens of projects after the name of the GCP bucket where you store your data changed?

Situations like these arise far too often in ML teams, and their consequences range from a single developer's annoyance to the team's inability to ship its code as needed. Luckily, there's a remedy.

Let's dive into the world of monorepos, an architecture widely adopted at major tech companies like Google, and see how it can enhance your ML workflows. A monorepo offers many advantages which, despite some drawbacks, make it a compelling choice for managing complex machine learning ecosystems.

We will briefly weigh the merits and demerits of monorepos, examine why they are an excellent architectural choice for machine learning teams, and peek into how Big Tech uses them. Finally, we'll see how to harness the power of the Pants build system to organize your machine learning monorepo into a robust CI/CD build system.


Published in TDS Archive

Written by Michał Oleszak

Lead ML Engineer | Top Writer in AI & Statistics | michaloleszak.com | Book 1:1 @ stan.store/michaloleszak
