Keep Calm and Run Your ML Projects the Lean Way
I decided to write this article to give some structure and visual form to the learnings, resources, ideas, and experiences I have picked up on my Data Science journey so far.
I believe it might be interesting to my fellow colleagues who are exploring different approaches to managing data projects, or who are maybe just looking for that one missing piece in their workflow. At the same time, my aim is to demystify the process for curious minds from different professional backgrounds who are interested in using data, ML, or AI to generate value for their business.
Finally, this article can support clear communication with my peers, collaborators, team, and wider organization when we get to topics like workflow and ways of working.
Expect the Unexpected
Any model is an abstract representation of reality, and machine learning models are no exception. A model includes a set of assumptions and simplifications that aim to capture the essential features of a complex system or process while ignoring the irrelevant or secondary ones. Inherently, models cannot capture every detail of a complex reality. For more thoughts on this, I recommend reading The Map Is Not the Territory.
In 2017, Steam's recommendation system favored top sellers and buried smaller indie games, which ended up receiving fewer sales. It also assumed that users preferred games similar to the ones they had already purchased, while in fact users appreciated being recommended games that differed from their purchase history.
Financial institutions developing credit scoring models based on historical customer data have often overlooked that the data is skewed towards a certain demographic and isn't representative of the current customer base. This introduces bias into the model, leading to inaccurate predictions across the customer base.
Likewise, personalized pricing strategies applied by online retailers, such as Amazon, have in some cases led to regulatory scrutiny and legal consequences due to unfair pricing based on the customer's browsing history, location, and other personal data.
Remember when Microsoft had to shut down Tay, its Twitter chatbot, after it started posting inappropriate content? Or when Meta's image classification algorithm censored a prehistoric figurine as a nude?
My own recommendation system, based on customer sessions on Contorion product pages, got "confused" after more than a year of stable performance: bot detection was bypassed, and thousands of non-human sessions containing unrelated products created associations between products that, according to the modeling assumptions, should not be associated. This led to a temporary drop in the relevance of the displayed products and hours of investigation and data cleaning.
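To make this failure mode concrete, here is a minimal sketch (not the actual Contorion implementation) of how session-based co-occurrence associations can be built, and how even a crude filter on suspiciously long sessions keeps non-human traffic from linking unrelated products. The session data, the length threshold, and the product IDs are made up for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical sessions: each is a list of product IDs viewed together.
# In reality these would come from tracking data, not hard-coded lists.
sessions = [
    ["drill-01", "drill-bits", "safety-glasses"],        # plausible human session
    ["drill-01", "drill-bits"],                          # plausible human session
    ["drill-01", "garden-hose", "printer-ink", "socks",  # bot-like session touching
     "drill-bits", "coffee-maker", "ladder", "glue"],    # many unrelated products
]

MAX_SESSION_LENGTH = 5  # assumed heuristic threshold for "human-like" sessions


def co_occurrence(sessions, max_len=None):
    """Count how often two products appear in the same session."""
    counts = Counter()
    for session in sessions:
        if max_len is not None and len(session) > max_len:
            continue  # drop sessions that look automated
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts


counts_all = co_occurrence(sessions)
counts_clean = co_occurrence(sessions, max_len=MAX_SESSION_LENGTH)

# The bot-like session creates a spurious association between unrelated
# products, which the simple length filter removes.
print(counts_all[("drill-01", "socks")])    # 1 -> spurious association
print(counts_clean[("drill-01", "socks")])  # 0 -> filtered out
```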
Having said that, we have to accept that implementing different kinds of models always involves some degree of uncertainty because even the most sophisticated models are based on assumptions and approximations, and there is always a possibility of features not being accounted for or even misinterpreted.
Therefore, managing uncertainty is an essential task for everyone involved in the decision-making process of creating a data product. One way to confront the uncertainty is an approach that embraces a constantly evolving implementation: validate the assumptions incorporated into each iteration of the model and adapt the model based on the latest learnings, ideally keeping the iterations as short as possible.
The Lean movement
To make our lives easier in the constant battle with uncertainty and complexity, it is convenient to borrow some ideas from Eric Ries's Lean Startup, which builds on learnings from Lean Manufacturing and methods popularised by Steve Blank.
Build-Measure-Learn Feedback Loop
The Lean Startup methodology offers many useful concepts and ideas, but one of the most important for data scientists to understand and take advantage of is the Build-Measure-Learn feedback loop. The term loop is used to highlight the iterative nature of the process, where each iteration goes through the phases of building a solution, measuring its impact, and learning from the results in order to quickly define a new hypothesis for the next iteration.
The outcome of each iteration is a set of learnings, while each subsequent iteration is defined by the learnings from the previous one.
Let’s go over an example:
Let’s say you’re building a data product that should help a grocery store chain with inventory management optimization.
Iteration 1:
Build: Predict demand for products based on customer buying patterns, supplier delivery times, and shelf stocking procedures.
Measure: Launch the product in a store and measure how well it predicts customer demand and matches supplier delivery schedules.
Learn: The algorithm is overestimating or underestimating demand for certain products, especially around the weekend, and it doesn't account for product shelf life.
Iteration 2:
Build: Make adjustments to the algorithm, taking into account seasonal trends and supplier delivery times, and add a product shelf-life estimation (a rough sketch of such a baseline follows this example).
Measure: Launch the updated product in another store and measure the results.
Learn: You notice that the algorithm is still underestimating demand for certain products, but performing better on the majority of others. In the new store, you notice that customer demographics also play a role.
Next iteration …
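To make the Build step of iteration 2 more tangible, here is a hypothetical sketch of a baseline that predicts daily demand as a trailing weekday average with a weekend uplift and sizes orders against the remaining shelf life. The sales figures, the uplift factor, and the function names are illustrative assumptions rather than the actual system.

```python
from statistics import mean

# Hypothetical daily sales for one product, Monday to Sunday, over two weeks.
daily_sales = [12, 11, 13, 12, 14, 21, 19,
               13, 12, 12, 14, 15, 22, 20]

WEEKEND_UPLIFT = 1.5   # assumed multiplier capturing the weekend pattern
SHELF_LIFE_DAYS = 3    # assumed shelf life of a perishable product


def predict_demand(history, day_of_week):
    """Trailing average of weekday sales, scaled up on weekends (Sat=5, Sun=6)."""
    weekday_sales = [s for i, s in enumerate(history) if i % 7 < 5]
    base = mean(weekday_sales)
    return base * WEEKEND_UPLIFT if day_of_week >= 5 else base


def order_quantity(history, day_of_week, current_stock):
    """Order enough to cover demand over the shelf life, minus stock on hand."""
    demand = sum(predict_demand(history, (day_of_week + d) % 7)
                 for d in range(SHELF_LIFE_DAYS))
    return max(0, round(demand - current_stock))


# Ordering on a Friday (day_of_week=4) with 10 units already on the shelf.
print(order_quantity(daily_sales, day_of_week=4, current_stock=10))
```

The point is not the quality of this particular baseline, but that each iteration ships something measurable and encodes the latest learnings (here, the weekend effect and shelf life) explicitly.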
It's not uncommon for there to be debates about which phase from the figure above is more important (e.g., Learn versus Build). However, the key question is not which phase is more important, but rather how to move through each phase quickly, validate the assumptions, and proceed to the next iteration, which will be shaped by the latest learnings.
This is why a very important trait of a data scientist is the ability to break down complex problems into smaller and manageable hypotheses. Let’s talk about some tools that could be of help for this.
MVP
The idea behind the MVP (Minimum Viable Product) is to create the simplest possible version of a product in order to validate the idea with end users.
Translated to Data Science lingo: deploy the simplest possible model (even heuristics are acceptable) using as little data as possible in order to validate the idea on a portion of end users. Keeping things simple enables us to validate our hypothesis quickly and learn about the limitations of the approach.
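As a hedged sketch of what "the simplest possible model, validated on a portion of end users" can look like, the snippet below serves a pure popularity heuristic to a small, deterministically chosen share of users. The traffic share, data shape, and function names are assumptions for illustration, not a prescribed implementation.

```python
import hashlib
from collections import Counter

MVP_TRAFFIC_SHARE = 0.10  # assumed: expose 10% of users to the MVP


def in_mvp_group(user_id: str) -> bool:
    """Deterministically assign a stable ~10% of users to the MVP variant."""
    digest = hashlib.md5(user_id.encode()).hexdigest()
    return int(digest, 16) % 100 < MVP_TRAFFIC_SHARE * 100


def top_sellers(purchased_product_ids, n=5):
    """The 'model': simply recommend the n most purchased products."""
    return [pid for pid, _ in Counter(purchased_product_ids).most_common(n)]


# Hypothetical order history (flat list of purchased product IDs).
orders = ["drill-01", "drill-bits", "drill-01", "ladder", "drill-bits", "drill-01"]
recommendations = top_sellers(orders)

for user in ["u-1001", "u-1002", "u-1003"]:
    if in_mvp_group(user):
        print(user, "->", recommendations)        # MVP heuristic
    else:
        print(user, "->", "existing experience")  # control group
```

The deterministic hash keeps the split stable across sessions, so the Measure phase can attribute any change in the metrics to the MVP group.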
The complexity of the MVP of course depends on the circumstances in which it is being developed: the existing data infrastructure, the existing solutions to benchmark against, the way the model's outputs are to be integrated into the business process, and so on.
As the circumstances shape MVPs in different ways, not all of them require the same amount of time, effort, and head-count. And even the most lightweight MVP can end up as a rejected hypothesis; for all the time saved compared to a "perfect" solution, it still costs time and money. To address that, people came up with another shortcut: the RAT.
RAT
The RAT (Riskiest Assumption Test) addresses the validation of the riskiest hypothesis, with the aim of testing the assumption without deploying a system to production.
Both the MVP and the RAT go through the Build-Measure-Learn loop with the same goal, but the RAT gets there faster by using hacks and tricks that would not be sufficient for a production system.
In the RAT phase, cutting corners is the main theme, which means heuristics are even preferred over ML models. No wonder rule #1 of Google's Rules of Machine Learning states: don't be afraid to launch a product without machine learning.
Depending on the circumstances, sometimes manually running a Jupyter notebook, or even using the output of a Tableau worksheet, will do the trick. When it comes to the RAT, Learn is the main stage of the Build-Measure-Learn feedback loop.
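For example, if the riskiest assumption is that past co-purchases predict what a customer will buy next, a one-off script run from a notebook can check that against historical orders, with nothing deployed to production. Everything below, from the data shape to the hit-rate check, is an illustrative assumption.

```python
from collections import Counter
from itertools import combinations

# Hypothetical per-customer order history: first order and the following order.
histories = {
    "c-1": [{"drill-01", "drill-bits"}, {"safety-glasses"}],
    "c-2": [{"drill-01", "safety-glasses"}, {"drill-bits"}],
    "c-3": [{"drill-bits", "safety-glasses"}, {"drill-01"}],
    "c-4": [{"paint"}, {"ladder"}],
}

# Count co-purchases within everyone's first order.
co_counts = Counter()
for orders in histories.values():
    for a, b in combinations(sorted(orders[0]), 2):
        co_counts[(a, b)] += 1
        co_counts[(b, a)] += 1


def recommend(basket, n=1):
    """Recommend products most often co-purchased with the basket's items."""
    scores = Counter()
    for item in basket:
        for (a, b), count in co_counts.items():
            if a == item and b not in basket:
                scores[b] += count
    return [p for p, _ in scores.most_common(n)]


# Riskiest assumption check: do the recommendations show up in the *next* order?
hits = sum(bool(set(recommend(first)) & second) for first, second in histories.values())
print(f"hit rate: {hits}/{len(histories)}")  # prints "hit rate: 3/4" on this toy data
```

If the hit rate is nowhere near what the business case needs, the assumption is rejected before any production work starts, which is exactly the learning the RAT is after.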
The remaining question is, how do we structure all of this, and how can we easily explain it to our collaborators, stakeholders and clients?
Mercedes Decomposition
Ash Urazbaev introduced the “Mercedes” Decomposition, a visual framework that can serve as a powerful tool for teams and organizations to gain a broader perspective and identify potential blind spots when working on data products.
This framework breaks down the problem into multiple dimensions or angles, which enables the team to consider different circumstances and stakeholder needs.
In order to fully understand the problem that needs to be solved, in most cases it is recommended to gather ideas and questions from people with different perspectives. At the same time, collaboration and communication can be challenging due to the multidisciplinary nature of the subject: stakeholders from different functions are used to different terminology and associations, and have different levels of exposure to a particular topic. However, the "Mercedes" Decomposition can help us facilitate effective collaboration because it enhances transparency and provides a solid foundation for clear communication to all participants in the discussion.
The figure below represents a blueprint of the Mercedes Decomposition, and on it, we can observe 3 key points:
1. The problem is decomposed into 3 dimensions:
- User Story
- Data
- Method
2. It also breaks down the development process into stages compatible with the Build-Measure-Learn iterative process and helps us visualise the evolution of potential solutions.
3. How the authors came up with the name 😆 (hint: Mercedes-Benz logo)
Let’s apply the decomposition
As the discussion begins, the participants can write the key ideas and questions onto sticky notes and place them on top of the blueprint provided above.
The majority of sticky notes can be considered as hypotheses, which are mapped to the three axes:
- Method: This axis accommodates the hypotheses about modeling methods — starting from the simplest method and progressing to the more complex methods. For instance, heuristics may go first, followed by linear models, and finally, neural networks.
- Data: This axis accommodates the hypotheses about the data to be used. Starting from the data available right away, then adding new sources in ascending order of data acquisition complexity.
- User Stories go along the vertical axis. These parts of the project are directly related to the software development necessary for the final result or functionality.
There are two more types of sticky notes:
- Questions. Any kind of unanswered questions that might arise. For instance, it may not be clear which data is available “for free” and which would require additional integration. The suggestion is to use sticky notes in different colors for this purpose.
- Technical Tasks. They may encompass infrastructure and architecture, for example, model-to-backend integration, HDFS deployment, and model launching infrastructure.
In the case of building a recommendation system for an E-commerce platform, the outcome might look similar to the following figure.
The example figure above is a "sneak peek" compared to a decomposition on a real project. For a more detailed examination of the implementation process (one phase at a time, starting from the RAT phase), I suggest reading my upcoming articles.
In the following articles of this series, I will demonstrate how to use the aforementioned tools and concepts with practical examples of building a product recommendation system for newsletter subscribers. Each development phase will be covered in a separate article.
🔽 Start with the Riskiest Assumption Test — RAT
🔽 Proceed with the Minimum Viable Product — MVP
To get notified about upcoming articles, hit follow or subscribe ❤️