The 4 biggest problems I met while building Machine Learning Products

Cyril Le Mat
7 min read · Aug 9, 2021


I have worked as a Data Scientist in several startups and tech companies, and I have had the chance to build many Machine Learning (ML) products. I find this fascinating, as building great ML products requires work that is both technically advanced and multidisciplinary. My goal today is not to talk about the technical challenges I've faced, but about the product challenges I've observed to be critical in building ambitious and well-performing ML products. Let me walk you through the 4 biggest problems I met while building those products:

1. "The 100% performance fallacy" in product design

A classic ML feature design flaw is to design a feature with a perfect model in mind. By “perfect” I mean a model that is always right, or at least better than human performance.

To be clear, some models can reach such quality (generic models such as detecting an object in a picture, etc.). However, designing with such models in mind is:

  • Dangerous: Model weaknesses are fundamental to the product design work. No model is perfect, and most will output some level of wrong or aberrant results. Ignoring this leads to both design inefficiencies and under-investment in quality work. On the design side, a lot can be done to build features that derive value from the strengths of the algorithm without suffering too much from its weaknesses: adjusting the wording, letting the user remove problematic results, adapting the UX to each output's confidence level (as sketched after this list), etc.
  • Restrictive: Some powerful features can be built from a 60–70% (or merely above-average) model (a stock trading model that is right 51% of the time is enough to make you rich). Not taking advantage of these models is a major loss of innovation capacity.
  • Un-collaborative: With a perfect model as the baseline, data scientists may be seen as the only ones responsible for a feature's quality (which is completely wrong, as the product's use cases and design play a huge role in its quality). This creates un-collaborative organisations in which the product team pressures data scientists to build perfect models, while the data scientists grow defensive and try to push the product team out of the model discussion.
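To illustrate the design point above, here is a minimal sketch (my own illustration, not a specific product) of confidence-aware UX: the wording and visibility of a suggestion adapt to the model's confidence in each output. The thresholds, class names and example labels are assumptions chosen for readability.

```python
from dataclasses import dataclass

@dataclass
class Prediction:
    label: str
    confidence: float  # assumed to be a calibrated score in [0, 1]

def render_suggestion(pred: Prediction) -> str:
    """Hypothetical UX policy: wording and visibility change with confidence."""
    if pred.confidence >= 0.90:
        # High confidence: assertive wording, shown prominently.
        return f"Recommended for you: {pred.label}"
    if pred.confidence >= 0.60:
        # Medium confidence: hedged wording, easy for the user to dismiss.
        return f"You might like: {pred.label} (tap to hide if not relevant)"
    # Low confidence: don't surface the result at all.
    return ""

if __name__ == "__main__":
    examples = [
        Prediction("Data analyst roles", 0.95),
        Prediction("Project manager roles", 0.72),
        Prediction("Pastry chef roles", 0.31),
    ]
    for p in examples:
        print(render_suggestion(p) or "(suggestion hidden)")
```

The exact thresholds would of course come from analysing real outputs with the product team, not from hard-coded guesses.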

2. A blind faith in quantitative feedback

It is well known that an ML feature needs feedback to improve. However, it's not magical: one of the most problematic sentences I keep hearing is “The feature is not performing well right now, but we expect those issues to disappear once we get more usage”. Why is this a problem? Because it's usually blind faith.

Before it is used for training, quantitative feedback is built for measurement: it allows us to analyse a model's performance and make sure we keep improving. Blind training can be dangerous because:

  • The quantity may be lacking. Use cases that produce billions of clear and unambiguous feedback signals are the exception, not the norm (especially in a B2B context, where each customer has its own strong specific context).
  • Feedback is incomplete and ambiguous. We measure what people do but can't know why they do it (e.g. if a Netflix user doesn't click on a movie, it may be because of unattractive content rather than a bad suggestion). Similarly, you may know something is bad but usually not how bad it is (e.g. if a suggested content is not chosen by the user, how does that affect the user's perception? Is it an aberrant suggestion? Is it annoying in the long run?).
  • Many problems can't be fixed with feedback. We can't know in advance how much a model will improve, nor whether this improvement will really improve the customer's perception. For example, having only partial data to feed the model can't be fixed with more feedback. Another interesting example is UX problems: I met many cases where the best action to improve the product wasn't changing the model at all, but updating the UX design.

In my experience, these are not details but major concerns. That's why I always complete an ML feature analysis with qualitative feedback from customers: How good does it seem to be in the field? Are the users happy with it? Are there specific issues or disappointments? With such an analysis, we can focus on fixing customer issues while using quantitative data to ensure we maintain (and hopefully improve) our quantitative performance.
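As a hypothetical illustration of combining the two signals, the sketch below pairs a quantitative metric (acceptance rate per model version) with hand-tagged qualitative feedback; all the data, column names and tags are invented for the example.

```python
import pandas as pd

# Quantitative signal: acceptance rate of suggestions per model version.
usage = pd.DataFrame({
    "model_version": ["v1", "v1", "v2", "v2"],
    "suggestions_shown": [1200, 900, 1100, 1000],
    "suggestions_accepted": [240, 170, 260, 230],
})
acceptance = (
    usage.groupby("model_version")
         .sum(numeric_only=True)
         .assign(acceptance_rate=lambda d: d.suggestions_accepted / d.suggestions_shown)
)

# Qualitative signal: customer-reported issues, tagged by hand.
feedback = pd.DataFrame({
    "model_version": ["v2", "v2", "v2"],
    "issue_tag": ["aberrant result", "confusing wording", "aberrant result"],
})
top_issues = feedback.groupby("issue_tag").size().sort_values(ascending=False)

print(acceptance[["acceptance_rate"]])  # did v2 maintain or improve the metric?
print(top_issues)                       # which customer issues should we fix first?
```

The quantitative table tells us whether performance moved; the tagged qualitative feedback tells us what to work on next.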

3. Only applying "traditional" Quality Assurance (QA) to ML features

Organisations usually don't challenge their QA processes and limit the work on an ML feature to a search for bugs and aberrant results by the QA team, plus quantitative feedback by the data science team.

The QA process needs to be redefined in an ML product context: on top of bugs and aberrant results, we need to focus on performance improvement and on patterns of under-performance. Moreover, we can't easily know which team should work on a given quality problem, whether the fix will take days or months, or whether it is even possible. There are many ways to add value through ML QA:

  • As stated earlier, we can add some level of qualitative analysis of the performance.
  • Even before a feature is live in production, we can analyse its output in real (or realistic) conditions in search of "under-performance patterns": are there customers, demographics or population segments on which we can see quantitative or qualitative problems that we should focus on?
  • We can build “validation datasets”: scenarios with expected results, replayed on each model version to validate quality (as sketched after this list). As opposed to massive training datasets, these are small, but the model's performance on them is analysed with the highest scrutiny (in some cases, we may aim for a 100% pass rate). Such datasets can also grow with customer feedback to ensure non-regression on past qualitative feedback, as we don't want a fixed problem to reappear in a later model version.
  • Since we can't easily know what will fix a performance problem (a model update? a UX change? a new product? some complementary data? user communication?), it's important to share such problems transversally and define together the action plans that will improve the feature's quality.
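Here is a minimal sketch of such a validation dataset replayed against each candidate model; the scenarios, the `load_model` call and the `predict` interface are placeholders, not a real API.

```python
import json

# Hand-curated scenarios with the result we insist on for each one.
VALIDATION_SCENARIOS = [
    {"input": {"job_title": "Senior Data Scientist"}, "expected": "Data Science"},
    {"input": {"job_title": "Chef de projet IT"}, "expected": "Project Management"},
]

def replay(model, scenarios, min_pass_rate=1.0):
    """Replay the curated scenarios and fail loudly if the pass rate drops."""
    failures = []
    for scenario in scenarios:
        got = model.predict(scenario["input"])  # placeholder model interface
        if got != scenario["expected"]:
            failures.append({"scenario": scenario, "got": got})
    pass_rate = 1 - len(failures) / len(scenarios)
    if pass_rate < min_pass_rate:
        raise AssertionError(
            f"Validation pass rate {pass_rate:.0%} is below {min_pass_rate:.0%}:\n"
            + json.dumps(failures, indent=2, ensure_ascii=False)
        )
    return pass_rate

# Hypothetical usage: run on every candidate model before release.
# replay(load_model("candidate-v7"), VALIDATION_SCENARIOS)
```

Adding a scenario every time a customer-reported problem gets fixed turns this into a cheap non-regression safety net.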

4. Lack of Product/Data team collaboration

a. A strong segregation of duties

Splitting up teams is a necessary evil in companies, and Data Scientists usually sit in separate teams from Product Managers. This makes sense, but it doesn't help the close collaboration they need to build products. The reality of being a data scientist building models is that the job contains a strong product component: their daily decisions shape the details of what the product does.

A common phenomenon is therefore a “silo effect” that makes the data science team the sole owner of an ML feature's quality, whereas the success of machine learning products relies deeply on how transversal and collaborative the effort is.

b. Some ill-adapted methodologies

In most organisations I've seen, product design was sequential: 1. Product Managers define the product goals; 2. UX designers design the product; 3. Engineers build it; 4. QA happens. This can be problematic because:

  • Product Managers need the input of Data Scientists to get a precise sense of the technological possibilities that should feed their exploratory work.
  • Product Managers and UX designers need precise knowledge of the algorithm to design the product: what are its detailed strengths and weaknesses? To achieve this, it may be worth investing in Proofs of Concept (POCs) to support the design work.
  • QA needs to be redesigned (see the previous section).
  • Most of the time spent on an AI product should be dedicated to quality improvement: measuring and analysing the current product quality and iterating on product issues. Building a first feature version is the starting point, not the whole project, nor even half of it.

Concerning Agile methodologies… they are great (especially as they push for more iterative, incremental and adaptive product building), but they were never meant for ML product needs and fail to support them. ML requires organisations to redefine their design and quality methodologies. I have seen cases of blindly applying a rather directive agile methodology to ML products; it usually doesn't meet the reality of ML product needs and mostly creates team frustration.

A few tips

Before I leave you, here are a few tips that can help improve your organisation's ability to build ML products:

  • Move your data scientists close to their product counterparts.
  • Invest in adapting your product design methodology to ML products.
  • Find some way to experiment with the real-life behaviour of the AI. If the application development takes time, it is worth investing in a real-life approximation (simplified UI, frozen/fake dataset…) so you can analyse your AI feature's performance and make changes as soon as possible (see the sketch after this list).
  • Don't rely too much on internal algorithm metrics. They are useful but only indicative: they are not business KPIs and shouldn't be trusted to predict a feature's performance. Even a world-class algorithm can have huge problems if you move it away from the exact task it was trained on. Again, there is a huge gap between great models and great features.
  • Start simple and avoid advanced models for the v1. Starting with a simple solution and analysing how it really matches the related product need is a healthier way of building a product.
  • Create a transversal process dedicated to ML QA and related action plans
  • Always keep in mind that experimentation beats theory: we can't know how an ML model will perform in a product before we try it in real-life conditions. Only the real-life product behaviour counts.
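As a hedged illustration of the “real-life approximation” tip above, the sketch below replays a model over a frozen snapshot of production-like data and exports its outputs for manual review; the file name and the model's `predict` interface are assumptions.

```python
import csv

def export_for_review(model, snapshot_rows, out_path="model_outputs_for_review.csv"):
    """Write (input, output, confidence) rows so the team can eyeball real behaviour."""
    with open(out_path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["input", "output", "confidence"])
        for row in snapshot_rows:
            pred = model.predict(row)  # placeholder model interface
            writer.writerow([row, pred.label, f"{pred.confidence:.2f}"])
    return out_path

# Hypothetical usage: `snapshot_rows` would come from a frozen copy of
# customer-like data; the resulting CSV is then reviewed qualitatively
# with product and UX, long before a polished UI exists.
```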

Machine Learning features are like startups: you can have a great idea, think about it for years, build a detailed business plan, have a great UX… all of this is worthless if nobody buys it or uses it.


Cyril Le Mat

Head of data and software @Sunrise, Ex-Head of Data Science @Cornerstone on Demand @Clustree @Hostnfly @Cheerzto