Demetris Papadopoulos
Published in Flo Health UK
Feb 3, 2023 · 10 min read


ML leadership: Patterns & strategies — Flo Health

“Artificial intelligence is the new electricity.” — Andrew Ng

“Machine intelligence is the last invention that humanity will ever need to make.” — Nick Bostrom

We’ve all heard these quotes and then some. And as an AI practitioner myself, how can I disagree with their substance? The power that can be harnessed from modern machine learning truly has the potential to reshape the world. Just when people were starting to wonder whether we’ve reached a new plateau, AI art generators, such as Dall-E and Midjourney, and chat tools, such as OpenAI’s chatbot ChatGPT, emerged to showcase that we’re not going to stop getting awestruck anytime soon.

The question then becomes, how do the technologies above manifest and influence companies that aspire to be data driven, and how do those companies exploit such technology to digitally transform and reposition themselves in the market? Sadly, the extravagant publicity of such advancements and the great enthusiasm they inspire tend to overwhelm even the most prudent and conservative stakeholders and decision-makers. Based on empirical evidence from conversations I’ve been having with many ML-focused colleagues from various companies and industries, as well as online stats and articles, it seems the majority of companies fail to embrace or utilise machine learning efficiently or meaningfully.

Back in 2020, VentureBeat reported that approximately 90% of ML models never make it into production. That sounds terrible, doesn’t it? All those resources funnelled into training those models, only for them to have no real business impact. How much might things have improved over the past two years? Sadly, KDnuggets reported in 2022 that “the majority of data scientists say that only 0 to 20% of models generated to be deployed have gotten there.”

Bear with me, but I am afraid it gets even worse! As mind-blowing and heartbreaking as it is that so few models ever make it to production, there’s also a case to be made about those that do get deployed. Are they the right ones for the purpose they were built for? And are their predictions actually utilised properly? Are they being closely monitored for drift? And is their impact on actual business metrics also constantly under review? It is my understanding that not all companies get that right either. And that’s not the fault of data scientists or machine learning engineers; it usually signals a lack of mature ML leadership.

Is it, then, any wonder that data science has started to become deprioritised? There are new reports online discussing the likelihood of an upcoming AI winter, such as a recent article by Professor Wouter van Heeswijk. The professor talks about decreased funding, data scientists getting fired, and disappointment due to inflated expectations. Is data science not the promised land everyone thought it was going to be? My opinion is that the potential is indeed still there. The key is having the right strategy to realise it.

I would like to offer my thoughts on what I perceive as the proper way to do machine learning in the industry — how to do machine learning pragmatically and impactfully. It’s certainly the way we aim to do things at Flo Health, and though we may not always be perfect, our results so far show that we’re on the right path. It’s not a complicated strategy either. The premise is simple: It all starts with the business. Machine learning is a potentially powerful tool, but not a panacea for every problem/objective.

It is my strong belief that the best way to illustrate and explain is always through examples and counterexamples. That’s how I will attempt to approach this. Furthermore, although ML models can cover a wide range of domains, from medical use cases to machine vision, I’ll limit my examples to those in the commercial/marketing sector, as it may be the one with the greatest overlap amongst companies of different industries, particularly product organisations, such as Flo Health. That having been said, which patterns lead to the failures outlined above?

Lack of synchronization between the stakeholders & the DS department

The first and most basic case is when the data science department is operating with an unhealthy level of autonomy. In such scenarios, there is a fundamental disconnect between the data science department and the business stakeholders. A proper roadmap for the ML use cases is mostly absent. Worse, when a roadmap is present, it misses the company’s main problems and goals, as those would have been pinpointed by the true experts — the stakeholders. It is mostly left to the data scientists/data engineers to produce “useful” data products, and though nobody should question the brilliant minds of my fellow practitioners, I’d be the first to question the depth of their connection with the business model of the company they work for and its true priorities.

There will be ideas that are spot on. “Let’s build a churn model!” Not bad; a churn model is the holy grail of models for most organisations, isn’t it? And yet, the fact that this is disconnected from the stakeholders’ wishes and immediate plans means that its true value will remain limited. What good are those churn predictions if they won’t be materially exploited to serve the business in an apt way? Because that’s what will happen if the stakeholders have their sights set on something else instead (e.g., that new campaign they want to launch that couldn’t be further away from being legitimately data driven). This pattern can be spotted in companies where the data science department is expected to carry its weight and impact the business by itself. Unfortunately, it won’t.

In slightly better cases, there is basic sync, but it’s not deep enough to measure the true effect of the ML models that are being deployed. Consider this example: a propensity to purchase model. It sounds extremely useful, doesn’t it? It could be built for cross-sell purposes (e.g., which current customers might be more likely to also purchase this additional service/product?). It could also be for activation purposes in companies with freemium business models (e.g., which current nonpaying customers are most likely to subscribe?).

Let’s say the user acquisition department of our fictional company plans a campaign to boost sales. Synced with the user acquisition stakeholders, the data scientists who were considering building a propensity to purchase model proceed to do so. The final model produces probabilistic scores indicating how likely each customer is to purchase. The campaign cannot run on every single user; that would be too expensive. The model’s scores help to choose which customers should be targeted by the campaign. The campaign is thus more effective because the people it targets are the ones more likely to purchase. Considering this use case, it’s certainly a step forward — an ML product impacting a campaign and maximising acquisition and revenue. But is it really as adequate or as good as I’ve just made it sound? Unfortunately, the answer most often is no. Allow me to explain.
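To make the setup concrete, here is a minimal sketch of such a propensity model: train a classifier on historical purchase behaviour, score the current audience, and keep the highest-scoring slice for the campaign. The file paths, feature names, and the 10% cut-off are hypothetical placeholders, not Flo Health’s actual setup.

```python
# Hypothetical propensity-to-purchase sketch (paths and column names are illustrative)
import pandas as pd
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split

history = pd.read_parquet("user_history.parquet")  # past behaviour with a "purchased" label
features = ["sessions_last_30d", "days_since_signup", "premium_screen_views"]

X_train, X_val, y_train, y_val = train_test_split(
    history[features], history["purchased"], test_size=0.2, random_state=42
)
model = GradientBoostingClassifier().fit(X_train, y_train)

# Score the current audience and target the top decile by predicted purchase probability
audience = pd.read_parquet("current_audience.parquet")
audience["p_purchase"] = model.predict_proba(audience[features])[:, 1]
target_group = audience.nlargest(int(0.1 * len(audience)), "p_purchase")
```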

In the example I’ve offered, the model that was built does not exactly predict the effect of the “treatment.” And by treatment, I am referring to the interaction with the user, the push notification, the email, or however it is that this hypothetical company will approach and try to push for a purchase event. One should expect that targeting the people who are most likely to purchase will yield a better conversion rate than random choice, and that much will probably be true. But the effect will hardly be as strong or as close to optimal as it could be. Some customers who might otherwise have purchased may perceive this interaction as a “disturbance,” causing them to reconsider. And customers who would have purchased if they were nudged might not be approached at all because they were considered to have a low probability to purchase to begin with. In most cases, though, these negative effects are never captured. A fundamental A/B test could tell a lot about the model’s influence on the campaign, which might be a lot less significant than anticipated. Yet very few organisations go the extra mile to measure such important statistics.
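For what such a measurement could look like, here is a hedged sketch of the simplest version: hold out a randomly targeted control group of the same size, then compare conversion rates with a two-proportion z-test. The counts below are made-up placeholders.

```python
# Compare conversion in the model-targeted group vs. a random control group
# (counts are placeholders; substitute your own campaign numbers)
from statsmodels.stats.proportion import proportions_ztest

conversions = [540, 430]        # purchases: model-targeted group, random control group
group_sizes = [10_000, 10_000]  # users contacted in each group

z_stat, p_value = proportions_ztest(conversions, group_sizes)
lift = conversions[0] / group_sizes[0] - conversions[1] / group_sizes[1]
print(f"absolute lift: {lift:.2%}, p-value: {p_value:.4f}")
```

Even a rough readout like this tells you whether the model adds anything over random targeting, which is precisely the question most teams never ask.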

It is not within the scope of this article to explain in technical depth the appropriate models that could have been used for such a use case, but to give the reader a taste, I will mention concepts such as uplift modelling and reinforcement learning with feedback loops for general next best action use cases. The key point here is that when the connection between the data science organisation and the business stakeholders is loose, the tool will not always be the best fit for the task at hand. And although you can use a knife to tighten a screw, a screwdriver will always serve you better.
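To give a flavour of what uplift modelling means in practice, below is a minimal two-model (“T-learner”) sketch, assuming a past campaign recorded who received the treatment and who purchased. It is an illustration of the idea, not a recommended production design, and all column names are hypothetical.

```python
# T-learner uplift sketch: one model for treated users, one for untreated,
# then rank by the predicted difference (all names are hypothetical)
import pandas as pd
from sklearn.ensemble import RandomForestClassifier

df = pd.read_parquet("past_campaign.parquet")  # includes "got_push" and "purchased" columns
features = ["sessions_last_30d", "days_since_signup", "premium_screen_views"]

treated, control = df[df["got_push"] == 1], df[df["got_push"] == 0]
m_treated = RandomForestClassifier(random_state=0).fit(treated[features], treated["purchased"])
m_control = RandomForestClassifier(random_state=0).fit(control[features], control["purchased"])

# Uplift = P(purchase | contacted) - P(purchase | not contacted):
# target the users whose behaviour the treatment actually changes
audience = pd.read_parquet("current_audience.parquet")
audience["uplift"] = (
    m_treated.predict_proba(audience[features])[:, 1]
    - m_control.predict_proba(audience[features])[:, 1]
)
target_group = audience.sort_values("uplift", ascending=False).head(10_000)
```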

ML engineers & scientists not participating in shaping the roadmap

“All in good measure,” said the ancient Greeks, and this is true here as well. Not involving the stakeholders enough must not be countered by handing them absolute command instead. A structure in which the data scientists and the engineers are not adequately involved in idea generation and the business stakeholders make every call on their own is a recipe for disaster. The stakeholders might understand the problems that need to be solved better than the data scientists, but they do not understand enough about the machine-learning tools that could be used to solve them. One way I’ve seen this manifest is a susceptibility to purchasing various fancy, boxed commercial ML tools and platforms that the data scientists are then called to work with. Often, multiple such tools are purchased redundantly, as their functionalities usually have major overlap. Consultants from those platforms are then brought in to help the data scientists and the data engineers “connect the pipes” so the “spaceship” can work. What the stakeholders eventually discover is that the effort required for this is almost as much as building their own platform would have taken in the first place. Sure, some time is saved, but the sacrifice is usually great. These “spaceships” don’t come cheap.

Never forget that Data Science products are still software products

A last counterexample, one in which I will finally move away from the relationship with the stakeholders and dive deeper into the data science organisation itself, has to do with the composition of the department. The point I want to make here concerns the balance that must be achieved between the scientific/statistical talent within the teams and the backend/engineering capabilities that complement and support it. I’ve been told of and seen too many examples in which companies invest purely in great scientific minds — people coming from academia with strong PhDs and a great understanding of statistics, maths, algorithms, machine-learning theory, and what takes place “beneath the hood.” And don’t get me wrong; there are many instances where such professionals are invaluable. You do need them — but in moderation. Build your data science department primarily with these kinds of professionals, and you will accumulate technical debt faster than Usain Bolt running 100 m.

What some directors don’t always understand is that the capabilities a data science department is meant to offer are still software products. The best practices regarding software development have been known for a while. They only feel “novel” in the data science space because many practitioners don’t come from such a background. A productionised model is not just the algorithm and the trained weights that make a prediction. You need to have the proper data flows in place, with all the data filtering and engineering required to produce the model’s features. The model’s outputs are another stream of data that needs to be managed properly. Even the code that trains the model itself, no matter how scientifically ingenious, should still be readable, maintainable, and follow best practices. Should the above be overlooked, the consequences will be grave. No matter how great the models are from a scientific perspective, their deployment will be problematic, and the department will spend its entire time fighting fires and fixing bugs rather than building anything new.
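As one small illustration of that point, a sketch like the following (assuming scikit-learn and hypothetical column names) bundles the feature engineering and the estimator into a single versioned artifact, so that what gets deployed is a testable pipeline rather than a notebook plus tribal knowledge.

```python
# Bundle preprocessing and the estimator into one deployable artifact
# (column names are illustrative)
import joblib
from sklearn.compose import ColumnTransformer
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder

numeric = ["sessions_last_30d", "days_since_signup"]
categorical = ["acquisition_channel"]

preprocess = ColumnTransformer([
    ("num", SimpleImputer(strategy="median"), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
pipeline = Pipeline([("features", preprocess), ("model", GradientBoostingClassifier())])

# pipeline.fit(train_df[numeric + categorical], train_df["purchased"])
# joblib.dump(pipeline, "propensity_pipeline_v1.joblib")  # one versioned, testable artifact
```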

The importance of a robust ML platform

A case has to be made here about building proper machine-learning platforms to work with. The data scientists should be enabled to work efficiently, and only good data/backend engineers can allow them to achieve as much. At Flo Health, we have invested time and resources to build a proper ML platform to host our models and ensure their robust operation. We’ve partnered with Tecton, and so far, we have onboarded 1,600+ features to its feature store for training models. This reduces both the time a data scientist/ML engineer spends training a model and the time required to deploy the model when it’s ready. We have built statistical feature-monitoring services to detect feature drift and be the first to catch problems in the data, rather than having the model produce erroneous results and finding out about it only when it’s too late and the damage is already done.
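For readers unfamiliar with what such monitoring involves, here is a minimal sketch of one common approach: compare each feature’s current distribution against a training-time baseline with a two-sample Kolmogorov-Smirnov test and alert on significant shifts. The file paths, feature names, and threshold are illustrative assumptions, not a description of Flo Health’s actual service.

```python
# Simple feature-drift check: compare serving-time distributions to a training baseline
# (paths, features, and threshold are illustrative)
import pandas as pd
from scipy.stats import ks_2samp

baseline = pd.read_parquet("training_features.parquet")
latest = pd.read_parquet("serving_features_today.parquet")

for column in ["sessions_last_30d", "days_since_signup", "premium_screen_views"]:
    stat, p_value = ks_2samp(baseline[column].dropna(), latest[column].dropna())
    if p_value < 0.01:  # naive threshold; tune for your own alerting needs
        print(f"ALERT: distribution shift in '{column}' (KS statistic = {stat:.3f})")
```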

Summing up all the ideas above, the advice is the following:

  • Build close ties between the business/product stakeholders and the data science organisation, involving both heavily from idea generation all the way to execution.
  • Connect the ML models to the right business metrics and monitor not just their predictivity on some target label, but also their actual impact on the use case they are meant to support.
  • Support your data scientists with the right engineers to empower them and ensure that your data products will be robust, scalable, and stand the test of time.
  • Build a platform to collectively support your modelling capabilities rather than keeping them as independent services, optimising your processes and reducing time and costs.

Finally, the last piece of advice I can offer: Rome wasn’t built in a day. Don’t start by trying to build a spaceship. Try to build some of its components. Even crude versions might offer you a lot more value than you think, as long as they target real business needs and their effect is measured properly. Once you have such basics in place, you can iterate and grow your capabilities, adding more and more sophistication.

As mentioned before, these are some of the main principles that we follow at Flo Health, and they have set us on the right path. I hope the above will feel useful and meaningful to other fellow practitioners and stakeholders. You can always reach out to me if you want to chat about any of the ideas and concepts I’m discussing above.

P.S. Flo Health is hiring! Check out our careers page.
