Do you want ML with that? Why to say yes, and when to say no

Published in Humans of Xero · 6 min read · Jan 13, 2022

With international travel basically off the table for the past two years, many folk where I live in Australia have been turning their minds to travel closer to home. Sales of caravans and 4WD vehicles are off the charts. Buying a 4WD comes with a few compromises though — they’re expensive to buy, costly to run and not very environmentally friendly. So why choose a 4WD? Because you need to drive off road.

Want to drive on wet sand? Across a cattle paddock? Ford a river? Head into the outback? With that intent, the extra expense, the engineering quality, the all terrain tyres, bull bars, and yes, maybe even the winch on the front make sense.

Machine learning (ML) systems have some similarities to offroad vehicles. Expensive to build, expensive to maintain, but there are some business problems where using ML offers solutions that are otherwise impossible. And sometimes it makes commercial sense to solve these problems even with the extra expense and complexity. So how do you figure out when to say yes to machine learning for your product idea?

Onroad and offroad — total cost of ownership

Every software engineer learns early to strive for loose coupling within their solution architecture. Features that depend on machine learning pipelines can play havoc with this approach, tying together UX designs, data transport layers and real-time operational systems with batch-orientated analytical processing.

It’s also early days for truly ubiquitous machine learning at scale. The base technology set is rapidly evolving, vendors abound and there are fewer established patterns for young teams to follow. Storage and processing are cheap but they’re not free, and products with embedded machine learning pipelines are data hungry like you wouldn’t believe.

Couple this to a white hot talent market for new and hence still rare skills and a rapidly rising bar on explainability and acceptable harm, and it is easy to underestimate the total cost of ownership by an order of magnitude or two.

Responsible driving (data use)

There is an intense and increasing scrutiny of whether products and services built with the data dust generated by our daily travels through the digital world are being built in ways that are responsible and lead to fair outcomes.

“If you think technology will solve your problems, you don’t understand technology — and you don’t understand your problems.” The meditation teacher of Laurie Anderson, machine learning artist in residence at the Australian Institute for Machine Learning.

These considerations and challenges are not confined to ML inside products. But just as complexity increases the total cost of ownership, the complexity of data-hungry ML algorithms increases the possibility of inadvertent harm.

Privacy is often in tug-of-war with personalisation
Risk of codifying the morals and prejudices of the data collection period

Don’t use a sledgehammer to crack a nut

And finally, ML is just not advantageous as often as you think it is. ML is still new and exciting to many dev teams, so there’s a tendency to dive right in and assume that learning patterns from data with ML algorithms will produce a better product and a better end user outcome. But years of experience have put me with Google on this one — if you can possibly avoid using machine learning based data processing pipelines you should. Try lookup tables, try commercial APIs provided by bigger and more established players. And most definitely try using humans to bootstrap your ideas with a pilot user base.

If your users can get where they need to go on a Vespa, keep the Jeep Wrangler in the garage. But if your users will need the Jeep Wrangler, then read on.

When you need the 4WD option — reducing uncertainty

If you have the good fortune to work in a data rich company with millions of users using your software daily to simplify their lives, you might still be in the enviable position of having more potential opportunities to break out your 4WD than you have the capacity to try. Happily there’s another filter to apply.

These two questions, used in parallel, can help you kick the tyres of good ideas in a structured way that tests feasibility and guides development investment:

“How good does it have to be to be useful?”

and

“How good can we make it with today’s data and workflows?”

Let’s start with the first question — how good does it have to be to be useful? What we want to do is reduce the uncertainty of our answer to this. So we want to set some bounds.

You can get a really good estimate of these two bounds with some whiteboard sessions, some solid UX research and maybe some human bootstrapped pilots. If you’re canny, you won’t need to write a single line of ML code.

In parallel, you need to set some bounds on do-ability: “How good can we make it with today’s data and workflows?”

Things to consider here are current standards of data coherence, fidelity and completeness. Has the data you need been collected? What are the gaps? Do you have the user permission to use it for this new use case?
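A quick way to put a number on "what are the gaps?" is to measure how often each field you would need is actually populated. The sketch below is illustrative only: the `field_completeness` helper, the field names and the sample records are all made up for this example, and a real check would run against your own data store.

```python
def field_completeness(records, required_fields):
    """Return the fraction of records with a usable value for each field.

    A value counts as missing if the key is absent, None, or an empty string.
    (That definition of "missing" is an assumption; adjust it to your data.)
    """
    total = len(records)
    if total == 0:
        return {field: 0.0 for field in required_fields}
    return {
        field: sum(1 for r in records if r.get(field) not in (None, "")) / total
        for field in required_fields
    }


# Hypothetical sample: four records, some with gaps.
sample = [
    {"amount": 10, "category": "travel"},
    {"amount": 5, "category": None},
    {"amount": None},
    {"amount": 3, "category": ""},
]
report = field_completeness(sample, ["amount", "category"])
for field, share in report.items():
    print(f"{field}: {share:.0%} populated")
```

A low score on a field the model would lean on heavily is an early sign that the do-ability bound is lower than you hoped.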

Would the new service need to be calculated in near real time using behavioural signals that have just been collected? Or can you satisfy your users with yesterday’s data and hence rely on batch processing?

Order of magnitude, how much data might you need to crunch? And what inference times would you need to hit to deliver a seamless experience?
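Both of these questions can be answered on the back of an envelope before any ML code exists. Here is a minimal sketch of that arithmetic. Every input below (user count, events per user, payload size, latency figures) is a placeholder invented for illustration, not a measurement.

```python
def daily_volume_gb(active_users, events_per_user, bytes_per_event):
    """Rough raw data volume per day, in gigabytes (1 GB = 1e9 bytes)."""
    return active_users * events_per_user * bytes_per_event / 1e9


def inference_budget_ms(response_budget_ms, network_ms, feature_fetch_ms):
    """Time left for the model itself once the other costs are subtracted."""
    return response_budget_ms - network_ms - feature_fetch_ms


# Placeholder inputs, for illustration only.
volume = daily_volume_gb(active_users=1_000_000, events_per_user=50, bytes_per_event=500)
budget = inference_budget_ms(response_budget_ms=300, network_ms=120, feature_fetch_ms=80)

print(f"~{volume:.0f} GB/day raw, ~{volume * 365 / 1000:.1f} TB/year")
print(f"{budget} ms left for inference")
```

The point is the order of magnitude, not the decimals: gigabytes per day and a 100 ms budget imply very different engineering from terabytes per day and a 10 ms budget.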

Do you have closed feedback loops in place, or will you need to build those? Instrumenting a product is often overlooked in the adrenaline-fueled rush to MLP and feature parity, and it can be much more effort than it appears to retrofit comprehensive instrumentation to an established, multi-channel code base.

What about labelled data availability? The large majority of commercially successful machine learning products today are built on supervised learning algorithms, and training those requires lots and lots of accurately labelled past outcomes. Does your user base provide these already as they use your product today, or will you need to manually label a bootstrap set?

Investing some time to bound your uncertainty in both dimensions will pay off. Sure you hope to see this picture emerging.

But what if it looks more like this?

Vehicle maintenance — all the overlooked work

Finally, don’t fall prey to considering only the work it would take to train your algorithm. Most folk now think about training AND scoring the models. Hopefully you’ll also realise that you will need to RETRAIN the models. And redeploy them.

Add to that work to improve your closed feedback loop — both the data quality and the label accuracy. You’ll also need a holistic and reliable way to manage your audience splitting when you want to run many models or model versions in production. Unless you have a very simple product and user base, there’s a hidden world of complexity in audience splitting for a product experience that plays out over a longer time horizon than ‘which button text gives me the highest cart conversion rate’.
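One common building block for this, offered here as a sketch and not as a description of any particular production system, is deterministic hash-based assignment: the same user always lands in the same model variant for a given experiment, so an experience that plays out over weeks stays consistent. The function name, experiment key and weights below are hypothetical.

```python
import hashlib


def assign_variant(user_id, experiment, variants):
    """Stably map a user to a variant.

    `variants` is a dict of name -> weight, with weights assumed to sum to 1.
    Hashing the experiment name together with the user id keeps assignments
    independent across experiments.
    """
    if not variants:
        raise ValueError("at least one variant is required")
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    point = int(digest[:8], 16) / 0x100000000  # uniform-ish value in [0, 1)
    cumulative = 0.0
    for name, weight in variants.items():
        cumulative += weight
        if point < cumulative:
            return name
    return name  # fallback if float rounding leaves the weights just under 1


# Hypothetical split: 90% on the current model, 10% on a challenger.
split = {"model_v1": 0.9, "model_v2": 0.1}
print(assign_variant("user-123", "categorisation-2022-01", split))
```

This only covers the assignment step. Keeping splits coherent across channels, overlapping experiments and retrained model versions is where the hidden complexity lives.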

You may also find you have internal change management challenges if you are automating work currently undertaken by your coworkers in customer support or sales. It takes time to trust that a machine can actually do a decent job.

4WD — or not?

So we can see that machine learning systems have some similarities to offroad vehicles. Expensive to build, expensive to maintain — but for some business problems, ML offers a useful solution. The real trick is to figure out whether, or not, ML adds value.

Total cost of ownership, responsible use and maintenance are all factors — but the key questions are “How good does it have to be to be useful?” and “How good can we make it with today’s data and workflows?” Figuring out whether there is overlap, or not, between the bounds of do-ability and ‘can be useful’-ness is the first step in figuring out whether you want ML with that.
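If it helps to make the overlap check concrete, here is one possible reading of it in code. It assumes both questions have been answered as a low-to-high range on the same quality metric (precision, say), and the three-way "go / investigate / no-go" rule is my own simplification for illustration. The numbers are invented.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Bounds:
    low: float
    high: float

    def __post_init__(self):
        if self.low > self.high:
            raise ValueError("low must not exceed high")


def verdict(needed, achievable):
    """Compare 'how good must it be' with 'how good can we make it'."""
    if achievable.low >= needed.high:
        return "go"           # even the pessimistic estimate clears the bar
    if achievable.high < needed.low:
        return "no-go"        # even the optimistic estimate falls short
    return "investigate"      # ranges overlap: reduce uncertainty further


# Invented example ranges on a shared 0-1 quality metric.
needed = Bounds(low=0.80, high=0.90)
achievable = Bounds(low=0.70, high=0.85)
print(verdict(needed, achievable))
```

An "investigate" result is a prompt to tighten whichever range is widest, with more UX research on one side or a small data experiment on the other.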