The Pragmatist’s Guide to Applying Machine Learning in the Enterprise

Part 1: How to identify and scope a good machine learning problem

integrate.ai
8 min read · Jan 30, 2018

This blog is the first in a series to help enterprise leaders capture the promise of ML. Pragmatically. To drive revenue and business results.

Machine learning (ML) is rapidly changing enterprise technology. While the hype around artificial intelligence can be daunting for those without a PhD in ML (i.e., just about everyone), successful enterprise adoption requires that teams communicate to align technical capabilities with real-world business problems. This business skill set, moreover, overlaps with digital innovation practices that have been evolving over the last 10–15 years. After all, ML has a lot in common with advanced analytics.

Bottom line? You don’t need an engineering degree to play a pivotal role in shaping the future of applied ML. What you do need is a commitment to the first principles of good problem solving, a slightly evolved playbook based on what is new about ML, and a willingness to learn as you go. And while these blog posts don’t promise to make applied ML easy, we hope they’ll provide some helpful frameworks to kickstart your journey.

Core Steps to Solve a Business Problem with ML

Let’s start with a bird’s eye view of what it takes to apply ML in enterprise settings. If you are working with a vendor, you will not do all of these steps in house, but the overarching principles still hold.

1 — Pick a big ticket business problem that ML is well suited to solve. Hint: look for prediction or classification problems and check out Kathryn Hume’s HBR article on spotting ML opportunities.

2 — Understand, scope and define the opportunity. Solving an enterprise problem with ML has two phases: the math and the implementation.

a) The math (How do we build a model that predicts x?)

b) The implementation (How will we integrate the predictions into the business so that the model’s output impacts business operations?)

3 — Think about your key questions and hypotheses to guide steps 4 and 5.

4 — Acquire and prepare the data (Warning! This may be the most important step in the process and can be incredibly time consuming.)

5 — Do some initial data discovery to ensure that there is signal in the data (i.e., enough variation to identify meaningful differences between things, or trends that might support the hypothesis you are testing), to refine the problem scope, and to select the best ML algorithm (simpler is almost always better!)

6 — Build and productionize the model

7 — Integrate the model with the business (i.e., make sure that your model’s insights drive action) in a way that ensures your model is getting automatic feedback in as close to real time as possible

8 — Start with a proof of concept/experiment to test your solution

9 — Scale up implementation from proof of concept to full scale solution (or shut it down)

10 — Maintain the model and update it as the data changes
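To give a taste of step 5, the signal check can start very simply: before any modelling, confirm that a candidate input actually varies and moves with the outcome you care about. Here is a hypothetical sketch in Python; the toy numbers and the "sessions predict conversion" framing are invented for illustration.

```python
# Hypothetical signal check (step 5): toy data linking a predictive
# feature (clickstream sessions) to an outcome (did the lead convert?).
import statistics

sessions = [1, 3, 2, 8, 7, 9, 2, 10]
converted = [0, 0, 0, 1, 1, 1, 0, 1]

# 1) Enough variation? A near-constant feature carries no signal.
assert statistics.variance(sessions) > 0

# 2) Does the feature move with the outcome? Comparing group means
# is a crude but fast first look before committing to a model.
mean_converted = statistics.mean(s for s, c in zip(sessions, converted) if c == 1)
mean_not = statistics.mean(s for s, c in zip(sessions, converted) if c == 0)
print(f"converted avg sessions: {mean_converted:.1f}, non-converted: {mean_not:.1f}")
```

If the two group means are indistinguishable across every candidate feature, that is an early warning that the data may not support the hypothesis you are testing.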

Let’s unpack each of these steps, starting in this post with step one.

Step 1: Picking a problem

Contrary to popular belief, ML is not magic and is not well suited to all problems. Here are the top things to look for to ensure your foray into ML is successful.

Business case

Implementing new technologies is not cheap, so pick a problem worth solving. Given how enterprises measure success, strong ROI matters, as does the organizational commitment that ensures people care about the outcome and have a stake in its success. This does not mean boiling the ocean from day one; you can and should scale down the problem to start with an experiment, just not so much that people stop caring. If people don’t care about the project and its impact, it will not get the attention it needs, the risk of failure drastically increases, and your leaders may become disillusioned with the new technology (“We scoped out this tiny, manageable problem and ML couldn’t even handle it!”).

Gut check: Could I put together a business case for this that my CFO would say yes to? Will my leadership team care about the performance of this project? (the answer should be yes to both)

Measurability

There should be a clear, measurable goal that can be quantified by a dominant metric. ML algorithms require a quantifiable outcome to optimize against: if you’re trying to predict whether someone will buy one of your products, it’s best to start with data that directly links predictive data (like clickstream behavior) to outcomes data (like sales transactions). There’s always the risk of falling prey to proxies, as Jeff Bezos outlines in the “Resist Proxies” section of his 2016 letter to shareholders. While you will sometimes need to default to a proxy, this should be your plan F, not plan A! Also, if two people can’t agree on what the outcome is, there’s no way a machine can encode it as a function.

Gut check: Can I define the objective of the algorithm with a quantifiable metric? (The answer should be yes and should ideally NOT be a proxy for something else)

Complexity

There’s a temptation to believe that all computing will be machine learning in the future. But many applications and processes are best captured by rules. ML is impactful when a task is too complex to be encoded in a set of hand-coded rules. Consider language translation. Google Translate now runs on ML because translation is too hard to distill into a finite set of rules that can be programmed into a computer. One word can mean very different things depending on the rest of the sentence; how do you create a set of rules for that?!

Gut check: Could I solve this problem by creating a set of rules (if…then…) that a computer automatically executes? (if the answer is yes, you probably don’t need ML and can start with a simple rules-based engine instead)
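To make the gut check concrete, here is a hypothetical sketch of a decision that does not need ML: the function name, inputs and thresholds are invented for illustration. If your whole problem fits in a handful of if…then rules like this, a rules engine is the cheaper place to start.

```python
# Hypothetical rules-based decision: a few explicit if...then rules
# a computer can execute automatically, with no ML required.
def approve_discount(order_total: float, is_repeat_customer: bool) -> bool:
    if order_total > 500:            # large orders always qualify
        return True
    if is_repeat_customer and order_total > 200:  # loyal customers get a break
        return True
    return False                     # no rule fired

print(approve_discount(600, False))  # large order qualifies
print(approve_discount(250, True))   # repeat customer qualifies
print(approve_discount(250, False))  # neither rule fires
```

ML becomes worth considering only when the number of rules (and their interactions) grows beyond what anyone can write down and maintain, as in the translation example above.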

Data with good signal

Our ML team reminds me constantly that a data scientist without data is like a chef without ingredients. The more historical data you have, the more the algorithm has to learn from on day one. For example, at integrate.ai, we are helping one of our clients better leverage data from a cross-industry partnership to convert leads to customers. Our platform recommends which leads they should focus their sales spend on to deliver the best ROI and helps sales agents form stronger connections with leads. Our algorithms learn from past conversion data: what people were like (behaviourally, psychographically, etc.) and what actions the business took to convert them. Partnering with a third-party data provider can often supply the missing pieces that make ML practical, even for large enterprises with vast historical data!

Gut check: Do we (or does someone) collect the kind of information that we need in order to answer this question and/or can we partner to get the kind of data that we need? (the answer should be yes)

The impact of an error

ML looks at the world probabilistically (“we are 80% sure that this person is going to love product x”), not deterministically (“this customer will love product x”). This can be uncomfortable for people who are used to living in a rules-based world. Accordingly, it is best to start with problems for which people will be comfortable accepting probabilistic insights because the impact of an error is low (product recommendation), rather than ones where the impact of an error is significantly higher (cancer detection). That is not to say the latter is not a candidate for ML (there are projects in progress that aim to use ML to better detect cancer, like this one here), but when you are introducing new technologies into your organization, it is easier to get people comfortable with opportunities where the cost of an error is lower.

Gut check: If the model makes an incorrect prediction, what is the worst thing that can happen? (The answer shouldn’t be that the impact could be far worse than what you are doing today)

Feedback loop

ML is powerful because it learns continuously. To do so, outcome data needs to be continuously fed back to the model. ML learns the same way a child does: if I touch something hot and it hurts, I learn to stop touching hot things (hopefully…). The key step is the feedback: the child connects the pain to the hot object. And the more immediate, continuous, and frequent the feedback, the better. For example, we are helping a few of our clients predict which customers are most likely to convert to a new product so they can focus efforts on customers that need an extra nudge. We calculate a likelihood-to-convert score for every incoming customer and get feedback on whether those customers converted. By using that feedback, our algorithm’s predictions improve over time (e.g., I thought that this customer was highly likely to convert, but they didn’t, so next time I see a customer who looks like this, I will give them a lower prediction score).

Gut check: Do I reliably collect data on the outcome that I am looking to drive? If so, how timely am I in doing so? (The answer should be yes! And I get feedback in real time, or at least monthly.)
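The feedback loop can be sketched in a few lines. This is a hypothetical illustration, not our platform's method: the segment names, prior and smoothing are invented, and a real system would use a proper model, but the mechanic is the same: each observed outcome nudges the next prediction.

```python
# Hypothetical feedback loop: a smoothed per-segment running
# conversion rate that updates as outcomes are fed back.
from collections import defaultdict

counts = defaultdict(lambda: [0, 0])  # segment -> [conversions, total observed]

def predict(segment: str, prior: float = 0.5) -> float:
    """Likelihood-to-convert score, smoothed toward the prior when history is thin."""
    conv, total = counts[segment]
    return (conv + prior) / (total + 1)

def feed_back(segment: str, converted: bool) -> None:
    """Outcome data flows back to the model - the step that makes it learn."""
    counts[segment][0] += int(converted)
    counts[segment][1] += 1

print(round(predict("lookalike-A"), 2))  # no history yet: fall back to the prior
feed_back("lookalike-A", False)          # "I thought they'd convert, but they didn't..."
feed_back("lookalike-A", False)
print(round(predict("lookalike-A"), 2))  # "...so similar customers now score lower"
```

The timeliness point in the gut check maps directly onto how often `feed_back` gets called: a loop fed monthly learns far more slowly than one fed in near real time.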

So there you have it, some factors to look for when you are trying to spot a great opportunity for ML to tackle. Please note the multidisciplinary nature of all the steps above. We need all kinds of people with all kinds of perspectives, talents and skills to bring to bear the potential of ML — not just people who we have traditionally labeled “technologists”.

If you found this helpful (or not) please let us know! The next post in the series will be coming in the next few weeks focused on step two: Understand the problem deeply and scope thoughtfully.

Megan is a business development director at integrate.ai, where she leads customer accounts to ensure that the IAI platform is driving impact on her clients’ most important business outcomes. Megan was formerly a consultant at McKinsey & Company, where she specialized in large-scale digital transformations across Canada, the UK and Australia. She is also passionate about inclusion both inside and outside of integrate.ai and co-founded #GoSponsorHer last year. In her spare time she can be found singing, gyming, adventuring outdoors, hanging with family and friends, experimenting with a new health hack, and learning (about anything and everything!)

