How to Start Using Machine Learning: 3 Steps

You have probably seen plenty of articles go on about the wonders of machine learning. The process of how to begin adopting the technology, however, remains terribly opaque. With this post, we aim to clear up some of that.
Before we dive in: what is machine learning?
At a minimum, machine learning is a set of techniques that enable machines to learn patterns in data without being explicitly programmed. In many cases, the trained machine learning models are used to make decisions, for example, what products to recommend to you. In other cases, they merely serve to inform an analyst or decision maker about a problem, revealing patterns that might not otherwise have been obvious.
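To make "learning patterns in data" concrete, here is a minimal sketch in Python. It fits a straight line to past resource usage with ordinary least squares and uses that line to forecast the next week. The function name `fit_line` and the usage numbers are made up for illustration; real projects would typically use an ML library and far more data.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance of x and y, and variance of x (both unnormalized).
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical history: week number and units of some resource used.
weeks = [1, 2, 3, 4, 5]
usage = [12.0, 14.0, 16.0, 18.0, 20.0]

# The "rule" (slope and intercept) is learned from the data, not hand-coded.
slope, intercept = fit_line(weeks, usage)
forecast_week_6 = slope * 6 + intercept
print(f"Forecast for week 6: {forecast_week_6:.1f}")
```

Nobody wrote the rule "usage grows by two units per week" into the program; it was recovered from the examples, which is the core idea in miniature.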
Machine learning, or simply ML, is often used as an umbrella term for a vast number of related methods, each with its own strengths and weaknesses. ML researchers develop new methods and algorithms, while ML engineers build and deploy production systems. You’ve probably heard the term used in data science circles as well; the data science community has begun actively using ML methods.
So how do you start using the technology? The process can be boiled down to three steps.
Step 1: Frame the problem

There’s good news and bad news: the bad news is that this is by far the most difficult and important part. The good news is that once you’ve gotten through it, you’re more than halfway to a solution.
As with any problem-solving exercise, it all starts with the right questions. Framing the problem means asking specific questions with definite answers. Try these to start:
- What is the problem we’re facing? Examples: “We don’t have a way to recommend new products to customers”, “Support tickets all get jumbled into one mess, making it difficult and time-consuming to sort through them”, “We don’t have a way to forecast usage of X resource”.
- What do we need to accomplish? From the examples above*: “We need a system that can recommend new products based on user purchasing behavior”, “We need a system that can automatically organize incoming support tickets and route them to the right people”, “We need a system that can predict usage of X resource based on past usage patterns”.
- Do we have the data to support this? This may well be the most important question. If you don’t have data, you can’t do machine learning**. For emphasis: if you don’t have data, you can’t do machine learning. Make sure you have access to data from your problem domain. This could be user behavior from an e-commerce site, support tickets from a CRM, or historical usage data of servers or some other resource.
- How will we know we’ve succeeded? Put another way: What signal are we looking for, and what metrics will be put in place to measure that signal? From above: “The recommendation project will be considered successful when we see a single-digit percentage increase in product sales”, “The project will be considered successful when reps spend less time sifting through support tickets and tickets get dealt with faster”. You don’t always need to use specific numbers, but at least have a good idea of what positive signal you’re looking for.
- What is this worth to us? Set a budget! This is crucial if you’re going to engage vendors. Not only does the budget serve as a proxy for how high a priority the project is, but it will also help you reason about the “buy versus build” question and the timeline.
- When do we need to accomplish this? Setting the timeline and budget is important in any project, but it is even more important in machine learning projects because ML is a new and complex technology. It’s not quite a no-brainer like a CRM or a database. The economics and semantics of ML projects are more complex.
*Notice that what you need to accomplish says nothing about the specific algorithms or tools that would get the job done. Be specific, but don’t worry about the implementation details just yet.
**At the very least, without good data it will be exceedingly difficult to get started. If you’re not sure whether you have the right data, that’s a good sign it’s time to speak with ML experts. Many organizations, large and small, get stuck at this step.
Once you have clearly defined answers to the questions above, you’re ready to start exploring options.
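For the "do we have the data?" question, a quick first pass can be sketched in a few lines of Python. The example below counts how many records you have and what fraction of each required field is missing. The function name `summarize_readiness`, the field names, and the sample tickets are all hypothetical; this is a rough sanity check under those assumptions, not a full data audit.

```python
def summarize_readiness(records, required_fields):
    """Return the row count and the fraction of missing values per field."""
    total = len(records)
    missing = {}
    for field in required_fields:
        # Treat absent keys, None, and empty strings as missing.
        empty = sum(1 for r in records if r.get(field) in (None, ""))
        # With no records at all, every field counts as fully missing.
        missing[field] = empty / total if total else 1.0
    return {"rows": total, "missing_fraction": missing}


# Hypothetical support tickets exported from a CRM.
tickets = [
    {"id": 1, "text": "Cannot log in", "team": "accounts"},
    {"id": 2, "text": "Invoice is wrong", "team": None},
    {"id": 3, "text": "", "team": "billing"},
    {"id": 4, "text": "App crashes on start", "team": "mobile"},
]

report = summarize_readiness(tickets, ["text", "team"])
print(report)
```

If a field you would need for routing or prediction is mostly empty, that is worth knowing before you talk to vendors or start building.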
Step 2: Find or build the solution

While things may not always present themselves as neatly as the picture above, if you’ve made it through step one then you’re already on your way to achieving something.
Now that you’ve defined the problem, described at a high level how it might go away, and gotten your data ready, you can go after a solution. This step involves (you guessed it) asking a set of questions:
- Who on our team has the expertise to build this? If you have someone on your team who has ML chops, then you’re in luck (and may not need to be reading this). If not, continue.
- If the answer is “no one”, which vendors do this? Do some quick Googling and/or asking around and you’ll get a sense of who’s doing what. There are countless APIs, platforms, and consulting companies to solve ML problems. If the problem is simple, you can probably use an API or two. Otherwise you may need to engage more deeply.
If you go the “build” route, there are many considerations: do we have the infrastructure to support this? If not, who will build and manage it? How will we scale? Building scalable, production-ready machine learning systems is a considerable effort unto itself. Before you commit to building your own, make sure you fully understand the scope of what you’re getting into, and recruit people on your team to manage and maintain those systems.
If you go the “buy” route, you can use the information from above to help determine which vendors to engage with. Note that your framing of the problem is unlikely to be 100% correct. You’ll likely learn new shortcuts and simplifications during discussion, which in turn better prepares you for tackling the next problem that needs ML.
Step 3: Implement and measure

Whether you’ve built your own system or found an API that does the trick, Implementation Day is always the fun part. It’s when you (hopefully) get to finally see that pesky problem go away.
During the implementation process, ask yourself the following questions:
- Is this doing what we expected? Here you can reference your notes from step one, and compare outcomes with objectives.
- Are we hitting our numbers? Don’t expect results right away, but do ensure you get the desired outcome within a defined time frame. Often results don’t show up for a few weeks or even a few months, depending on the domain and the difficulty of the problem. Part of setting good metrics is ensuring you’ll see the positive signal when it emerges.
As you start test-driving the new system or algorithm, make sure you’re consistently measuring your metrics — they’re no good if you don’t measure them! Consistency is important because missing data can make it difficult to reason about what’s working and what isn’t. In the best cases this can be automated, as with the e-commerce example; you’ll know when sales pick up. In some cases this needs to be done manually, so make sure someone on your team is in charge of tracking this.
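As a minimal sketch of what consistent measurement might look like for the e-commerce example, the Python below computes the percentage lift in average daily sales against a baseline and lists the days with no measurement at all. The helper names `percent_lift` and `missing_days`, the dates, and the sales figures are assumptions made up for illustration.

```python
from datetime import date, timedelta


def percent_lift(baseline, current):
    """Percentage change of current relative to baseline."""
    if baseline == 0:
        raise ValueError("baseline must be non-zero")
    return (current - baseline) / baseline * 100.0


def missing_days(measurements, start, end):
    """List the dates in [start, end] that have no recorded measurement."""
    span = (end - start).days + 1
    all_days = (start + timedelta(days=i) for i in range(span))
    return [day for day in all_days if day not in measurements]


# Hypothetical daily sales recorded after the new system went live.
daily_sales = {
    date(2024, 1, 1): 105.0,
    date(2024, 1, 2): 110.0,
    date(2024, 1, 4): 100.0,
}
baseline_daily_average = 100.0  # assumed average before launch

observed_average = sum(daily_sales.values()) / len(daily_sales)
lift = percent_lift(baseline_daily_average, observed_average)
gaps = missing_days(daily_sales, date(2024, 1, 1), date(2024, 1, 4))

print(f"Lift: {lift:.1f}%")
print(f"Days without data: {gaps}")
```

The gap check matters as much as the lift: a lift computed over days with holes in them is hard to trust, which is why someone should own the tracking.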
That’s it…almost.
The process of adopting machine learning technology requires a little effort, for now. Following a structured plan with detailed objectives and metrics will not only help you get the most out of the technology once it’s implemented the first time, but also help you get better at spotting opportunities to use it in other areas. Once you’ve seen one or two projects through, it will become natural: you’ll know what to look for, which vendors to call, and maybe even a thing or two about which algorithms to use.
When you head into ML territory, make sure you frame the problem you’re facing, determine whether to buy or build a solution, and finally be disciplined about measuring outcomes post-implementation.
Bonus: A high volume of ML industry news goes through sites like KDnuggets and Data Science Central. If you’re new to the space, these communities can help you become more familiar with the terminology and current methods.