Is Machine Learning the right solution for your project?

Mukesh Jain
Machine Learning for Everyone
Jan 16, 2020

In the first post, we covered what Machine Learning (ML) is, and in the second we described the components of a typical machine learning project. In this post, we will cover when machine learning is the right tool for your project.

ML is a tool that is appropriate for only some situations. Just as you would not use a power drill for every task, you want to be thoughtful about where to apply machine learning. There may be cases where you start with the idea of applying ML to your project and, as you dig deeper, realize that simple rules or heuristics actually solve the problem better. Before you invest a lot of time in ML, find out whether it is the right approach and save yourself the effort.

Here are a few filters and criteria to consider, to decide if Machine Learning is the way to go.

Well defined inputs, outputs and model measurements

The holy trinity of inputs, outputs and model measurements needs to be well defined before you start.

  1. Inputs — You need a set of things you can feed into the model. If they are ambiguous, or if their definitions keep changing over time, machine learning becomes difficult to use. Be clear about what input data you have available and what each field means, and make sure the data is consistent.
  2. Outputs — The same applies to the outputs. You need a clear understanding of the outcome you are looking for. Sometimes the business team gives you one goal, but as you start digging you find out that they had a different outcome in mind all along.
  3. Measurements — Once you build a model, you need to decide whether it is good enough for the results you are looking for or, if you already had a model, whether the new model is better than the current one. Deciding in advance how you will measure the impact of your model is really important. Otherwise you may spend six months building a model and then get stuck because you can’t tell whether it is really working.
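To make the measurement point concrete, here is a minimal sketch of comparing a new model against the current one on the same held-out labels. The metric (accuracy), the `min_gain` threshold and the example predictions are all assumptions made up for illustration; in practice you would agree on the metric and the bar with the business team before building anything.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("labels and predictions must have the same length")
    if not y_true:
        raise ValueError("need at least one example")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def is_better(y_true, current_pred, new_pred, min_gain=0.02):
    """True if the new model beats the current one by at least min_gain.

    min_gain is a hypothetical bar decided before the project starts.
    """
    gain = accuracy(y_true, new_pred) - accuracy(y_true, current_pred)
    return gain >= min_gain


# Made-up held-out labels and predictions from two models.
labels        = [1, 0, 1, 1, 0, 1, 0, 0, 1, 1]
current_preds = [1, 0, 0, 1, 0, 0, 0, 1, 1, 1]  # 7 of 10 correct
new_preds     = [1, 0, 1, 1, 0, 1, 0, 1, 1, 1]  # 9 of 10 correct

ship_new_model = is_better(labels, current_preds, new_preds)
```

The point is less the metric itself than the fact that the comparison and the acceptance bar are written down before the six months of model building begin.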

Makes economic sense (scale, speed) with a clear ROI

  1. Scale and Speed — If you have a small amount of data that does not grow rapidly, and you can pay someone to label all of it, you don’t need machine learning as long as humans can do the work equally well. You need scale and speed to justify a machine learning project. For example, if an e-commerce company is trying to find duplicates in a catalog of about a thousand items, a person can do the job faster than you can build a model. On the other hand, if the company sells hundreds of thousands of products, you will need a machine learning model to find all the duplicates efficiently and in a timely manner.
  2. ROI — If building a model takes months of expensive machine time plus data scientists, engineers, project managers and a whole host of other people, you need to know what return you will get from the project. Compare the cost of building and maintaining the model with the ongoing benefit it delivers. Often the ROI is clear, but it is something to think through carefully before embarking on a machine learning project.
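The ROI comparison above can be sketched as simple arithmetic. This is a back-of-the-envelope illustration, not a financial model: the function names and every figure below are hypothetical, and it assumes costs and benefits stay flat from year to year.

```python
def simple_roi(build_cost, yearly_maintenance, yearly_benefit, years):
    """Return on investment over a number of years: (benefit - cost) / cost."""
    total_cost = build_cost + yearly_maintenance * years
    total_benefit = yearly_benefit * years
    return (total_benefit - total_cost) / total_cost


def payback_years(build_cost, yearly_maintenance, yearly_benefit):
    """Years until the build cost is recovered, or None if it never is."""
    net_per_year = yearly_benefit - yearly_maintenance
    if net_per_year <= 0:
        return None  # maintenance eats the whole benefit
    return build_cost / net_per_year


# Hypothetical project: 300k to build, 50k/year to maintain, 200k/year benefit.
roi_3y = simple_roi(300_000, 50_000, 200_000, years=3)  # about 0.33, i.e. 33%
payback = payback_years(300_000, 50_000, 200_000)       # 2.0 years
```

If the payback period is longer than the model is likely to stay useful, that is a strong hint to look at a cheaper approach first.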

Simple solutions don’t work

Look for simple answers and heuristics before reaching for ML. For example, say you are trying to predict the demand for certain products. If those products sell only a few items every year, do you really need an ML model to predict demand for future years?
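As an illustration of what a simple answer might look like for the low-volume demand example, here is a sketch of a heuristic that forecasts next year’s demand as the average of the last few years. The function name, the three-year window and the sales figures are assumptions for the example, not a recommendation for any particular product.

```python
from statistics import mean


def naive_forecast(yearly_sales, window=3):
    """Forecast next year's demand as the average of the last `window` years."""
    if not yearly_sales:
        raise ValueError("need at least one year of sales history")
    recent = yearly_sales[-window:]
    return mean(recent)


# Made-up history for a slow-moving product: units sold per year.
history = [4, 6, 5, 3, 4]
next_year = naive_forecast(history)  # average of 5, 3 and 4
```

A baseline like this is also useful later: if you do build an ML model, it has to beat this number to justify its cost.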

There is a lot of high quality data available

ML methods learn from data, so the quality and availability of data are critical to using machine learning effectively. People underestimate how important the data is. The bulk of the time on a machine learning project is spent finding out what data is available, how far back it goes, whether it is high quality, whether values are missing, and so on. A massive amount of time goes into preparing the data for training and evaluating the model. Also, different ML algorithms require different amounts of data. Deep learning, for example, typically requires far more data than other methods to produce high quality results. If you have less data, you need to pick a method that requires less data.
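One of the first data checks mentioned above, looking for missing values, can be sketched in a few lines. This is a minimal example that assumes the data arrives as a list of dictionaries; the column names and rows are invented, and treating `None` and the empty string as "missing" is a choice made for this sketch.

```python
def missing_value_report(rows, columns):
    """Return, per column, the fraction of rows where the value is missing.

    A value counts as missing if the key is absent, None, or an empty string.
    """
    if not rows:
        raise ValueError("no rows to inspect")
    report = {}
    for col in columns:
        missing = sum(1 for row in rows if row.get(col) in (None, ""))
        report[col] = missing / len(rows)
    return report


# Made-up catalog rows with a few gaps.
rows = [
    {"product_id": "A1", "price": 9.99, "category": "toys"},
    {"product_id": "A2", "price": None, "category": "toys"},
    {"product_id": "A3", "price": 4.50, "category": ""},
    {"product_id": "A4", "price": 12.00},
]

report = missing_value_report(rows, ["product_id", "price", "category"])
```

Here `price` is missing in a quarter of the rows and `category` in half of them, which is the kind of finding that decides whether the data is usable as-is.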

Let’s say we have the right problem and the right data to use ML. Now what? To be successful with ML, you need three components:

  1. The right people to design ML
  2. The right process to implement ML, and
  3. The right system to deploy and maintain ML

People — need the right people

Putting ML into practice takes a number of different skill sets. The following image shows the skill sets you need, though you don’t need all of them for every ML project.

  1. ML Scientist, Applied Scientist, Research Scientist — On the left-hand side, you have the science group. These are the scientists who look at the data once it is available, understand it, apply intuition and choose the appropriate ML method for the problem.
  2. Data Scientist, Data Engineer, BI Engineer — In the middle, you have the people with deep expertise in the data: they know where it is logged and what each attribute means. They are key to getting the initial dataset, and they also set up the ongoing process that keeps supplying data to retrain the model and continue applying it.
  3. Software Developers/Dev Managers/Program Managers — On the far right side, you have the traditional software developers. You need software developers to put the model into production, program/product managers to frame the problem and managers to manage the overall team and provide guidance.

Processes

The second thing you need is an effective process. You need a process to gather the data, train the model, evaluate it and then put it into production.

The lifecycle of an ML project starts with formulating the problem. Once you have decided what problem you are solving, the inputs to the model and the outputs you want from it, you move on to collecting the data. Once the data is available, you perform feature engineering, which transforms the data into a format that works with the model. Finally, you train the model, test it and iterate.
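The lifecycle described above can be sketched end to end with a deliberately tiny example. Everything here is a stand-in: the problem (flag orders likely to be returned, based on price), the data, and the "model" (a single price threshold) are hypothetical and chosen only to keep each stage visible in a few lines.

```python
# 1. Formulate: input = order price, output = 1 if the order is likely
#    to be returned, else 0. (Hypothetical problem for illustration.)

# 2. Collect: made-up raw records of (price as logged text, label).
raw = [("4.99", 0), ("12.50", 0), ("89.00", 1), ("7.25", 0),
       ("150.00", 1), ("64.90", 1), ("9.99", 0), ("120.00", 1)]


def engineer_features(rows):
    """3. Feature engineering: turn raw price strings into floats."""
    features = [float(price) for price, _ in rows]
    labels = [label for _, label in rows]
    return features, labels


def predict(threshold, features):
    """Predict 1 when the price is at or above the threshold."""
    return [1 if x >= threshold else 0 for x in features]


def accuracy(preds, labels):
    return sum(p == y for p, y in zip(preds, labels)) / len(labels)


def train(features, labels):
    """4. Train: pick the threshold with the best training accuracy."""
    candidates = sorted(set(features))
    return max(candidates, key=lambda t: accuracy(predict(t, features), labels))


# Hold out the last two records for testing.
train_x, train_y = engineer_features(raw[:6])
test_x, test_y = engineer_features(raw[6:])

threshold = train(train_x, train_y)

# 5. Test: evaluate on data the model has not seen, then iterate.
test_accuracy = accuracy(predict(threshold, test_x), test_y)
```

A real project replaces each function with something far more involved, but the order of the stages and the split between training and test data stay the same.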

Move to Production

Finally, now that you have a model that has been tested and signed off, you want to take it to production. Machine learning models don’t stand alone; they need to be integrated into an existing system. For example, a sales forecast needs to be integrated into the buying system, and a recommendation system needs to be integrated with the various recommendation widgets.

You need to think about which systems the model has to be integrated with, what format they need the model to be in, and how they will access the predictions, for example as a service or as a library. You need to think about data storage: where and how do you store the data, and how much can you store? You need to think about the security and privacy of the data: how are you keeping it safe, and what encryption mechanisms are you using? You also need to ensure that training and evaluation data is handled in accordance with its data classification.

Last but not least, you need to monitor and maintain the model. You need to keep checking that the model is accurate and that it is still doing what it was designed to do when you started. It is important to monitor quality metrics and business impact with dashboards, alarms, and user feedback. Deteriorating performance may require retuning, changing goals may require new metrics, and a changing domain may require changes to the data.
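A monitoring alarm of the kind mentioned above can be as simple as comparing recent quality measurements with the level the model had at sign-off. The sketch below assumes accuracy is the quality metric and that a drop of more than five points should raise an alert; both the function name and the numbers are hypothetical.

```python
def check_model_health(baseline_accuracy, recent_accuracies, max_drop=0.05):
    """Return an alert message if average recent accuracy dropped too far.

    Returns None when the model is still within max_drop of its baseline.
    """
    if not recent_accuracies:
        raise ValueError("need at least one recent measurement")
    recent = sum(recent_accuracies) / len(recent_accuracies)
    drop = baseline_accuracy - recent
    if drop > max_drop:
        return (f"ALERT: accuracy fell from {baseline_accuracy:.2f} "
                f"to {recent:.2f}")
    return None


# Made-up weekly accuracy measurements after launch.
healthy = check_model_health(0.90, [0.89, 0.91])
degraded = check_model_health(0.90, [0.88, 0.82, 0.79])
```

In a production setup the returned message would feed a dashboard or paging system, and an alert would trigger the retuning or data changes described above.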

This post gave you criteria and filters to use as you decide whether to go with Machine Learning. In future posts, we will delve deeper into many of these factors to help you decide which machine learning approach will help you achieve your goals, and the tactics to get started.
