Applying Machine Learning? Here’s how to frame your problem first

Alexander Barinov
Published in Intelliarts AI
7 min read · Jan 11, 2021

TL;DR The article compiles a set of recommendations for business and tech experts on the first important step before starting any ML project - problem framing.

AI and its subset, Machine Learning (ML), are among the most hyped technologies of the 21st century. Sci-fi writers compose stories about robots conquering humanity; tech giants boast about imbuing their products with AI and ML; many companies strive to implement it. Such publicity has created the impression that Machine Learning is a miracle weapon against every possible problem. As you might guess, it is not. Whether Machine Learning is the best solution depends mostly on the situation itself. To pick the right problem, ask yourself the following questions:

  1. What difficulty is my product facing?
  2. Will it be a good problem for Machine Learning?

Problem statement

When asking “Does my project have any problems?”, concentrate on problems that are hard to solve with conventional programming. For example, think of Google Photos, where the problem was to find a particular photo by keyword search without manual tagging. There is no clear way to handle this with traditional programming, but machine learning can solve it by studying patterns in data and adapting to them.

In general, spotting patterns in the actions that people repeat can be the first step in automating your business processes, as the human factor has been identified as the cause of up to 80% of breakdowns.

ML is just one tool among many, so bring it out only when it suits the task. Also consider other problems you have that resemble the one you are trying to fix. Solving them together may pay off, as the approaches and data sources may overlap.

Once the problem is chosen, think carefully about why it actually needs to be solved. Reflect on your motivation: what has to be true for the problem to count as solved? What opportunities does solving it open up? You need to be clear about the advantages in order to capitalize on them. If the solution benefits your business, be explicit about what those benefits are, decide how you will know when you have received them, and ask why that matters to you. Finally, think through how the solution will be used.

Then phrase your problem as clearly as possible. For example: “We want the ML model to predict how long the engine of our tool will work until it requires repair.” Don’t restrict your goal to metrics you are already optimizing; concentrate on the bigger aim of your product. Still, keep in mind that your company’s problem has to be defined in a quantifiable way, so settle on measurable metrics first.

After that, investigate how you could solve the problem manually. This can spotlight the difficulties that really can’t be solved satisfactorily with a manually implemented solution. It will also surface a lot of valuable knowledge, such as where the data is actually stored and which kinds of features would be helpful. Collect all of those details and use them to update the earlier parts of the problem description, especially the hypotheses and rules of thumb. Aim to make decisions, not just predictions: ensure your forecasts let you take beneficial action.
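The framing steps above (a quantifiable metric, a manual rule of thumb, and a decision tied to the prediction) can be sketched in a few lines of Python. Everything here is an illustrative assumption rather than part of any real project: the history records, the 1000-hour service interval, and the 150-hour lead time are all made up.

```python
from statistics import mean

# Hypothetical history: (operating hours since last service, actual hours until repair).
history = [(100, 900), (400, 620), (700, 310), (900, 80)]

def rule_of_thumb(hours_since_service, service_interval=1000):
    """Manual baseline: assume the engine needs repair every `service_interval` hours."""
    return max(service_interval - hours_since_service, 0)

def mean_absolute_error(actual, predicted):
    """Quantifiable metric: average size of the prediction error, in hours."""
    return mean(abs(a - p) for a, p in zip(actual, predicted))

actual = [remaining for _, remaining in history]
baseline = [rule_of_thumb(used) for used, _ in history]
baseline_mae = mean_absolute_error(actual, baseline)

def should_schedule_repair(predicted_hours_left, lead_time_hours=150):
    """Decision rule: act when the predicted remaining life drops below the lead time."""
    return predicted_hours_left < lead_time_hours

print(f"Baseline MAE: {baseline_mae:.1f} hours")
print("Schedule repair now?", should_schedule_repair(rule_of_thumb(900)))
```

An ML model would only be worth building if it beats the manual baseline on the agreed metric, and only if its predictions change the decision in a useful way.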

These preparatory steps benefit both the problem-solving itself and the company: because ML solutions are often outsourced, a well-framed problem saves future costs. Framing the problem in cooperation with ML experts brings several additional benefits:

  • A clear problem statement that fosters alignment between the different parties involved in building an ML-powered solution
  • An initial vision of the solution and its benefits for the business
  • Understanding of the data structure and amounts required
  • Metrics that would be used to assess the outcomes

ML problem types

For most companies planning to use ML for the first time, another difficulty is understanding whether their problem is, in fact, solvable this way. To solve a problem with ML, it is useful to know the types of ML problems, as this helps you understand which tools are available.

There are three main types of machine learning problems based on what the prediction tasks are like: Supervised, Unsupervised, and Reinforcement learning.

The latter trains ML models to make a sequence of decisions, enabling an agent to learn in an interactive environment from trial-and-error feedback on its own actions and experiences. To steer the machine towards the behavior you want, the agent is given rewards or penalties for its actions, and its goal is to maximize the total reward. Reinforcement learning is used in areas such as self-driving cars, personalized recommendations, robotics, and games.
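To make the reward idea concrete, here is a minimal sketch of an agent that learns by trial and error which of two actions pays more. It uses a simple epsilon-greedy strategy, which is one common choice among many; the environment, the reward values, and the function name `run_agent` are hypothetical.

```python
import random

# Hypothetical environment: two actions with fixed rewards the agent does not know in advance.
REWARDS = [0.2, 1.0]

def run_agent(steps=200, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    counts = [0] * len(REWARDS)    # how often each action was tried
    values = [0.0] * len(REWARDS)  # running average reward per action
    total_reward = 0.0

    for step in range(steps):
        if step < len(REWARDS):
            action = step                          # try every action once first
        elif rng.random() < epsilon:
            action = rng.randrange(len(REWARDS))   # explore: pick a random action
        else:
            # exploit: pick the action with the best estimated reward so far
            action = max(range(len(REWARDS)), key=values.__getitem__)
        reward = REWARDS[action]
        counts[action] += 1
        values[action] += (reward - values[action]) / counts[action]  # incremental mean
        total_reward += reward
    return values, total_reward

values, total_reward = run_agent()
best_action = max(range(len(values)), key=values.__getitem__)
print("Estimated values:", values, "best action:", best_action)
```

Real reinforcement learning problems add states and delayed rewards on top of this, but the loop of acting, receiving a reward, and updating estimates stays the same.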

Supervised algorithms make predictions based on a set of labeled examples. Unsupervised algorithms look for previously undetected patterns in a data set with no pre-existing labels and with minimal human supervision.

Supervised ML algorithms may be categorized into the following subcategories:

  • Classification algorithms - we use data to predict which category something falls into. Examples include classifying emails as “spam” or “not spam” and analyzing medical data to decide whether a patient is in a high-risk group for a particular illness.
  • Regression algorithms - we try to predict an output variable that is a real or continuous value, e.g. an employee’s salary expectation, or next week’s temperature based on historical data.
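The difference between the two subcategories is easiest to see side by side. The sketch below uses deliberately tiny, made-up data and two simple techniques chosen for brevity (a 1-nearest-neighbour classifier and a least-squares line); in practice a library such as scikit-learn would normally be used instead.

```python
# Classification: predict a category (toy spam filter, hypothetical data).
# Feature: number of suspicious words in an email; label: "spam" or "not spam".
labeled_emails = [(0, "not spam"), (1, "not spam"), (5, "spam"), (7, "spam")]

def classify(suspicious_words):
    """1-nearest-neighbour: copy the label of the most similar known example."""
    _, label = min(labeled_emails, key=lambda ex: abs(ex[0] - suspicious_words))
    return label

# Regression: predict a continuous value (toy salary model, hypothetical data).
# Feature: years of experience; target: expected salary in thousands.
salaries = [(1, 40.0), (3, 50.0), (5, 60.0), (7, 70.0)]

def fit_line(points):
    """Ordinary least squares for a single feature: returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in points)
             / sum((x - mean_x) ** 2 for x, _ in points))
    return slope, mean_y - slope * mean_x

slope, intercept = fit_line(salaries)
print(classify(6))            # the output is a category
print(slope * 4 + intercept)  # the output is a number
```

Both models learn from labeled examples; what differs is the kind of answer they return.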

In unsupervised ML, we provide the algorithm with unlabeled data and ask it to find the hidden characteristics of that data and group it in a way that makes sense. One of the most common unsupervised learning tasks is data clustering. Genomics offers a typical clustering problem: we supply an algorithm with thousands of different genes, which are then clustered into groups of related genes. Because data points that fit no cluster stand out, clustering is also used for anomaly detection.
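As a rough illustration of clustering and of how it leads to anomaly detection, the sketch below runs a plain k-means on one-dimensional data and then flags values that lie far from every cluster centre. The readings, the starting centroids, and the distance threshold are all assumptions made for the example.

```python
def k_means_1d(values, centroids, iterations=10):
    """Plain k-means on one-dimensional data, starting from the given centroids."""
    for _ in range(iterations):
        clusters = [[] for _ in centroids]
        for v in values:
            nearest = min(range(len(centroids)), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Move each centroid to the mean of its cluster (keep it if the cluster is empty).
        centroids = [sum(c) / len(c) if c else centroids[i]
                     for i, c in enumerate(clusters)]
    return centroids, clusters

# Hypothetical unlabeled measurements: two natural groups plus one odd reading.
readings = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8, 9.5]
centroids, clusters = k_means_1d(readings, centroids=[0.0, 6.0])

def is_anomaly(value, threshold=2.0):
    """Flag a value that is far from every cluster centre."""
    return min(abs(value - c) for c in centroids) > threshold

print("Cluster centres:", centroids)
print("Anomalies:", [r for r in readings if is_anomaly(r)])
```

No labels were supplied at any point: the groups and the outlier both emerge from the structure of the data alone.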

Real cases of problem framing in ML

Our team’s expertise in applying ML lies mostly in the fields of anomaly detection and predictive maintenance. Anomaly detection is a method that identifies deviations within a dataset. Typically, the atypical items point to a problem, ranging from a structural defect to bank fraud or ecosystem disturbances.

Business owners first need to decide what counts as normal and, consequently, abnormal behavior for their system. One of our customers was a fabric producer whose problem was stated as detecting weaving anomalies. ML can analyze images of the fabric and evaluate defects on its surface. Locating and assessing these surface defects allowed an effective assessment of product quality. Without an automated system, they would have needed to hire a person to control the quality manually, which would be neither cost-effective nor reliable, as the human factor weighs heavily in such routine and monotonous work. With an ML solution, they don’t need a pair of eyes checking every centimeter of denim. Cameras watch all the looms 24/7, and the algorithm notifies a responsible person about anomalies detected at any part of the production line, giving all the needed details. Implementing the ML solution resulted in both better quality and lower costs.

Not all issues are so simple to turn into an ML problem and solve; some require more effort from both the business and the ML experts. One of our customers recently decided to improve their business intelligence by introducing predictive maintenance for their machines. The goal was to calculate the probability of an outage in their industrial IoT devices. After examining the case, we understood that the problem was hardly solvable without quality historical data about past outages. We knew when a machine was off but had no data about the reason: whether it was switched off to be relocated, because of the lockdown, or because it was actually broken. The second trouble with the data was that it was too “young”: a longer collection period was needed for decent predictive maintenance and anomaly detection. So, having insufficient data to solve the initial problem, we decided to use a subset of the available data and concentrate on a more feasible one. As a result, the company now uses historical data for anomaly detection, which should help the customer handle suspicious and failed usage sessions on their devices. A flexible approach allowed us to modify the ML problem so that the available data is used for the customer’s benefit, while the existing system is prepared to collect the additional data required for a predictive maintenance solution.

The application of existing data isn’t always obvious, but once ML engineers understand the available data and the business domain, they can find additional solutions that your business would benefit from. In the case above, our ML team advised the customer on what needs to be done to implement predictive maintenance in the foreseeable future, such as what data should be collected and for how long. We are also working with the existing data to detect anomalies, which helps the business owners identify critical incidents such as technical glitches or other abnormal system behavior.

The case shows how important it is to consider historical data while framing your problem. It’s not always clear to a company whether it has relevant, quality data to solve the problem with ML, so you may want to consult data science engineers first. They can conduct exploratory data analysis before machine learning professionals get down to business.
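A first look at historical data can be as simple as checking how long a period it covers and how much of it carries the information the model would need. The sketch below is a hypothetical check inspired by the outage case above: the log entries, field names, and thresholds are illustrative assumptions, not recommendations or the actual customer data.

```python
from datetime import date

# Hypothetical outage log: when the device went offline and, if known, why.
outages = [
    {"day": date(2020, 3, 1), "reason": None},
    {"day": date(2020, 5, 12), "reason": "relocation"},
    {"day": date(2020, 8, 30), "reason": None},
]

def data_readiness(records, min_days=365, min_labeled_share=0.8):
    """Report whether the history is long enough and well enough labeled.

    The default thresholds are illustrative assumptions only.
    """
    days = [r["day"] for r in records]
    span_days = (max(days) - min(days)).days
    labeled_share = sum(r["reason"] is not None for r in records) / len(records)
    return {
        "span_days": span_days,
        "labeled_share": labeled_share,
        "long_enough": span_days >= min_days,
        "labeled_enough": labeled_share >= min_labeled_share,
    }

report = data_readiness(outages)
print(report)
```

A report like this does not replace exploratory data analysis, but it quickly shows whether the original problem is feasible or whether a narrower one should be framed first.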

To wrap up

No matter whether your business is IoT driven or you run a customer-facing website, applying ML methods might be beneficial for you. Just don’t be afraid to take the trouble to organize the cooperation of experts of different kinds, who are eager to think outside the frame while creating a unique solution for your business.

P.S. Thanks for reading! Hope you liked the article and found it helpful. Would love to see your comments or follow-up questions.

Alexander Barinov
Intelliarts AI

R&D enthusiast in the field of Data Science and Machine Learning with vast experience in software engineering. Helps companies gain more value from their data.