How to approach any machine learning problem

Published in

DiveDeepAI

5 min read · Aug 11, 2022

Introduction

Whether you’re trying to solve a problem with machine learning or just want to understand how such a project comes together, the first step is knowing what you’re trying to accomplish. In this post, we’ll walk through some of the things to consider when setting up your project so you can start thinking about how machine learning can help.

You need a way to measure success, so you know what you’re trying to accomplish and whether your approach is working

You need a way to measure success.

When trying to solve a machine learning problem, it’s essential to know what you’re trying to accomplish with the system and whether your approach is working. Without a measure of success, how can you tell? How do you know whether your model has learned something valuable from its training data? And how do you know whether it is overfitting or underfitting?

Measuring these things helps you understand and set the goals for the project, so that when things start going wrong (which they will), you at least have a way of knowing what went wrong and can adjust course accordingly.
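As a rough illustration of checking for overfitting or underfitting, the sketch below compares accuracy on training data with accuracy on held-out validation data. The function names and both thresholds (`min_acc`, `max_gap`) are assumptions made for this example, not standard values; choose ones that match your own goals.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on empty input")
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def diagnose_fit(train_acc, val_acc, min_acc=0.7, max_gap=0.1):
    """Give a coarse label for how a model is fitting.

    min_acc and max_gap are illustrative assumptions, not universal rules.
    """
    # Poor accuracy even on the data the model was trained on.
    if train_acc < min_acc:
        return "underfitting"
    # Much better on training data than on unseen data.
    if train_acc - val_acc > max_gap:
        return "overfitting"
    return "ok"


# Example: near-perfect on training data, much worse on validation data.
print(diagnose_fit(train_acc=0.99, val_acc=0.70))
```

A check like this is only a starting point. The more useful habit is recording both numbers on every training run so that you can see when they drift apart.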

If your goal is to create a product that uses machine learning, you’ll need to think beyond training and testing accuracy

To create a product that uses machine learning, it’s essential to think beyond training and testing accuracy. You’ll need to measure the model’s impact on your business and customer experience.

Many other factors can influence how well your model performs — and they’re often more important than accuracy in determining whether you should use it in production.

Here are a few of those factors:

How quickly can you build systems that make sense of data?
How easily can your customers (or end users) interact with those systems?
What does this mean for your team structure? Can you add people without increasing cost or decreasing efficiency?

One of the most important lessons in machine learning, and one you can’t learn from a textbook, is that there are always tradeoffs. It’s critical to understand your business goals and prioritize accordingly.

Working with humans means that even minor technical improvements can make a big difference in human effort

It will help if you remember that humans are not machines. Even the most advanced AI systems are still a long way from being able to replicate human intelligence, and they’re not going to get there anytime soon.

There’s no single correct answer in machine learning: an algorithm can work well on one data set but terribly on another. This means you’ll need human feedback for your model to learn quickly and adapt as necessary.

The upside is that when humans are in the loop, even minor technical improvements can make a big difference in human effort (and therefore cost).

Your production data pipeline may differ from your training data pipeline in ways that affect users’ experience and privacy

Your production data pipeline may differ from your training data pipeline in ways that affect users’ experience and privacy.

For example, if you’re building an app for users who pay for premium services, some of a user’s payment information may end up stored on your server, possibly without encryption. In this case, it’s essential to keep track of where users’ data is stored so that they can opt out of having their personal information sent back through the same path they used to pay for access (or, worse yet, shared with third parties). You should also consider what happens when someone uses your application: Do you have any way of preventing them from leaving behind additional files or cookies? If not, how would these files or cookies affect other parts of your application?
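One simple way to notice such differences is to compare the fields each pipeline actually handles. The sketch below is a minimal example under stated assumptions: the function name, the field names, and the list of sensitive fields are all hypothetical, and a real audit would need to look at far more than field names.

```python
def compare_pipeline_fields(training_fields, production_fields, sensitive_fields):
    """Report how the fields seen in production differ from those used in training."""
    training = set(training_fields)
    production = set(production_fields)
    return {
        # Fields the model was trained on but will not receive in production.
        "missing_in_production": sorted(training - production),
        # Fields that only appear in production and were never reviewed in training.
        "extra_in_production": sorted(production - training),
        # Fields in production that need extra care (storage, encryption, opt-out).
        "sensitive_in_production": sorted(production & set(sensitive_fields)),
    }


# Hypothetical field names, for illustration only.
report = compare_pipeline_fields(
    training_fields=["age", "plan", "usage_minutes"],
    production_fields=["age", "plan", "card_number", "session_cookie"],
    sensitive_fields=["card_number", "session_cookie"],
)
print(report)
```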

Nobody always makes good predictions. Model uncertainty is not just an academic concern; it plays out in practice, especially as algorithms touch more people’s lives

The second thing we need to understand is that nobody makes perfect predictions. Model uncertainty is not just an academic concern; it plays out in practice, especially as algorithms touch more people’s lives.

For example, imagine you’re trying to predict which of your friends will die young based on their genes and lifestyle factors like smoking or diet. If the only outcome you have observed is one friend who died at age 30, a model that predicts all your friends will die young fits that data perfectly, and you might think it contains no error! But a single observation tells you almost nothing. And if you look closer at how the data was collected (e.g., was everyone asked, and did everyone want to answer?), there is likely some bias in what people report about themselves, because those who don’t want anyone to know about their health risks probably aren’t going around telling everyone.
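To put a number on how little one observation tells you, the sketch below computes a Wilson score interval for an observed success rate. This is one common way to express uncertainty about a proportion, chosen here for illustration rather than taken from the example above; the function name is hypothetical, and `z=1.96` assumes a roughly 95% confidence level.

```python
import math


def wilson_interval(successes, trials, z=1.96):
    """Wilson score interval for a proportion (z=1.96 is roughly 95% confidence)."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    if not 0 <= successes <= trials:
        raise ValueError("successes must be between 0 and trials")
    p = successes / trials
    z2 = z * z
    denom = 1 + z2 / trials
    center = (p + z2 / (2 * trials)) / denom
    half_width = z * math.sqrt(p * (1 - p) / trials + z2 / (4 * trials ** 2)) / denom
    # Clamp to [0, 1] to guard against floating-point overshoot.
    return max(0.0, center - half_width), min(1.0, center + half_width)


# One correct prediction out of one versus 100 out of 100.
low_1, high_1 = wilson_interval(1, 1)
low_100, high_100 = wilson_interval(100, 100)
print(f"1 of 1:     {low_1:.2f} to {high_1:.2f}")
print(f"100 of 100: {low_100:.2f} to {high_100:.2f}")
```

With a single correct prediction the interval runs from about 0.21 to 1.0, so the "perfect" model is consistent with being wrong most of the time. With 100 correct out of 100, the lower bound rises to about 0.96. Note that this only captures sampling uncertainty; it does nothing about bias in how the data was collected.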

Problems where machine learning can help reduce human effort and bring down the cost

Machine learning can also help reduce human effort and bring down cost. For example, you may want to use machine learning on a business problem where you’re trying to extract data from a set of documents (e.g., documents that are handwritten and scanned). You could also use it for data problems that involve removing duplicate records in your database, say, if the same person appears twice under different names on different documents (e.g., John Smith vs. John Mccusker).
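Before reaching for a learned model, it often helps to build a simple baseline. The sketch below flags candidate duplicate records using string similarity from Python's standard `difflib` module plus an exact match on a second field. It is a rule-based baseline rather than machine learning, and the record fields (`name`, `email`) and the `name_threshold` value are assumptions for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text):
    """Lowercase and collapse whitespace so trivial differences are ignored."""
    return " ".join(text.lower().split())


def name_similarity(a, b):
    """Similarity between two names, from 0.0 (different) to 1.0 (identical)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


def find_candidate_duplicates(records, name_threshold=0.85):
    """Return index pairs of records that may describe the same person."""
    candidates = []
    for (i, left), (j, right) in combinations(enumerate(records), 2):
        left_email = normalize(left["email"])
        right_email = normalize(right["email"])
        # An empty email is not evidence that two records match.
        same_email = bool(left_email) and left_email == right_email
        similar_name = name_similarity(left["name"], right["name"]) >= name_threshold
        if same_email or similar_name:
            candidates.append((i, j))
    return candidates


# Hypothetical records, for illustration only.
records = [
    {"name": "John Smith", "email": "john@example.com"},
    {"name": "John Mccusker", "email": "JOHN@example.com"},
    {"name": "Jon  Smith", "email": "jsmith@example.org"},
    {"name": "Alice Jones", "email": "alice@example.com"},
]
print(find_candidate_duplicates(records))
```

Here the first two records are paired because they share an email address even though the surnames differ, and the first and third are paired because the names are nearly identical. The pairs are only candidates: a person still reviews them, and those reviews can later become labeled data for a learned model.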

In both cases, it’s important to identify an objective metric that can be measured automatically and tied directly back to user experience. For example: How long does it take me to search through my emails? Is this a helpful feature or not?

Keep in mind that there are many problems where machine learning can help, but not everything will be a good fit! The key is finding those areas where your problem matches the current capabilities of machine learning.

Conclusion

When dealing with a machine learning problem, you must first ask yourself what the business goals are. Then you must identify what kind of data set you have and how much time is available to deliver results. If your goal is to create a product that uses machine learning, it may take some time before the algorithms are good enough for production use. And if your goal is just to get started quickly or save money on training costs by using off-the-shelf models instead of building them from scratch (as many companies do), remember that a model that works on test data is not guaranteed to work on production data, so keep measuring after you launch.

Written by Umer Sufyan