How We Find Machine Learning Use Cases

Our process, drawn from helping more than 50 businesses decide what to do with machine learning

Markus Schmitt
6 min read · Mar 15, 2018

Machine learning is an extremely versatile tool. Some applications are very public, like Spotify’s Discover Weekly, Netflix’s Movie Recommender, or Google Translate. Many more are hidden, built behind the scenes by teams that used AI to solve some unique challenge for their business.

If your business has a large amount of data and you are asking yourself, “How can I use AI to build something smart from our data?” — keep reading. We’ve helped many businesses answer this question. This is our process:

1. Set up a meeting with the right people

2. Introduce everyone to machine learning

3. Absorb everything, assume little

4. Make a list of processes ripe for machine learning

5. Check feasibility

6. Prioritize

7. Research

8. Make a decision

1. Set up a meeting with the right people

Whatever use case you discover has to be rooted in company goals and data. Neither management nor engineering can give you all the answers on their own, so you need a cross-disciplinary meeting. Invite a product visionary (CEO, VP Product) and someone who knows every dataset (CTO, Head of Data Engineering), and plan at least half a day for it.

2. Introduce everyone to machine learning

Machine learning is a tool like any other: the more you understand it, the better you can put it to use. If people think it’s a magical black box, then they won’t be able to help you with your search.

So make it practical, leave out the math, and cover these three basics about machine learning:

  1. What is machine learning?
  2. When can you use it?
  3. What are the common misconceptions?

I wrote a post to answer these questions: The 3 Basics of Machine Learning

Once everyone has an understanding of what machine learning is, it’s time for you to learn from them:

3. Absorb everything, assume little

Every firm is unique. Even within narrow verticals, the overlap in what two different companies need is smaller than you might think. Don’t try to fit your business into a box.

Map the terrain

Goals. What goals are driving the company right now? What are the challenges behind these goals?

History. What projects have been implemented in the past? What were the results, the challenges, and the lessons?

Data. What data exists? Where is it generated, and where is it saved? How much consistent history is there in each database? How exactly does each table look? Can the different datasets be merged on unique identifiers?

Infrastructure. What is the preferred infrastructure? Are there relevant policies or restrictions on which provider to use (on-premises, AWS, or Google Cloud)?

Data Science Strategy. What is your data science vision? Do you want to build up your own expert team, or do you want to find an experienced team to build you a solution? Or a combination of both?

After you know what’s driving your business at the moment, you can get more concrete. Now it’s time to collect all the potential use cases.

4. Make a list of processes ripe for machine learning

Where is a lot of data being used to automate decision making?

Machine learning is just a tool to automate pattern discovery and then make smart predictions based on those patterns. Most of the time it’s about improving an existing process by making it a little bit smarter. Processes that are good candidates for machine learning are usually:

Data based: Decision making in the process is already entirely based on data.
Large scale: Decision making happens over and over again, thousands or millions of times.
Automated: The process already uses software to some degree.

Already automated, large-scale, data-based decision-making processes are the perfect potential candidates for machine learning systems.

Good examples:

  • Product recommendations
  • Credit scoring
  • Personalized marketing
  • Fraud detection
  • Image recognition

So think about where a lot of data is being used to automate decision making, and whether there is room for improvement.

The next best place to use machine learning is to support a process that is currently done by people. If it’s entirely data based, highly repetitive, tedious, and therefore slow, it might be ripe for improvement. Can you make it faster by training a machine to take on some of these decisions?

5. Check feasibility

For each use case, find out whether the necessary data is being captured. Specifically, check whether the different datasets you need can be merged.
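A quick way to test mergeability is to measure how many identifiers two datasets actually share before committing to a use case. The sketch below is a minimal illustration in plain Python; the dataset names (`orders`, `customers`), the `customer_id` key, and the sample rows are all hypothetical.

```python
def key_overlap(left_rows, right_rows, key):
    """Return the share of distinct `key` values in left_rows that also appear in right_rows."""
    left_keys = {row[key] for row in left_rows if row.get(key) is not None}
    right_keys = {row[key] for row in right_rows if row.get(key) is not None}
    if not left_keys:
        return 0.0
    return len(left_keys & right_keys) / len(left_keys)


# Hypothetical sample data: in practice these rows would come from your own tables.
orders = [
    {"order_id": 1, "customer_id": "c1"},
    {"order_id": 2, "customer_id": "c2"},
    {"order_id": 3, "customer_id": "c3"},
    {"order_id": 4, "customer_id": None},  # missing identifier, cannot be merged
]
customers = [
    {"customer_id": "c1", "country": "DE"},
    {"customer_id": "c2", "country": "FR"},
]

overlap = key_overlap(orders, customers, "customer_id")
print(f"{overlap:.0%} of order customer IDs have a matching customer record")
```

A low overlap is an early warning that the datasets may not share a reliable identifier, which is worth knowing before any modelling starts.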

The more machine learning projects you’ve already implemented, the better you can pinpoint the right questions to ask. Build on your previous experience:

  • What are the common pitfalls in projects like this one?
  • Which datasets are the most important to have, and which ones are optional?
  • What is a reasonable level of improvement to expect in this situation?

If you haven’t implemented a similar use case before, talk to a team that has.

6. Prioritize

Prune ideas early if they aren’t helping achieve the company’s most important priorities. Refocus the discussion: “This is more of a nice-to-have, so let’s leave it for now.”

Ask critical questions: “If we did manage to automate and improve the accuracy or speed of this process by 20%, what would that mean in revenue per year?”
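That question can be turned into back-of-the-envelope arithmetic. The sketch below assumes, purely for illustration, that revenue from the process scales linearly with the improvement; the function name and all figures are hypothetical placeholders for your own numbers.

```python
def yearly_revenue_gain(decisions_per_year, revenue_per_decision, relative_improvement):
    """Rough estimate of extra yearly revenue if the process improves by a given fraction.

    Assumes revenue scales linearly with the improvement, which is a simplification.
    """
    baseline = decisions_per_year * revenue_per_decision
    return baseline * relative_improvement


# Hypothetical figures: 1,000,000 decisions a year, 0.50 revenue each, 20% improvement.
gain = yearly_revenue_gain(1_000_000, 0.50, 0.20)
print(f"Estimated extra revenue per year: {gain:,.0f}")
```

Even a rough figure like this makes it easier to tell a nice-to-have from a priority.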

To compare the cases that are left, make an Excel sheet with the following columns:

Data availability: How easy is it to access the right data for this application? If you don’t have the data yet, give it a very low rating.

Potential gain: If things go very well, how big is the potential impact on a critical business priority?

Risk: Are there many unknown factors that could derail the project? What does your experience tell you?

Time to implement: Prioritize quick wins. With one solid success behind you, you can move on to the more complex projects.
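The same table can be sketched in code if you prefer that to a spreadsheet. In the example below, the 1-to-5 scale, the unweighted sum, and the three use cases with their ratings are all assumptions for illustration. Every column is rated so that higher is better (a 5 for risk means few unknowns, a 5 for time means quick to implement).

```python
from dataclasses import dataclass


@dataclass
class UseCase:
    name: str
    data_availability: int  # 1 = no data yet, 5 = data readily accessible
    potential_gain: int     # 1 = marginal impact, 5 = large impact on a key priority
    risk: int               # 1 = many unknowns, 5 = few unknowns
    time_to_implement: int  # 1 = long project, 5 = quick win

    def score(self) -> int:
        # Unweighted sum: an assumption, adjust the weighting to your priorities.
        return (self.data_availability + self.potential_gain
                + self.risk + self.time_to_implement)


# Hypothetical ratings for three of the example use cases listed earlier.
candidates = [
    UseCase("Product recommendations", 4, 4, 3, 3),
    UseCase("Fraud detection", 2, 5, 2, 2),
    UseCase("Personalized marketing", 5, 3, 4, 4),
]

# Highest total score first.
ranked = sorted(candidates, key=lambda uc: uc.score(), reverse=True)
for uc in ranked:
    print(f"{uc.score():>2}  {uc.name}")
```

A simple sum treats all four columns as equally important; if one matters more to your business, weight it accordingly.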

7. Research

Once you’ve identified your top 1–3 cases, do a broad search through Google:

  • Who has implemented similar systems before?
  • What approaches did they try? Which ones did they settle on and why?
  • What were the learnings and the final results?

In machine learning, you can piece together some of that information from published academic research. Don’t copy an approach wholesale, though: it may have worked well only on the authors’ specific dataset. Get inspired, and steal the best ideas. Use them to guide your further investigations.

Another good source is Kaggle competitions. If you find a competition for a similar use case, look at the kernels and the forum discussions. They will give very specific input on how to implement and tune an algorithm, and in most cases you can even find full sample code.

8. Make a decision

Update your use case ranking with the additional information you learned. Then, based on your research and experience, make a rough project plan for each idea.

Together with your prioritization table and the project plans, present your findings to your team.

If you’ve done this, always keeping the bigger goals in mind, it should now be easy for your team to decide on the best project.

Time to get your hands dirty and implement.

Have fun!

Want to learn more? We are Data Revenue, a team of machine learning engineers from Berlin, Germany. We build custom machine learning systems for some of the largest web, TV, and bioinformatics companies.

Ask me anything: m.schmitt [at] datarevenue.de

Markus Schmitt

Founder at Data Revenue – We speed up Biologists with custom built ML Software | www.datarevenue.com