Business Manager: How to Launch a Machine Learning Project?

9 steps to turn your idea into business value

Brigitte Maillère
The Startup
6 min read · Sep 7, 2020

You are an innovator.

You are curious about Machine Learning.

You have an idea that could improve your business by leveraging the information hidden in a set of data you already have or could collect.

As an expert in your industry, you are often the only person who can spot an opportunity for new income, cost reduction, or faster decisions. But getting business value from data requires combining different skills from different people: business vision, mathematical skills, and technological skills.

How to turn your idea into value?

Machine learning uses data to create algorithms whose parameters adjust during training, so that they “learn” to produce the desired information. The machine learning practitioner will create these algorithms and “train” them to match expectations.

Your project will involve collecting and preparing data, training mathematical algorithms, and developing a digital solution to present results (from a simple one-shot visualization to a dynamic dashboard or software integrated into your systems).

Along the path to a successful implementation of your idea, you will have to work with internal or external partners with various roles and skills: business experts, IT people, Machine Learning experts, customer panels, and others.

In some cases, you will first have to sell your project to a sponsor to get approval and a budget.

To succeed with all these people, you will have to find a common language.

I have listed 9 topics you should work on to explain your idea and launch your project.

Here is a template you could use to refine your project, and some advice to fill each part of the template.

1. Idea

As a business leader you spotted an opportunity to improve your business: increase income, lower costs, make better decisions.

You think that you can use available or reachable data to learn new information.

In this part, you want to explain:

  • The information you want to learn (ex: customer profile, sales predictions, text category, probability of mechanical failure, …)
  • The data which are available or could be collected

2. Business Value

Who will benefit from the information learned by machine learning?

  • Will your customers be offered a new service?
  • Will you reduce the cost of an internal process?

3. Resulting Solution

How will your company use the resulting machine learning algorithm?

  • Is it a one-shot study whose result will be a report? (ex: current customer segmentation)
  • Do you need your algorithm to be used on a regular basis to produce an updated dashboard? (ex: sales predictions)
  • Do you need your algorithm to be integrated into your manufacturing process? (ex: automatic routing depending on image classification)

4. Data Sources

You might use various groups of data. For each one, list:

  • Explicit content of data
  • Source of data: where the data were captured and from whom (customer purchases, mechanical sensors, Twitter posts, medical reports…)
  • Estimated number of data items
  • Data presentation: are the data already structured (ex: a spreadsheet of transactions) or do they need pre-processing (ex: extracting meaningful values from physical sensor signals, collecting text from multiple sources in different formats…)?
  • Legal considerations relating to the storage and processing of data
  • Data labeling

For most projects, before you can learn from your data, you need to know the “true answer” for a sufficient number of examples. For instance, among a set of financial transactions, you need to know which were proven fraudulent and which were proven legitimate. Data tagged with the true answer are called “labeled data”.

Are your data already labeled?

If your data are not labeled, you should have them labeled by internal people or an external partner.
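To make the idea of labeled data concrete, here is a minimal Python sketch. The `Transaction` record, its field names, and the four sample rows are hypothetical and exist only for illustration; the point is that every example either carries its “true answer” or is still waiting for one.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Transaction:
    """Hypothetical record; the field names are assumptions for illustration."""
    transaction_id: str
    amount: float
    is_fraud: Optional[bool]  # True/False is the label; None means not yet labeled


def labeling_summary(transactions: List[Transaction]) -> dict:
    """Count how many examples already carry a 'true answer'."""
    labeled = [t for t in transactions if t.is_fraud is not None]
    fraud = sum(1 for t in labeled if t.is_fraud)
    return {
        "total": len(transactions),
        "labeled": len(labeled),
        "unlabeled": len(transactions) - len(labeled),
        "fraud": fraud,
        "not_fraud": len(labeled) - fraud,
    }


# Made-up sample rows, not real data.
sample = [
    Transaction("T1", 25.0, False),
    Transaction("T2", 980.0, True),
    Transaction("T3", 12.5, None),   # still needs a label
    Transaction("T4", 310.0, False),
]

summary = labeling_summary(sample)
print(summary)
```

A summary like this is a quick way to answer “Are your data already labeled?” before the project starts, and to estimate how much labeling work remains.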

5. Information to be learned

In this part, you will explain how a machine learning practitioner will learn the desired information from available data.

Information to be learned

State explicitly which result you are targeting:

  • You want to learn a category: “Is this email a technical support request or a sales request?”, “Is this transaction fraudulent or not?”
  • You want to learn a value: “What is the estimated sales volume for this article next week?”, “What is the probability that this machine will have a major failure before the end of the year?”
  • You want to discover patterns: “Is it possible to segment our customers into 5 groups with similar habits?”

The machine learning practitioner will turn your project into a “classification project”, “multi-class classification project”, “regression project”, “clustering project”, and so on. It’s OK for you to use these words if they make sense to you. But you should keep your project description written in words that all partners in your project can understand.

Performance level

State explicitly the level of reliability you need for your result.

Examples:

  • We should detect at least 99.9% of fraudulent transactions
  • We accept 10% of error in sales predictions for this category of products
  • The solution should classify an image in less than 0.5 second

The machine learning practitioner will turn your performance target into a mathematical definition (false-negative rate, F-score, Jaccard score, confidence interval, …). Ask her/him for a precise explanation of the measure she/he will use. This will ensure you are both on the same path.
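As an illustration of how such targets can become measures, here is a minimal Python sketch. It assumes that “detect at least 99.9% of fraudulent transactions” is read as recall, and that “10% of error in sales predictions” is read as mean absolute percentage error; your practitioner may choose other definitions. The function names and the toy numbers are hypothetical.

```python
def confusion_counts(actual, predicted):
    """Count true positives, false positives, false negatives, true negatives."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if a and p:
            tp += 1
        elif not a and p:
            fp += 1
        elif a and not p:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def recall(tp, fn):
    """Share of real positives (ex: frauds) that were detected."""
    return tp / (tp + fn) if (tp + fn) else 0.0


def precision(tp, fp):
    """Share of raised alerts that were real positives."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def f_score(p, r):
    """F1 score: harmonic mean of precision and recall."""
    return 2 * p * r / (p + r) if (p + r) else 0.0


def jaccard(tp, fp, fn):
    """Overlap between predicted positives and real positives."""
    return tp / (tp + fp + fn) if (tp + fp + fn) else 0.0


def mean_absolute_percentage_error(actual, predicted):
    """Average relative error; assumes no actual value is zero."""
    return sum(abs(a - p) / abs(a) for a, p in zip(actual, predicted)) / len(actual)


# Toy fraud example: 1 = fraudulent, 0 = legitimate (made-up values).
is_fraud = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
flagged = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
tp, fp, fn, tn = confusion_counts(is_fraud, flagged)
fraud_recall = recall(tp, fn)
fraud_f1 = f_score(precision(tp, fp), fraud_recall)
fraud_jaccard = jaccard(tp, fp, fn)

# Toy sales example (made-up values).
sales_error = mean_absolute_percentage_error([100, 200, 50], [110, 180, 50])

print(fraud_recall, fraud_f1, fraud_jaccard, sales_error)
```

In this toy case one fraud out of four is missed, so the recall is far below a 99.9% target, while the sales error stays under 10%. Seeing the numbers side by side is a simple way to check that you and the practitioner mean the same thing by “performance”.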

6. Feasibility

Before starting the project, it is not possible to guarantee that the performance level for the targeted result can be obtained.

Failure to learn the desired information can arise from different causes:

The information is not contained in the data.

If you try to estimate the age of a weather scientist from weather data, you will not succeed, whatever the volume of your data, because the desired information is not contained in the data.

In this case, you will have to collect other sources of data to succeed.

The information is present in your data, but you don’t have enough (labeled) data.

In this case, you will have to obtain additional data to increase the performance level.

The developed algorithm and its technical implementation in the IT architecture cannot provide a response within the targeted time.

The algorithm and/or its implementation will have to be reworked in the hope of reaching the desired performance level.
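For the response-time cause, a rough check can be sketched in a few lines of Python. The `dummy_classifier` function below is a hypothetical stand-in for the real solution, and the 0.5 second default comes from the example target given earlier; a real measurement would have to run on the production architecture with realistic inputs.

```python
import statistics
import time


def measure_latency(predict, inputs, target_seconds=0.5):
    """Time each call to `predict` and compare the slowest one to the target."""
    durations = []
    for item in inputs:
        start = time.perf_counter()
        predict(item)
        durations.append(time.perf_counter() - start)
    return {
        "median_s": statistics.median(durations),
        "max_s": max(durations),
        "within_target": max(durations) < target_seconds,
    }


def dummy_classifier(image):
    """Hypothetical placeholder for the trained model; not a real classifier."""
    return "ok" if sum(image) % 2 == 0 else "defect"


# 100 fake "images", each just a short list of numbers.
report = measure_latency(dummy_classifier, [[1, 2, 3]] * 100)
print(report)
```

Checking the slowest call rather than the average is a deliberately strict choice here; you may agree with your practitioner on a different rule, such as a percentile.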

At the beginning of your project, the machine learning practitioner should be able to rate the feasibility of your project based on the state of the art and the specifics of your environment. During the development of the project, she/he must keep you informed about performance results and explain the causes of low performance.

In this part of your project presentation, explain the estimated feasibility of the training, based on your interview with the practitioner.

7. Project Stakeholders

List all teams, departments, or external partners which will be involved:

Business teams (marketing, manufacturing, HR, …)

They define the business problem or the business opportunity. They evaluate the solution performance.

Data Lab, machine learning practitioner …

The machine learning practitioner transforms data and develops and trains algorithms.

IT teams

For some projects, you need to involve your IT department to extract data from your systems or to deploy the machine learning solution for the end-users.

8. Steps

This part is the place for a visual display of all steps required and the people involved.

It typically includes all or part of the following items:

  1. Defining the business problem to be solved or the business opportunity
  2. Collecting and exploring already existing data, collecting external data
  3. Exploratory Data Analysis: finding evidence that the information you want to get is contained (even if hidden) in your data
  4. Machine Learning training, resulting in an algorithm that can extract the desired business information
  5. Implementation of the end-user solution: a report, a dashboard, or business software
  6. Don’t forget the ongoing maintenance of your solution: the performance level could decrease and must be monitored.
  • ex: Sales prediction algorithms must be periodically retrained to keep up with new trends.
  • ex: Changes in data sources may arise (poor quality of new data, interruption of some data source, change in input data format…)
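The monitoring mentioned in the maintenance step can be as simple as comparing recent errors with the level measured at launch. The Python sketch below is one possible rule, not a standard: the function name, the 4-week window, and the 5-point tolerance are all assumptions chosen for illustration.

```python
def needs_retraining(weekly_errors, baseline_error, tolerance=0.05, window=4):
    """Flag the model when the average error over the last `window` weeks
    exceeds the error measured at launch by more than `tolerance`."""
    if len(weekly_errors) < window:
        return False  # not enough history to decide
    recent = weekly_errors[-window:]
    return sum(recent) / window > baseline_error + tolerance


# Made-up weekly error rates for a sales prediction model.
drifting = [0.08, 0.09, 0.10, 0.12, 0.16, 0.18, 0.19]
stable = [0.09, 0.10, 0.11, 0.10]

print(needs_retraining(drifting, baseline_error=0.10))
print(needs_retraining(stable, baseline_error=0.10))
```

With these numbers the first series is flagged and the second is not. Averaging over several weeks avoids reacting to a single bad week, at the price of detecting a real change a little later.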

9. Costs

Depending on the type of solution described in §3 “Resulting solution”, describe here the nature of the costs: internal people workloads, external services, and maintenance costs.

Conclusions

When you have covered these 9 topics, you will have a precise definition of your project. You will be able to demonstrate how you will extract value from your data. Most importantly, a common language will keep you from getting trapped in data science mysteries and will empower you to manage your machine learning project towards your business objectives.
