How to scope a Machine Learning project

The most important step for a successful ML project

Frankline Ononiwu
CodeX
5 min read · Sep 9, 2021

Image from bivy.com

Most machine learning projects fail to meet their business goals. Even after training and testing, many models never reach production. This has led companies to re-examine the purpose of their Machine Learning and Data Science teams.

The disconnect between ML projects and real-world business use cases can often be traced back to a lack of proper project scoping. For many practitioners, a machine learning project starts with obtaining data, moves through processing and model training and evaluation, and ends with model deployment. Too much focus on these mechanics means losing sight of what the business side of the team is trying to achieve.

Project scoping helps to close that gap. Scoping lies at the boundary of project management and machine learning. Most machine learning systems in production, however, rely less on traditional project-based delivery methods and more on product-development methods, where flexibility and adaptability are critical for meeting customer needs.

What is project scoping in Machine Learning?

Scoping is the process of planning out a project and deciding what resources to employ to accomplish it. But it’s more than just planning: it involves asking the right questions, finding the business objective, and aligning that objective with Machine Learning solutions. Scoping is the first step in the larger life cycle of a machine learning project (Fig 1) and is often regarded as the most important one.

Fig 1: Machine Learning Project Life Cycle (image from Landing.ai)

ML Project Scoping Process

Brainstorm business problems and define goals (What do you wish were working better? What do you want to achieve?): Most projects start with a vague, abstract goal that is refined step by step until it is both concrete and aligned with the aims of the organization. This step is difficult because most organizations haven’t explicitly defined analytical goals for many of the problems they’re tackling. Sometimes these goals exist only implicitly, in the minds of people within the organization. Other times, different parts of the organization are trying to optimize several different goals. The objective here is to take the outcome we’re trying to achieve and turn it into a goal that is measurable and can be optimized.

Brainstorm AI solutions (How do we solve the problem and achieve the needed outcome?): Start with divergent thinking and then move on to convergent thinking. It’s important to brainstorm as many solutions as possible and then progressively converge on the one that offers the most business impact for the least complexity. If you want the most bang for your buck, you probably want to start with the simplest solution. The solutions we advocate for have to be actionable and impactful.
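One way to make the convergent step concrete is to score each brainstormed idea and sort by impact per unit of complexity. The sketch below is only an illustration: the `Candidate` class, the 1 to 5 scoring scale, and the example ideas are all assumptions, not part of any standard method.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    impact: int      # estimated business impact, 1 (low) to 5 (high); hypothetical scale
    complexity: int  # estimated build complexity, 1 (low) to 5 (high); hypothetical scale


def rank_candidates(candidates):
    """Sort by impact per unit of complexity, highest first; ties go to the simpler idea."""
    return sorted(candidates, key=lambda c: (-c.impact / c.complexity, c.complexity))


# Hypothetical ideas produced during the divergent phase.
candidates = [
    Candidate("custom deep network", impact=5, complexity=5),
    Candidate("gradient-boosted classifier", impact=4, complexity=2),
    Candidate("rule-based filter", impact=3, complexity=1),
]

ranked = rank_candidates(candidates)
for c in ranked:
    print(f"{c.name}: {c.impact / c.complexity:.2f} impact per unit of complexity")
```

The scores themselves are judgment calls made with the business side, so the ranking is best treated as a starting point for discussion.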

Assess the feasibility and value of potential solutions: Due diligence on feasibility and value is an important part of designing a successful ML project. Ensure that the project is technically feasible, using external benchmarks to gauge feasibility. If humans can perform the task at a high level, that suggests the project is feasible; otherwise, it may be harder. Human-level performance (HLP) is therefore an important benchmark for assessing feasibility.

The following matrix shows a way to assess project feasibility based on the type of project and the type of dataset. For instance, as shown in the matrix, the feasibility of a new project with an unstructured dataset is assessed on the HLP of the task alone.

Fig 2: Matrix for Assessing Project Feasibility and Value (image by author)
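A matrix like this can be written down as a small lookup table. In the sketch below, only the new/unstructured cell comes from the text above. The other three cells are assumptions that follow a commonly taught version of this matrix and may not match the figure. The function names are hypothetical.

```python
# Feasibility checks keyed by (project type, data type).
# Only ("new", "unstructured") is stated in the article; the rest are assumptions.
FEASIBILITY_CHECKS = {
    ("new", "unstructured"): ["human-level performance (HLP) on the task"],
    ("new", "structured"): ["availability of predictive features"],
    ("existing", "unstructured"): ["human-level performance (HLP) on the task",
                                   "history of the project"],
    ("existing", "structured"): ["availability of new predictive features",
                                 "history of the project"],
}


def feasibility_checks(project_type, data_type):
    """Return the list of checks to run for a given project and data type."""
    key = (project_type.lower(), data_type.lower())
    if key not in FEASIBILITY_CHECKS:
        raise ValueError(f"unknown combination: {key}")
    return FEASIBILITY_CHECKS[key]


def hlp_gap(model_score, hlp_score):
    """Distance between the current model and human-level performance."""
    return hlp_score - model_score


print(feasibility_checks("new", "unstructured"))
# Hypothetical scores: humans reach 0.95 on the task, the current model 0.88.
print(f"gap to HLP: {hlp_gap(0.88, 0.95):.2f}")
```

A large gap to HLP suggests there is still room to improve, while a task on which humans themselves score poorly is a warning sign for feasibility.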

How do you estimate the value of an ML project? There is usually a gap between the metrics an MLE (Machine Learning Engineer) tracks and business metrics. They tend to sit at opposite ends of a spectrum, with one focusing on mechanics and the other on business goals, so the aim is to agree on metrics that both sides accept. Give thought to ethical considerations as well. Is the project creating net positive societal value? Have any ethical concerns been openly debated? How have you thought through privacy, transparency, discrimination/equity, and accountability issues around this project?
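The gap between ML metrics and business metrics described above can be made concrete with a little arithmetic. The sketch below compares two hypothetical models for a hypothetical customer-retention campaign. Every count, value, and cost in it is invented for illustration, and the function names are assumptions.

```python
def precision(tp, fp):
    """Share of positive predictions that were correct."""
    return tp / (tp + fp) if (tp + fp) else 0.0


def expected_value(tp, fp, fn, value_per_tp, cost_per_fp, cost_per_fn):
    """Net business value implied by a model's confusion counts."""
    return tp * value_per_tp - fp * cost_per_fp - fn * cost_per_fn


# Hypothetical economics: a retained customer is worth 100, a wasted offer
# costs 20, and a missed churner costs 100.
economics = dict(value_per_tp=100, cost_per_fp=20, cost_per_fn=100)

# Hypothetical confusion counts on the same 100 churners.
model_a = dict(tp=80, fp=40, fn=20)  # catches more churners, less precise
model_b = dict(tp=60, fp=10, fn=40)  # more precise, misses more churners

precision_a = precision(model_a["tp"], model_a["fp"])
precision_b = precision(model_b["tp"], model_b["fp"])
value_a = expected_value(**model_a, **economics)
value_b = expected_value(**model_b, **economics)

print(f"Model A: precision={precision_a:.2f}, value={value_a}")
print(f"Model B: precision={precision_b:.2f}, value={value_b}")
```

Under these assumed numbers, the model with the higher precision delivers the lower business value, which is the kind of mismatch that scoping conversations are meant to surface early.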

POCs (Proofs of Concept) are also essential for showing that an idea is relevant and potentially valuable. A POC is the minimal working state, or at least a working part, of a digital product, and it demonstrates the feasibility of the idea. Creating a POC for a data science solution differs from doing so for conventional software, because more of the overall solution has to be explored. Unlike, say, a web app POC, an ML POC can’t be built by focusing on just one aspect: the POC model should already perform well on unseen data. A POC helps minimize risk, generate immediate insights, and align people with the project’s objectives and direction.
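One simple way to check that a POC works on unseen data is to hold out part of the data and compare the POC against a trivial baseline. The sketch below uses only the standard library. The toy dataset, the one-feature threshold "model", and all function names are assumptions chosen to keep the example short.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle the rows and hold out a fraction the model never sees during fitting."""
    rows = list(rows)
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_fraction))
    return rows[n_test:], rows[:n_test]


def majority_label(train):
    """Baseline: always predict the most common training label."""
    labels = [y for _, y in train]
    return max(set(labels), key=labels.count)


def fit_threshold(train):
    """Toy POC model: pick the cut-off on the single feature that best fits the training rows."""
    best_t, best_acc = None, -1.0
    for t in sorted({x for x, _ in train}):
        acc = sum((x >= t) == bool(y) for x, y in train) / len(train)
        if acc > best_acc:
            best_t, best_acc = t, acc
    return best_t


def accuracy(rows, predict):
    return sum(predict(x) == y for x, y in rows) / len(rows)


# Synthetic data: one feature, label is 1 when the feature is at least 20.
data = [(x, int(x >= 20)) for x in range(40)]
train, test = train_test_split(data)

baseline = majority_label(train)
threshold = fit_threshold(train)

baseline_acc = accuracy(test, lambda x: baseline)
poc_acc = accuracy(test, lambda x: int(x >= threshold))
print(f"baseline accuracy on held-out data: {baseline_acc:.2f}")
print(f"POC accuracy on held-out data: {poc_acc:.2f}")
```

A POC that cannot beat such a baseline on held-out data has not yet demonstrated feasibility, however good its training scores look.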

Determine milestones: The key to determining milestones is to define the metrics and establish a timeline for achieving them. This gives the project the clarity it needs. Establishing the following kinds of metrics can help in determining milestones for an ML project.

ML metrics, e.g. accuracy, F1-score, MAE; software metrics, e.g. MTTR, application crash rate (ACR); business metrics, e.g. monthly recurring revenue (MRR)
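As a rough illustration of how the ML metrics named above are computed and tied to milestones, here is a small sketch in plain Python. The `Milestone` class, the targets, the due dates, and the sample data are all hypothetical. In practice a library such as scikit-learn would usually provide the metric functions.

```python
from dataclasses import dataclass
from datetime import date


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def f1_score(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall for the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mae(y_true, y_pred):
    """Mean absolute error for numeric predictions."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


@dataclass
class Milestone:
    metric: str
    target: float
    due: date                     # hypothetical deadline
    higher_is_better: bool = True

    def is_met(self, value):
        return value >= self.target if self.higher_is_better else value <= self.target


# Toy data: classification labels for accuracy/F1, numeric values for MAE.
y_true = [1, 0, 1, 1, 0, 1, 0, 0]
y_pred = [1, 0, 0, 1, 0, 1, 1, 0]
current = {
    "accuracy": accuracy(y_true, y_pred),
    "f1": f1_score(y_true, y_pred),
    "mae": mae([3.0, 5.0, 2.0], [2.5, 5.0, 4.0]),
}

milestones = [
    Milestone("accuracy", 0.70, date(2021, 10, 1)),
    Milestone("f1", 0.80, date(2021, 11, 1)),
    Milestone("mae", 1.0, date(2021, 11, 1), higher_is_better=False),
]

status = {m.metric: m.is_met(current[m.metric]) for m in milestones}
for m in milestones:
    print(f"{m.metric}: {current[m.metric]:.2f} (target {m.target}, due {m.due}) met={status[m.metric]}")
```

Software and business metrics such as MTTR or MRR can be tracked with the same `Milestone` structure, since each is just a number with a target and a date.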

Budget for resources: What data do you have, and what do you need for the project? The data has to match the solutions you are pursuing for the problem. We also need to determine the resources required for obtaining and handling the data to achieve the desired analytics goal. What analysis needs to be done, and what tools does it require?

Once resources are known and metrics are well defined, we can propose timelines for achieving those metrics. New projects with complex data structures and machine learning models may take a long time because of model training, data extraction, and preprocessing. The computational requirements of the project should also be taken into account.

Conclusion

To prevent misunderstandings and build the right solutions for the target problems, the stakeholders and the ML team need to reduce ambiguity. This can be achieved by scoping use case requests. The scoping phase guides the definition of project goals and a vision aligned with the team’s business objective. It’s essential to have clear communication and transparency between the business stakeholders and the technical team, along with detailed definitions of what successful business and technical outcomes look like. Agreement on key performance indicators (KPIs) and on the quality criteria the solution is expected to meet is key to project success and performance review. We don’t want to work on a solution that does not solve the business problem, or on a viable solution when another viable solution would produce 10 times the needed result.

Frankline Ononiwu, writer for CodeX

I am a Biomedical Scientist turned Data Scientist. I have a strong interest in MLOps and NLP. I love explaining technical terms in non-technical ways.