How to Frame an ML problem? (1/2)

vishnu

This article is my attempt to capture Section 1, "Frame ML problems", of the Professional Machine Learning Engineer (PMLE) certification exam guide. Even though the topics are aligned to the PMLE, readers interested in defining business challenges as ML use cases, establishing success criteria, and mitigating the risks of ML solutions will also find it useful.


Topics covered

  1. Translating business challenges into ML use cases.
  2. Defining ML problems.
  3. Defining business success criteria.
  4. Identifying risks to the feasibility of ML solutions. [Next article]

Important

  • 1.1.a Choosing the best solution based on the business requirements
  • 1.4.d Aligning with Responsible AI practices

Note:

  • The topics are based on the official Google exam guide.
  • Most of the content in this article is collected from the resources mentioned in the credits.
  • Do let me know your feedback in the comments so that I can keep this article relevant and accurate.

1.1 Translating business challenges into ML use cases

a. Choosing the best solution based on the business requirements


i. ML vs Non-ML
Some people treat machine learning as a silver bullet that can solve any problem, but in reality ML is a specialized tool suited to particular problems. It is therefore important to test whether an ML solution is needed at all, which you can do by establishing a baseline for comparison. In most cases, build an optimized non-ML baseline solution (if that is not possible, try solving the problem manually using a heuristic).

Once the baseline is in place, it is time to ponder the question: "Is machine learning required to effectively solve this problem?" A few parameters you can use to determine the need:

  • Improvement: Will the new machine learning solution deliver a significant improvement over the current solution?
  • Value add: Analyze the total cost (development, maintenance, talent, etc) for the value of the ML solution, can that be justified?
  • Data availability: Is enough data available to train your model? Can you afford to create / label / impute missing data?

If the answer to most of the above questions is no, then it is better not to go with an ML approach right away.
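
To make the baseline comparison concrete, here is a minimal sketch using scikit-learn and a synthetic dataset (the dataset, the single-feature heuristic, and all numbers are purely illustrative): train a hand-tuned rule and a simple model on the same split and compare them.

```python
# Minimal sketch: compare a non-ML heuristic baseline against a simple ML model.
# The synthetic dataset and the single-feature rule are illustrative only.
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=2000, n_features=5, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)

# Non-ML baseline: a hand-tuned rule on a single feature.
heuristic_preds = (X_test[:, 0] > 0).astype(int)
baseline_acc = accuracy_score(y_test, heuristic_preds)

# Candidate ML solution.
model = LogisticRegression().fit(X_train, y_train)
ml_acc = accuracy_score(y_test, model.predict(X_test))

print(f"Heuristic baseline accuracy: {baseline_acc:.3f}")
print(f"ML model accuracy:           {ml_acc:.3f}")
# Pursue ML only if the lift over the baseline justifies the added cost.
```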


ii. Custom vs Out-of-the-box solutions
Proceed with this step only if the answer to the previous question was yes. ML solutions can be broadly divided into two categories: custom and pre-packaged.

  • A custom solution means building a model from scratch using frameworks like scikit-learn, PyTorch, TensorFlow, etc.
  • Pre-packaged/out-of-the-box solutions include options like AutoML, machine learning APIs (Vision, Speech, Translation, Video, NLP, etc.), and BigQuery ML.

Whether you need a custom or a pre-packaged solution depends on your project requirements and available resources. For example:

  • If you need full customization of model selection, training, hyperparameter tuning, deployment, and optimization, and you have the necessary expertise, then a custom solution is ideal.
  • A pre-packaged solution, on the other hand, helps reduce time to market (less development and maintenance time), and you do not need much ML expertise to use these services. However, they may have limited customization options. Cost is another factor to consider: these services can be more expensive than custom solutions.
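As a sketch of how little code the pre-packaged route can require, here is a hypothetical call to the Cloud Vision API using the google-cloud-vision client library. It assumes GCP credentials are already configured, and photo.jpg is a placeholder path.

```python
# Sketch of the pre-packaged route: one API call to the Cloud Vision service
# replaces data collection, training, tuning, and deployment.
# Requires the google-cloud-vision package and configured GCP credentials.
from google.cloud import vision

client = vision.ImageAnnotatorClient()
with open("photo.jpg", "rb") as f:  # placeholder image path
    image = vision.Image(content=f.read())

response = client.label_detection(image=image)
for label in response.label_annotations:
    print(label.description, round(label.score, 3))
```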

b. Defining how the model output should be used to solve the business problem.

There’s no value in predicting something if you can’t turn the prediction into an action that helps users. That is, your product should act on the model’s output. Begin by stating your objective in non-ML terms; the goal is the answer to the question, “What am I trying to accomplish?”

Once you have an answer to the business objective, align it with the model output. Google’s problem-framing course (developers.google.com) includes a table stating such goals for hypothetical apps; for a weather app, for example, the stated goal is to calculate precipitation in six-hour increments for a geographic region.
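
As a toy sketch of turning a model output into a product action, consider that weather app; the function name and the 0.7 threshold below are hypothetical, not from any real app.

```python
# Hypothetical sketch: a weather app mapping a rain-probability prediction
# to a user-facing action. The threshold is an illustrative assumption.
def act_on_prediction(rain_probability: float) -> str:
    if rain_probability >= 0.7:
        return "Send push notification: 'Carry an umbrella today.'"
    return "No action"

print(act_on_prediction(0.85))  # -> push notification
print(act_on_prediction(0.20))  # -> no action
```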

c. Deciding how incorrect results should be handled

What will be the impact of misclassification in your project?
To gauge the impact of misclassification, identify how your predictions are being used. The impact depends on the domain, the use case, and the end users. Other factors include the number and type of dependents (applications and stakeholders). Based on those, adjust the training objective, metrics, and acceptable thresholds.

How can you mitigate the consequences of incorrect prediction?
Unlike in traditional programming, avoiding poor predictions requires taking care of a broader range of parameters. A few reasons for suboptimal performance could be: lack of feature predictive power, suboptimal hyperparameters, errors or anomalies in the data, or bugs in the feature engineering code.

When such mishaps occur, the objective should be to reduce the impact: either by fixing the issue, by temporarily rolling back to a previous version where everything worked fine, or in some cases by stopping the service. Such events can be avoided or reduced by having proper tests for input data, feature engineering, the quality of new model versions, serving infrastructure, and the integration between pipeline components.
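
Here is a minimal sketch of what such tests might look like; the schema, value ranges, and feature transform are hypothetical stand-ins for whatever your pipeline actually uses.

```python
# Sketch of pipeline tests as plain assertions (could equally be pytest tests).
# The columns, ranges, and transform below are hypothetical.
import pandas as pd

def test_input_data(df):
    # Input-data tests: expected schema and value ranges.
    assert {"age", "income"}.issubset(df.columns), "missing expected columns"
    assert df["age"].between(0, 120).all(), "age out of range"
    assert not df["income"].isna().any(), "income has missing values"

def scale_income(df):
    # Feature engineering under test: standardize the income column.
    return (df["income"] - df["income"].mean()) / df["income"].std()

def test_feature_engineering():
    # The transformed feature should have mean 0 and std 1.
    df = pd.DataFrame({"age": [25, 40, 60], "income": [30.0, 50.0, 70.0]})
    scaled = scale_income(df)
    assert abs(scaled.mean()) < 1e-9 and abs(scaled.std() - 1.0) < 1e-9

test_input_data(pd.DataFrame({"age": [25, 40], "income": [30.0, 50.0]}))
test_feature_engineering()
print("all pipeline checks passed")
```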

Rule #26: Look for patterns in the measured errors, and create new features.
Once you have examples that the model got wrong, look for trends that are outside your current feature set. For instance, if the system seems to be demoting longer posts, then add post length. Don’t be too specific about the features you add. If you are going to add post length, don’t try to guess what long means; just add a dozen features and let the model figure out what to do with them (see Rule #21). That is the easiest way to get what you want.
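
A small pandas sketch of this rule, with hypothetical column names: isolate the misclassified examples, check whether post length differs between errors and correct predictions, and if it does, add length-related features for the model to use.

```python
# Sketch of Rule #26: look for a trend (post length) in the model's errors.
# The DataFrame and its columns are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "post_text": ["short", "a much longer post " * 10, "tiny", "another long post " * 12],
    "label":     [1, 1, 0, 1],
    "predicted": [1, 0, 0, 0],
})

errors = df[df["label"] != df["predicted"]]
correct = df[df["label"] == df["predicted"]]
print("mean length (errors): ", errors["post_text"].str.len().mean())
print("mean length (correct):", correct["post_text"].str.len().mean())

# If long posts dominate the errors, add a dozen length-related features and
# let the model decide what "long" means.
df["post_length"] = df["post_text"].str.len()
df["post_word_count"] = df["post_text"].str.split().str.len()
```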

“…most of the times when I tried to manually debug interesting-looking errors they could be traced back to issues with the training data.”

— Software Engineer, Google Translate

How do you explain the wrong results to stakeholders?
Rule #27: Try to quantify observed undesirable behaviour. If your issues are measurable, you can start using them as features, objectives, or metrics. The general rule is “measure first, optimize second”. Based on the investigation, present the reason(s) for the error (data, model, transformation, infra, etc.) according to the technical literacy of the audience, along with what can be done to fix the issue. Additionally, create and share an action plan to ensure that the same and similar incidents do not happen in the future.
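
As a sketch of “measure first, optimize second”, here is one way to turn an observed undesirable behaviour (say, recommending clickbait) into a number that can be tracked as a metric; the marker list and function are deliberately naive stand-ins.

```python
# Naive sketch: quantify an undesirable behaviour so it can be tracked
# as a metric (and later used as a feature or objective).
CLICKBAIT_MARKERS = ("you won't believe", "shocking", "!!!")

def clickbait_rate(recommended_titles):
    flagged = sum(
        any(marker in title.lower() for marker in CLICKBAIT_MARKERS)
        for title in recommended_titles
    )
    return flagged / len(recommended_titles)

titles = [
    "Shocking results inside!!!",
    "Quarterly weather report",
    "You won't believe this trick",
]
print(f"clickbait rate: {clickbait_rate(titles):.2f}")  # a number to monitor over time
```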

d. Identifying data sources (available vs. ideal).

There is no machine learning without data. The quality of your training data strongly affects the effectiveness of the model you create, and by extension, the quality of the predictions returned from that model.

If you do not have data for your model, consider the following options:

  1. Create a simple solution without ML and start collecting data.
  2. If you have data but it is not labelled, you can either label it yourself or outsource the labelling (e.g., the Vertex AI data labelling service).
  3. Purchase labelled data from third-party data aggregators (stay involved in the process to avoid labelling biases).

No matter what method you choose, the data must have the following characteristics (a quick spot-check sketch follows the list).

  1. Abundant. The more relevant and useful examples in your dataset, the better your model will be.
  2. Available. Make sure all inputs are available at prediction time in the correct format. If it will be difficult to obtain certain feature values at prediction time, omit those features from your datasets.
  3. Correct. In large datasets, some labels will inevitably have incorrect values, but if more than a small percentage of labels are incorrect, the model will produce poor predictions.
  4. Consistent. Having data that’s consistently and reliably collected will produce a better model. For example, an ML-based weather model will benefit from data gathered over many years from the same reliable instruments.
  5. Representative. The datasets should be as representative of the real world as possible. In other words, the datasets should accurately reflect the events, user behaviours, and/or the phenomena of the real world being modelled. Training on unrepresentative datasets can cause poor performance when the model is asked to make real-world predictions.
  6. Trusted. Understand where your data will come from. Will the data be from trusted sources you control, like logs from your product, or will it be from sources you don’t have much insight into, like the output from another ML system?
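
As an illustration, a few of these characteristics (abundance, availability, correctness) can be spot-checked with a handful of pandas one-liners; the DataFrame and label set below are hypothetical.

```python
# Sketch: quick checks against some of the characteristics above.
# The data, columns, and valid label set are hypothetical.
import pandas as pd

df = pd.DataFrame({
    "temperature": [21.3, 19.8, None, 23.1],
    "humidity":    [0.61, 0.58, 0.70, 0.66],
    "label":       ["rain", "no_rain", "rain", "sunny?"],  # "sunny?" is invalid
})

print("examples available:", len(df))                  # Abundant: is this enough?
print("missing values per column:")                    # Available at prediction time?
print(df.isna().sum())

valid_labels = {"rain", "no_rain"}
bad_fraction = (~df["label"].isin(valid_labels)).mean()
print(f"invalid label fraction: {bad_fraction:.2%}")   # Correct: keep this small
```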

1.2 Defining ML problems

a. Problem type

The most common problem types are classification, regression, and clustering. Apart from these main categories, also make sure you understand the gist of the following tasks, approaches, and ideas.

  • Churn Prediction is a strategy that uses customer data to identify clients who are least likely to renew their subscription/contract.
    Applications: Revenue forecasting, Building customer retention strategies
    Algorithms: Binary classification — logistic regression, decision trees, random forest, and others.
  • Object Detection: Localizing objects present in an image and additionally finding the label (class) of each object.
    Applications: Vehicle number plate detection, …
    Algorithms/Models: YOLO, SSD, Faster R-CNN, …
  • Object Tracking is the task of automatically identifying objects in a video (a sequence of frames) and interpreting them as a set of trajectories.
    Applications: Autonomous driving, sports player tracker …
    Algorithms/Models:
    Simple Online And Realtime Tracking (SORT), DeepSORT, MDNet …
  • Semantic Segmentation: It can be considered an extension of object detection; here, the task is to assign a class label to every pixel, producing segment masks for objects of the same class. In short, it is pixel-level prediction.
    Applications: Tumor segmentation, self-driving cars, …
    Algorithms/Models:
    U-Net, Mask R-CNN, …
  • Sentiment analysis: In this task, the polarity of a given text or sentence is identified and categorized as “positive”, “negative”, or “neutral”.
    Applications: Brand sentiment monitoring, stock price prediction based on market sentiment, …
    Algorithms:
    Classical machine learning classifiers, deep learning models (RoBERTa, T5)
  • Intent detection: It is vital in any task-oriented conversational system. Here, the system tries to understand the user’s current goal and buckets it into predefined classes, that is, intents. E.g., the user query “Can you get me a table?” should be classified as “Reservation”.
    Applications: Conversational bots
  • Named Entity Recognition: It is the process of identifying entities like “Person”, “Location”, “Organization”, “Dates”, etc. in a sentence. Alternatively, it can be defined as the process of classifying tokens into a set of predefined categories.
    Applications: Resume summarizers, customer support bots, …
  • Speech Recognition: It is the task of identifying speech in audio and converting it to text, aka Speech-to-Text.
    Applications: Voice-based search, personal AI assistants, …
    Algorithms/Models:
    DeepSpeech2, wav2vec 2.0, transformer-based attention models, …
  • Time Series Forecasting: Here, the objective is to forecast the outcome over the next X amount of time, given time-dependent (historical) data as input.
    Applications: Stock price prediction, Outage forecasting, …
    Algorithms/Model: ARIMA, LSTM, LSTM-based models, …
(Figure from the Google AI blog on federated learning: your phone personalizes the model locally based on your usage (A); many users’ updates are aggregated (B) to form a consensus change (C) to the shared model, after which the procedure is repeated.)
  • Federated Learning: Federated learning is not a machine learning task; rather, it is a technique for training models that preserves user privacy by keeping data on the local device. The base model is downloaded to the user’s device and training happens on-device; the result is summarized as a small update, and only this update is sent to the cloud, where it is averaged with other users’ updates to improve the shared model. This process then repeats.
  • Transfer Learning: Transfer learning is a research problem in machine learning that focuses on storing knowledge gained while solving one problem and applying it to a different but related problem (see the sketch after this list).

Knowing what these mean, you are already familiar with many of the AI use cases you’ll encounter.
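
Since transfer learning comes up constantly in practice, here is a minimal sketch using torchvision (0.13+): reuse a ResNet-18 pretrained on ImageNet and train only a new classification head. The 5-class output size and learning rate are placeholder assumptions.

```python
# Transfer-learning sketch: freeze a pretrained backbone, retrain only a new head.
# The target class count (5) and learning rate are illustrative assumptions.
import torch.nn as nn
import torch.optim as optim
from torchvision import models

model = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)

# Freeze the pretrained backbone: its learned features transfer to the new task.
for param in model.parameters():
    param.requires_grad = False

# Replace the final layer; only this part will be trained.
model.fc = nn.Linear(model.fc.in_features, 5)

optimizer = optim.Adam(model.fc.parameters(), lr=1e-3)
# ...a standard training loop over the new (smaller) dataset goes here.
```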

b. Outcome of model predictions

Define, at the model level, what the expected outcome should be. Should it be the predicted intent of a user’s message, or perhaps the next song in their playlist? Should it predict the entire shopping basket or just the next item?

c. Input (features) and predicted output format

Defining what features go into the model is not a trivial task, as some features may need to be combined or transformed, e.g. through word embeddings or sparse vectors.

The output of a model might also need a little transformation, e.g. by converting a one-hot encoding back into a useful label.
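
For instance, here is a minimal sketch of output post-processing: mapping a softmax/one-hot output vector back to a human-readable label (the label set is hypothetical).

```python
# Sketch: convert a model's probability/one-hot output back into a label.
# The label set and scores are hypothetical.
import numpy as np

LABELS = ["cat", "dog", "bird"]
model_output = np.array([0.1, 0.7, 0.2])  # e.g., softmax scores

predicted_label = LABELS[int(np.argmax(model_output))]
print(predicted_label)  # -> "dog"
```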

1.3 Defining business success criteria.

a. Alignment of ML success metrics to the business problem

No matter how good the model evaluation metrics are, if the model is not improving the business metrics, the effort is wasted.

E.g., suppose the business goal is to detect a deadly disease almost every time. In such scenarios there should be no false negatives, and given that goal we can afford the tradeoff of occasional false positives. Therefore we must evaluate the model on a metric that rewards high recall, not on metrics like accuracy.
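
A short sketch with sklearn.metrics shows why: on a contrived imbalanced dataset, a model can post high accuracy while missing most of the sick patients.

```python
# Contrived example: high accuracy can coexist with dangerously low recall.
from sklearn.metrics import accuracy_score, recall_score

y_true = [0] * 95 + [1] * 5          # 5 sick patients out of 100
y_pred = [0] * 95 + [1, 0, 0, 0, 0]  # model catches only 1 of the 5

print(f"accuracy: {accuracy_score(y_true, y_pred):.2f}")  # 0.96 -- looks great
print(f"recall:   {recall_score(y_true, y_pred):.2f}")    # 0.20 -- unacceptable here
```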

b. Key results

Key results are the benchmarks on the way to the objective. In other words, they are the “how”: the steps you need to take to meet your goal. For example:
Objective: Increase ad revenue of a weather app.
#1: Improve rain-predictor accuracy by X% to increase user session time.
#2: Recommend better ads to the user, increasing the ads’ click-through rate by Y%.

c. Determining when a model is deemed unsuccessful

Define the metrics you’ll use to determine whether or not the ML implementation is successful. Success metrics define what you care about, like engagement or helping users take appropriate action, such as watching videos that they’ll find useful. Success metrics differ from the model’s evaluation metrics, like accuracy, precision, recall, or AUC.


When analyzing failure metrics, try to determine why the system failed. For example, the model might be predicting which videos users will click, but the model might start recommending clickbait titles that cause user engagement to drop off. In the weather app example, the model might accurately predict when it will rain but for too large of a geographic region.

Make sure to work through the exercises in Google’s Introduction to Machine Learning Problem Framing course to solidify your understanding.

The next part — “1.4 Identifying risks to the feasibility of ML solutions” will be posted soon, consider following the account to get notified.

Credits

  1. Jeffrey Luppes’s study guide
  2. Google’s Intro to Machine learning Problem framing
  3. Google’s Testing and Debugging in Machine learning
  4. Rules of ML
  5. Awesome GCP certification
