Classification and Regression Problems in Machine Learning

Ravish Kumar · Published in EnjoyAlgorithms · Jun 6, 2021 · 8 min read

Earlier, we classified the whole of machine learning on five different bases. While discussing the classification of ML based on the nature of the problem statement, we divided ML problems into three categories:

  1. Classification Problem
  2. Regression Problem
  3. Clustering Problem

In this article, we will take a deep dive into classification and regression problems.

The key takeaways from this article are:

  1. An in-depth explanation of classification and regression problems.
  2. Implementations of solutions to these two problems and an understanding of what their outputs look like.
  3. A detailed and intuitive understanding of cross-entropy in classification problems.
  4. A problem that can be solved both ways: either as a regression problem or as a classification problem.

While classifying machine learning based on the nature of the input data, we defined supervised learning as follows:

Supervised learning is where we have an input variable (X) and an output variable (Y), and we use a machine learning algorithm to learn the mapping function from the input to the output.

Based on the nature of the output data, we further categorize supervised learning into two classes:

  • Classification Problems
  • Regression Problems

Both problems deal with learning a mapping function from input data to output data.

Let's dive deeper into these two problems, one after the other.

Regression Problems

Formal Definition:

Regression is a type of problem that requires the use of machine learning algorithms to learn a continuous mapping function from the input to the output.

For example, suppose we want our machine learning algorithm to predict today's temperature. If we solve this as a regression problem, the output will be continuous: our ML model will give exact temperature values, e.g., 24°C, 24.5°C, etc.

To measure the learned mapping function's performance, we measure how close the predictions are to the accurately labeled validation/test data. Imagine a plot where the blue curve shows the regression model's predicted values and the red curve shows the actual labeled function. The blue curve's closeness to the red curve gives us a measure of how good our model is.

While building a regression model, we define a cost function, which measures the deviation of the predicted values from the actual values. Optimizers make sure that this error reduces over progressive iterations, also called epochs.

Some of the most common error functions (or cost functions) used for regression problems are:

  • Mean Squared Error (MSE): MSE = (1/N) * Σ (Yi − Yi')²
  • Root Mean Squared Deviation/Error (RMSD/RMSE): RMSE = sqrt( (1/N) * Σ (Yi − Yi')² )
  • Mean Absolute Error (MAE): MAE = (1/N) * Σ |Yi − Yi'|

Note: Yi is the actual value, Yi' is the predicted value, and N is the total number of samples over which the prediction is made.
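As a quick sanity check, here is a minimal NumPy sketch of these three error functions (the toy temperature arrays are invented for illustration):

```python
import numpy as np

y_true = np.array([24.0, 22.5, 19.0, 25.5])  # actual values (Yi)
y_pred = np.array([23.1, 22.0, 20.2, 26.0])  # predicted values (Yi')

mse = np.mean((y_true - y_pred) ** 2)        # Mean Squared Error
rmse = np.sqrt(mse)                          # Root Mean Squared Error
mae = np.mean(np.abs(y_true - y_pred))       # Mean Absolute Error

print(f"MSE: {mse:.3f}, RMSE: {rmse:.3f}, MAE: {mae:.3f}")
```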

Some famous examples of regression problems are:

  • Predicting the house price based on the size of the house, availability of schools in the area, and other essential factors.
  • Predicting the sales revenue of a company based on data such as the previous sales of the company.
  • Predicting the temperature on any day based on data such as wind speed, humidity, and atmospheric pressure.

Popular Algorithms Used for Regression Problems:

  • Linear Regression
  • Support Vector Regression
  • Regression Tree
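To give a taste of what training such a model looks like in practice, here is a minimal scikit-learn sketch of linear regression on an invented house-price dataset (the numbers are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data: house size (sq. ft.) -> price; invented for illustration
X = np.array([[600], [850], [1100], [1500], [2000]])
y = np.array([150000, 200000, 260000, 340000, 450000])

model = LinearRegression()
model.fit(X, y)  # learn the continuous mapping function

# Predict the price of a 1300 sq. ft. house (a continuous output)
print(model.predict([[1300]]))
```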

Classification Problems

In classification problems, the mapping function that algorithms want to learn is discrete. The objective is to find the decision boundary (or boundaries) dividing the dataset into different categories.

More formally:

Classification is a type of problem that requires the use of machine learning algorithms that learn how to assign a class label to the input data.

For example, suppose there are three class labels: [Apple, Banana, Cherry]. But machines cannot understand these labels as they are, so we need to convert them into a machine-readable format.
For the above example, we can define:

Apple = [1,0,0], Banana = [0,1,0], Cherry = [0,0,1]

Once the machine learns from this labeled training dataset, it will output probabilities of the different classes on the test dataset, like this:

[P(Apple), P(Banana), P(Cherry)]
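A minimal sketch of this encoding and the kind of output a trained classifier produces (the helper name one_hot is our own):

```python
import numpy as np

classes = ["Apple", "Banana", "Cherry"]

def one_hot(label):
    """Convert a class label to its machine-readable one-hot vector."""
    vec = np.zeros(len(classes))
    vec[classes.index(label)] = 1.0
    return vec

print(one_hot("Banana"))  # [0. 1. 0.]

# A trained model outputs class probabilities on a test sample, e.g.:
pred = np.array([0.7, 0.2, 0.1])  # [P(Apple), P(Banana), P(Cherry)]
print(classes[int(np.argmax(pred))])  # most probable class: Apple
```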

These predicted probabilities come from one probability distribution function (PDF), and the actual (true) labels come from another. If the predicted distribution follows the actual distribution, the model is learning accurately.
Note: These PDFs are continuous. As a similarity between classification and regression, if the predicted PDF follows the actual PDF, we can say the model has learned the trends.

Some of the standard cost functions for classification problems are:

Categorical Cross-Entropy:

Suppose there are M class labels, and the predicted distribution for the i-th data sample is:

P(Y) = [Yi1', Yi2', ..., YiM']

And the actual distribution for that sample is:

A(Y) = [Yi1, Yi2, ..., YiM]

Cross-Entropy (CEi) = -(Yi1*log(Yi1') + Yi2*log(Yi2') + ... + YiM*log(YiM'))

The overall categorical cross-entropy is CEi averaged over all N samples.

Binary Cross-Entropy:

This is a special case of categorical cross-entropy, where there is only one output that can take two values, either 0 or 1: for example, predicting whether a cat is present in an image or not.

Here, the cross-entropy function varies with the true value of Y:

CEi = -Yi1*log(Yi1'), if Yi1 = 1

CEi = -(1-Yi1)*log(1-Yi1'), if Yi1 = 0

Similarly, the binary cross-entropy is averaged over all samples in the dataset.
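Both formulas translate directly into code. Here is a minimal NumPy sketch of per-sample categorical and binary cross-entropy (the function names are our own; the clipping guards against log(0)):

```python
import numpy as np

def categorical_cross_entropy(y_true, y_pred, eps=1e-12):
    """CEi = -(Yi1*log(Yi1') + ... + YiM*log(YiM')) for one sample."""
    y_pred = np.clip(y_pred, eps, 1.0)  # guard against log(0)
    return -np.sum(y_true * np.log(y_pred))

def binary_cross_entropy(y_true, y_pred, eps=1e-12):
    """CEi = -(Y*log(Y') + (1-Y)*log(1-Y')) for one sample."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)
    return -(y_true * np.log(y_pred) + (1 - y_true) * np.log(1 - y_pred))

# True class Banana, reasonably confident correct prediction -> small loss
print(categorical_cross_entropy(np.array([0, 1, 0]), np.array([0.2, 0.7, 0.1])))

# Confident correct binary prediction -> small loss
print(binary_cross_entropy(1, 0.9))
```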

Now, the primary question that we should ask ourselves is,

If PDFs (probability distribution functions) are continuous in the range of [0,1], why can't MAE/MSE be chosen here?

Take a pause and think!

Reason: MAE and MSE work well when the predicted probability is close to the actual value, or when a wrong prediction is not made with high confidence. They penalize highly confident wrong predictions only mildly.

To understand the confidence of a prediction, let's take one example: suppose our ML model predicts that a female patient is pregnant with a probability of 0.9. We can say that our model is very confident. Now consider a scenario where the ML model says a male patient is pregnant, again with a probability of 0.9. This is a case where the model predicts something wrong and is confident about the prediction.
To address these cases, the model needs to be penalized more for such predictions. Right?

Let's calculate the cross-entropy (CE), MAE, and MSE for the case where the ML model predicts that a man is pregnant with high confidence (probability Y' = 0.8). Obviously, the actual output Y is 0 here.

CE = -(1-Y)*log(1-Y') = -(1-0)*log(1-0.8) ≈ 1.61 (using the natural logarithm)

MAE = |Y-Y'| = |0-0.8| = 0.8

MSE = (Y-Y')² = (0-0.8)² = 0.64

As you can see, MAE and MSE take much lower values than CE. Cross-entropy produces a larger error value for this confident wrong prediction and therefore penalizes the model more, which is exactly what we wanted.
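You can reproduce these numbers in a couple of lines (using the natural logarithm):

```python
import numpy as np

y, y_hat = 0.0, 0.8  # actual label 0, confident wrong prediction

ce = -np.log(1 - y_hat)  # binary cross-entropy when Y = 0
mae = abs(y - y_hat)     # absolute error for this single sample
mse = (y - y_hat) ** 2   # squared error for this single sample

print(round(ce, 2), round(mae, 2), round(mse, 2))  # 1.61 0.8 0.64
```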

That's why we need different cost functions for classification problems.

The most common evaluation metrics for classification models are:

  • Accuracy
  • Confusion Matrix
  • F1-Score
  • Precision
  • Recall, etc.

We will learn about these terms in greater detail in our later blogs.

Examples of classification problems could include:

  • Classifying whether an email is spam or not, based on its content and how others have classified similar emails.
  • Classifying a dog breed based on its physical features such as height, width, and skin color.
  • Classifying whether today's weather is hot or cold.

Algorithms for Classification:

  • Logistic Regression
  • Support Vector Classification
  • Decision Tree
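As with regression, scikit-learn makes it easy to try one of these out. Here is a minimal logistic-regression sketch on an invented hot/cold weather dataset (the features and labels are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import LogisticRegression

# Toy data: [temperature (deg C), humidity (%)] -> 1 = hot, 0 = cold
X = np.array([[30, 40], [35, 50], [10, 80], [5, 70], [28, 45], [8, 90]])
y = np.array([1, 1, 0, 0, 1, 0])

clf = LogisticRegression()
clf.fit(X, y)  # learn the decision boundary

print(clf.predict([[25, 55]]))        # predicted class label (0 or 1)
print(clf.predict_proba([[25, 55]]))  # [P(cold), P(hot)]
```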

Can we solve the same problem using regression as well as classification techniques?

Yes! We can. Let's take one example.

Problem statement: Predict the steering angle of an autonomous vehicle based on the image data.

Constraints: Steering angle can take any value between -50° and 50° with a precision of ±5°.


Regression Solution: This solution is straightforward: we map the images to a continuous function of the steering angle, which directly outputs values like steering angle = 20.7° or steering angle = 5.0°.

Classification Solution: We stated that the precision is ±5°, so we can divide the entire range of -50° to 50° into 20 different classes by grouping every 5° together.

Class 1 = -50° to -46°

Class 2 = -45° to -41°

...

Class 20 = 46° to 50°

Now we just have to classify the input image into one of these 20 classes. This way, the problem is converted into a classification problem.
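A minimal sketch of one possible binning, mapping a continuous angle to one of the 20 class indices (the function name angle_to_class is our own):

```python
def angle_to_class(angle):
    """Map a steering angle in [-50, 50] degrees to a class in 1..20,
    using 5-degree-wide bins (class 1 starts at -50, class 20 ends at 50)."""
    angle = max(-50.0, min(50.0, angle))        # clamp to the valid range
    return min(int((angle + 50) // 5) + 1, 20)  # bin index, 1-based

print(angle_to_class(-50))   # 1
print(angle_to_class(-43))   # 2
print(angle_to_class(20.7))  # 15
print(angle_to_class(50))    # 20
```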

Critical Questions to explore

  1. What is the difference between classification and regression problems?
  2. Why do we have different cost functions in the case of different problem statements?
  3. How does the choice of cost function reflect whether the problem is a classification or a regression problem?
  4. When do we use binary cross-entropy, and when do we use categorical cross-entropy?
  5. Can you find more such problem statements that can be solved both ways?

Conclusion

In this article, we discussed the concepts of classification and regression problems in detail. We also discussed the cost functions, like MAE, MSE, and the cross-entropies, whose differences are critical. Finally, we looked at one famous problem statement that can be solved both ways: as a classification problem and as a regression problem. We hope you have enjoyed the article and learned something new.

If you have any ideas/queries/doubts/feedback, please comment below or write us at contact@enjoyalgorithms.com. Enjoy learning, Enjoy Machine Learning, Enjoy algorithms!
