Classification & Regression in Machine Learning

Ijaz Khan
Published in unpack · Jan 17, 2021
Image Source: Javapoint.com

New data science students are often confused about the difference between classification and regression in machine learning, and are often unsure which technique to use for a given task.

Classification and regression are both supervised learning tasks: the models are trained on labeled datasets and then used to make predictions. Both fall under predictive modeling.
Predictive modeling means developing models that use historical data to make new predictions. Mathematically, it is the problem of approximating a mapping function (F) from the input (x) to the output (y), which is generally called function approximation.
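To make the idea of function approximation concrete, here is a minimal sketch that fits a straight line to a few historical (x, y) pairs using ordinary least squares and then predicts y for a new x. The data values and the function names (`fit_line`, `predict`) are made up for illustration; real problems usually need richer models than a single line.

```python
def fit_line(xs, ys):
    """Approximate F with a line y = slope * x + intercept (least squares)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    variance = sum((x - mean_x) ** 2 for x in xs)
    slope = covariance / variance
    intercept = mean_y - slope * mean_x
    return slope, intercept


def predict(model, x):
    """Apply the learned mapping to a new input."""
    slope, intercept = model
    return slope * x + intercept


# Hypothetical "historical" data: inputs and their known outputs.
history_x = [1.0, 2.0, 3.0, 4.0]
history_y = [2.0, 4.0, 6.0, 8.0]

model = fit_line(history_x, history_y)
print(predict(model, 5.0))  # prediction for an input the model has not seen
```

The learned pair (slope, intercept) plays the role of the approximated mapping function F.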

There are two main function approximation tasks: classification and regression.

Classification

Classification is a type of predictive modeling that approximates the mapping function from the input (x) to a discrete output variable (y). The process involves finding a function that divides the dataset into different classes based on parameters learned during training. In other words, the model categorizes new data according to what it learned from the training dataset.

Example: Spam email detection is one of the best-known examples of classification. A model trained on millions of emails categorizes each new email as “spam” or “not spam”, and emails identified as spam are sent to the spam folder.
Many classifiers predict a probability for each class rather than a label directly. Let's say an email is assigned a probability of 0.1 for “spam” and 0.9 for “not spam”. These probabilities can be converted to a class label by selecting “not spam”, the label with the highest predicted likelihood.
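The conversion from probabilities to a class label can be sketched in a few lines of Python. The dictionary of probabilities stands in for the output of a hypothetical spam model, and `to_label` is an illustrative helper name, not part of any library.

```python
def to_label(probabilities):
    """Return the class label with the highest predicted probability."""
    return max(probabilities, key=probabilities.get)


# Hypothetical model output for one email, using the 0.1 / 0.9 example.
email_probs = {"spam": 0.1, "not spam": 0.9}

predicted = to_label(email_probs)
print(predicted)  # not spam
```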

One of the most common metrics for classification is accuracy: the percentage of predictions that are correct. For example, if a model gets 7 out of 10 predictions right:

classification_accuracy = correct_predictions / total_predictions * 100
accuracy = 7 / 10 * 100
accuracy = 70%
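The same calculation can be written as a small Python function. The two label lists below are made up so that exactly 7 of the 10 predictions match; `classification_accuracy` is an illustrative name.

```python
def classification_accuracy(y_true, y_pred):
    """Percentage of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return 100 * correct / len(y_true)


# Made-up labels: 1 = spam, 0 = not spam. Seven predictions are correct.
y_true = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 1, 1, 0, 0, 0]

print(classification_accuracy(y_true, y_pred))  # 70.0
```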

Types of Classification Algorithms:

  • Logistic Regression
  • K-Nearest Neighbours
  • Support Vector Machines
  • Kernel SVM
  • Naïve Bayes
  • Decision Tree Classification
  • Random Forest Classification

Regression

Regression is a type of predictive modeling that approximates the mapping function from the input (x) to a continuous output variable (y). A continuous output is a real-valued variable, such as a floating-point or integer value, and usually represents a quantity such as a size or an amount.

Example: Weather forecasting can be treated as a regression problem, for instance predicting tomorrow's temperature with a model trained on past weather data.
Another example is predicting the sale price of a house, which might fall somewhere in the range of $200,000 to $300,000.

Because a regression algorithm predicts a quantity, its metric should measure the error in those predictions. A common choice is the root mean squared error, abbreviated as RMSE.

For example, if a regression model made two predictions, 2.5 where the expected value is 2.0 and 4.3 where the expected value is 4.0, then the RMSE would be:

RMSE = sqrt(average(error²))
RMSE = sqrt(((2.0 - 2.5)² + (4.0 - 4.3)²) / 2)
RMSE = sqrt((0.25 + 0.09) / 2)
RMSE = sqrt(0.17)
RMSE ≈ 0.412
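The worked example can be checked with a short Python function that uses only the standard library. The name `rmse` is an illustrative choice; the input values are the two predictions from the example.

```python
import math


def rmse(expected, predicted):
    """Root mean squared error between expected and predicted values."""
    squared_errors = [(e - p) ** 2 for e, p in zip(expected, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


expected = [2.0, 4.0]
predicted = [2.5, 4.3]

print(round(rmse(expected, predicted), 3))  # 0.412
```

Because the errors are squared before averaging, larger errors contribute disproportionately to the result.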

Types of Regression Algorithms:

  • Decision Tree Regression
  • Random Forest Regression
  • Simple Linear Regression
  • Multiple Linear Regression
  • Polynomial Regression
  • Support Vector Regression

Difference between Classification and Regression

  1. In regression, the output variable must be continuous (a real value). In classification, the output variable must be discrete (a class label).
  2. A regression algorithm maps the input (x) to a continuous output variable (y), while a classification algorithm maps the input (x) to a discrete output variable (y).
  3. Regression models are used when the target to predict is continuous data. Classification models are used when the target is discrete data.
  4. A regression algorithm tries to find the best-fit line (or curve) so that it can predict the output more accurately. A classification algorithm tries to find the decision boundary that divides the dataset into different classes.
  5. Regression models solve problems such as house price prediction and weather prediction. Classification models solve problems such as speech recognition, identification of cancer cells, and spam email detection.
  6. Regression algorithms are commonly divided into linear and non-linear regression. Classification algorithms are commonly divided into binary and multi-class classifiers.
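The contrast in the list above can be illustrated with a rough sketch that applies the same nearest-neighbour idea to both tasks: for classification the neighbours vote on a discrete label, and for regression their values are averaged into a continuous number. K-Nearest Neighbours appears earlier only as a classification algorithm; using it for regression here is an assumption made for the sake of comparison. The house sizes, labels, and prices are made up.

```python
from collections import Counter


def nearest_targets(train, x, k):
    """Targets of the k training points whose input is closest to x."""
    ranked = sorted(train, key=lambda pair: abs(pair[0] - x))
    return [target for _, target in ranked[:k]]


def knn_classify(train, x, k=3):
    """Classification: discrete output chosen by majority vote."""
    votes = Counter(nearest_targets(train, x, k))
    return votes.most_common(1)[0][0]


def knn_regress(train, x, k=3):
    """Regression: continuous output computed as the neighbours' mean."""
    targets = nearest_targets(train, x, k)
    return sum(targets) / len(targets)


# Made-up data: house size in square metres -> class label or sale price.
labelled = [(50, "small"), (60, "small"), (70, "small"),
            (120, "large"), (130, "large"), (140, "large")]
priced = [(50, 200_000.0), (60, 210_000.0), (70, 220_000.0),
          (120, 280_000.0), (130, 290_000.0), (140, 300_000.0)]

print(knn_classify(labelled, 65))  # small
print(knn_regress(priced, 65))     # 210000.0
```

The inputs are identical in both cases; only the type of the target, and how the neighbours' targets are combined, changes.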

Thanks for reading ;) Do clap and follow if the article was helpful for you.
