Understanding Cost Functions in Machine Learning: Types and Applications

What is a cost function?

Anishnama · Mar 29, 2023

A cost function, also known as an objective function, is a measure of the difference between predicted values and actual values. It is used to evaluate the accuracy of a model’s predictions and to guide the process of adjusting the model’s parameters in order to minimize the difference between predicted and actual values.

The cost function measures the model’s error over a group of data points.


What is a loss function?

The loss function quantifies how much a model prediction y_hat deviates from the ground truth y for one particular example. So, when we calculate the loss, we do it for a single example in the training or test set.

What is the difference between a cost function and a loss function?

A loss function is defined on a single training example: it measures how well the model performs on that one example. If we instead consider the entire training set and measure how well the model performs on it as a whole, we get the cost function, which is typically the average (or sum) of the per-example losses.
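
As a quick illustration (a minimal sketch, assuming squared error as the loss and made-up numbers for the predictions), the loss is computed per example, while the cost averages the loss over the whole set:

import numpy as np

# hypothetical predictions and ground-truth values for four examples
y_pred = np.array([1.2, 2.1, 3.5, 4.8])
y_true = np.array([1, 2, 3, 5])

# loss: squared error of each individual example
per_example_loss = (y_pred - y_true) ** 2

# cost: the average of the per-example losses over the whole set
cost = per_example_loss.mean()
print('Per-example loss:', per_example_loss)
print('Cost:', cost)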

What are the different types of cost functions?

There are several different types of cost functions used in machine learning, and the choice of cost function depends on the specific task and the type of model being used. Some of the most common types of cost functions are:

  1. Mean Squared Error (MSE): This is a commonly used cost function for regression problems. It calculates the average squared difference between the predicted and actual values.

Python Implementation:

import numpy as np

y_pred = np.array([1.2, 2.1, 3.5, 4.8])
y_actual = np.array([1, 2, 3, 5])
n = len(y_pred)

# MSE: average of the squared differences between predictions and actual values
mse = 1/n * np.sum((y_pred - y_actual)**2)
print('MSE:', mse)

Output: MSE: 0.08500000000000002

2. Binary Cross-Entropy/Log Loss: This is a commonly used cost function for binary classification problems. It measures how far the predicted probability is from the actual binary label.

Python Implementation:

import numpy as np

y_pred = np.array([0.2, 0.8, 0.6, 0.3])
y_actual = np.array([0, 1, 1, 0])
n = len(y_pred)

# BCE: average negative log-likelihood of the true binary labels
bce = -1/n * np.sum(y_actual * np.log(y_pred) + (1 - y_actual) * np.log(1 - y_pred))
print('Binary Cross-Entropy:', bce)

Output: Binary Cross-Entropy: 0.32844691758328565

3. Categorical Cross-Entropy: This is a commonly used cost function for multi-class classification problems. It measures the difference between the predicted and actual probabilities of the multiple output classes.

CCE = -1/n * sum(sum(y_actual * log(y_pred)))

Python Implementation:

import numpy as np

y_pred = np.array([[0.1, 0.3, 0.6], [0.2, 0.7, 0.1], [0.9, 0.05, 0.05]])
y_actual = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 0]])
n = len(y_pred)

cce = -1/n * np.sum(np.sum(y_actual * np.log(y_pred)))
print('Categorical Cross-Entropy:', cce)

Output: Categorical Cross-Entropy: 0.3242870277875165

4. Hinge Loss: This is a cost function commonly used for training models for binary classification tasks using Support Vector Machines (SVMs). It penalizes misclassified samples and aims to maximize the margin between the decision boundary and the training samples.

HL = 1/n * sum(max(0, 1 - y_actual * y_pred))

Python Implementation:

import numpy as np

y_pred = np.array([0.2, 0.8, 0.6, 0.3])
y_actual = np.array([-1, 1, 1, -1])
n = len(y_pred)

# hinge loss assumes labels in {-1, +1}; take the average over the samples
hl = np.mean(np.maximum(0, 1 - y_actual * y_pred))
print('Hinge Loss:', hl)

Output: Hinge Loss: 0.7749999999999999

5. Kullback-Leibler Divergence: This cost function measures the difference between the predicted and actual probability distributions and is commonly used in tasks such as generative modeling.

KL = sum(y_actual * log(y_actual / y_pred))

Python Implementation:

import numpy as np

y_pred = np.array([0.1, 0.5, 0.4])
y_actual = np.array([0.2, 0.3, 0.5])

# both arrays should be valid probability distributions (non-negative, summing to 1)
kl = np.sum(y_actual * np.log(y_actual / y_pred))
print('Kullback-Leibler Divergence:', kl)

Output: Kullback-Leibler Divergence: 0.09695352463929671

6. Root Mean Squared Error (RMSE): This is a variant of MSE in which the square root of the MSE is taken. Because it is expressed in the same units as the target variable, it is often used to measure the performance of regression models when the scale of the target matters.

Python Implementation:

import numpy as np

y_pred = np.array([1.2, 2.1, 3.5, 4.8])
y_actual = np.array([1, 2, 3, 5])

rmse = np.sqrt(((y_pred - y_actual) ** 2).mean())
print('RMSE:', rmse)

Output: RMSE: 0.29154759474226505

7. Mean Absolute Error (MAE): This cost function measures the average absolute difference between the predicted and actual values. It is less sensitive to outliers than MSE and is used for regression models such as linear regression, decision trees, and random forests.

Python Implementation:

import numpy as np

y_pred = np.array([1.2, 2.1, 3.5, 4.8])
y_actual = np.array([1, 2, 3, 5])

mae = np.mean(np.abs(y_pred - y_actual))
print('MAE:', mae)

Output: MAE: 0.25000000000000006

Cost functions for different classification algorithms

The choice of cost function for a classification algorithm depends on the specific problem and the model being used. Here are some commonly used cost functions for different classification algorithms:

1. Logistic Regression:

  • Binary cross-entropy (or log loss) is commonly used for logistic regression.
  • The cost function is:

J(w) = (-1/m) * sum(y * log(h(x)) + (1-y) * log(1-h(x)))

where m is the number of training examples, y is the true label, h(x) is the predicted probability for input x, and w is the weight parameter.
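
As a rough sketch (the sigmoid hypothesis, the toy feature matrix X, the labels y, and the weights w below are all assumed for illustration), this cost can be computed directly:

import numpy as np

# toy data: 3 examples, 2 features plus a bias column of ones
X = np.array([[1.0, 0.5, 1.0],
              [2.0, 1.5, 1.0],
              [0.5, 2.0, 1.0]])
y = np.array([1, 1, 0])          # true binary labels
w = np.array([0.4, -0.2, 0.1])   # assumed weight vector

# h(x) = sigmoid(w . x), the predicted probability of the positive class
h = 1 / (1 + np.exp(-X @ w))
m = len(y)

# J(w) = (-1/m) * sum(y * log(h) + (1 - y) * log(1 - h))
J = -np.sum(y * np.log(h) + (1 - y) * np.log(1 - h)) / m
print('Logistic regression cost:', J)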

2. Support Vector Machines (SVM):

  • Hinge loss is commonly used for SVM.
  • The cost function is:

J(w) = C * sum(max(0, 1 - y * (w.T * x)))

where C is the regularization parameter, y is the true label, w is the weight parameter, and x is the feature vector.
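
A minimal sketch of this cost, assuming labels in {-1, +1} and made-up weights and data (only the C-weighted hinge sum from the formula above is computed; the usual regularization term on w is not shown):

import numpy as np

# toy data: 3 examples with 2 features, labels in {-1, +1}
X = np.array([[1.0, 2.0],
              [2.0, 1.0],
              [0.5, 0.5]])
y = np.array([1, -1, 1])
w = np.array([0.3, -0.4])  # assumed weight vector
C = 1.0                    # regularization parameter

# J(w) = C * sum(max(0, 1 - y * (w . x)))
margins = y * (X @ w)
J = C * np.sum(np.maximum(0, 1 - margins))
print('SVM hinge cost:', J)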

3. Decision Trees and Random Forests:

  • Gini index and cross-entropy are commonly used for decision trees and random forests.
  • The cost function for Gini index is:

J = sum(p * (1 - p))

where p is the proportion of samples in a particular node that belong to a certain class.

  • The cost function for cross-entropy is:

J = -sum(p * log(p))

where p is the proportion of samples in a particular node that belong to a certain class.
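
For example (a small sketch that assumes the class proportions at a node are already known), both impurity measures can be computed directly from p:

import numpy as np

# assumed class proportions at a single tree node
p = np.array([0.7, 0.2, 0.1])

gini = np.sum(p * (1 - p))         # J = sum(p * (1 - p))
entropy = -np.sum(p * np.log(p))   # J = -sum(p * log(p))
print('Gini index:', gini)
print('Cross-entropy:', entropy)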

4. Naive Bayes:

  • Naive Bayes models are typically fit by maximum likelihood estimation (MLE), which is equivalent to minimizing the negative log-likelihood of the data.
  • The corresponding cost function is:

J = -sum(log(p(x_i | y)))

where x_i is the i-th feature, y is the class label, and p(x_i | y) is the probability of the i-th feature given the class label y.
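
As a rough illustration (assuming the conditional probabilities p(x_i | y) for one example's features have already been estimated, e.g. from training counts), the cost is the negative log of the feature likelihoods:

import numpy as np

# assumed conditional probabilities p(x_i | y) for one example's three features
p_x_given_y = np.array([0.6, 0.3, 0.8])

# J = -sum(log(p(x_i | y)))
J = -np.sum(np.log(p_x_given_y))
print('Naive Bayes negative log-likelihood:', J)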

These are just a few examples of cost functions for different classification algorithms. The specific cost function used will depend on the problem and the algorithm being used.

Conclusion

There are many other cost functions that can be used depending on the specific requirements of the problem at hand. The choice of cost function is an important consideration in machine learning, as it can affect the performance of the model and its ability to generalize to new data.
