# 3. Understanding the Cost Function in Linear Regression for Machine Learning Beginners

Yennhi95zz
5 min read · May 30, 2023

This article is part of the “Self-taught Machine Learning Collection.”

Discover:

  • Defining the Cost Function
  • The Linear Regression Model
  • Understanding the Parameters
  • Intuition Behind the Cost Function
  • Role of the Cost Function
  • Visualizing the Cost Function
  • Understanding the Relationship


Linear regression is a fundamental concept in machine learning, and one of the crucial steps in implementing it is understanding the cost function. In this article, we will break down the cost function in simple terms and explain its importance in training a linear regression model. Whether you are a beginner or looking to refresh your knowledge, this article will provide you with a clear understanding of the cost function and its role in optimizing your model.

Defining the Cost Function

When working with linear regression, we aim to find the best line that fits the training data. The cost function measures the difference between the predicted values of the model and the actual target values. By minimizing this cost function, we can determine the optimal values for the model’s parameters and improve its performance.

Cost Function

The Linear Regression Model

Before delving into the cost function, let’s briefly revisit the linear regression model. It represents a line that fits the training data using parameters called coefficients or weights. The model’s equation is:

f_{w,b}(x) = w · x + b

where w and b are the adjustable parameters.

The goal is to choose the best values for w and b so that the line generated by the model closely matches the training data.
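
To make the model concrete, here is a minimal Python sketch of this equation; the function name `predict` is my own choice for illustration:

```python
def predict(x, w, b):
    """The linear regression model: f_wb(x) = w * x + b."""
    return w * x + b

# With w = 2 and b = 1, an input of x = 3 is mapped to 2 * 3 + 1 = 7.
print(predict(3, w=2, b=1))  # 7
```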

The equation for the cost with one variable is given by:

J(w,b) = \frac{1}{2m} \sum\limits_{i=0}^{m-1} \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2

where:

  • f_{w,b}(x^{(i)}) = w · x^{(i)} + b is the model’s prediction for example i using the parameters w and b.
  • \left( f_{w,b}(x^{(i)}) - y^{(i)} \right)^2 is the squared difference between the prediction and the target value.
  • These squared differences are summed over all m examples and divided by 2m to obtain the cost J(w,b).

Note that in lectures, summation ranges are typically from 1 to m, while in code implementation, the range will be from 0 to m−1.
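
As a rough sketch of how this cost could be computed in code (the function name `compute_cost` and the use of NumPy are my own choices, not prescribed by the article):

```python
import numpy as np

def compute_cost(x, y, w, b):
    """Compute J(w, b) = 1/(2m) * sum((f_wb(x_i) - y_i)^2) over the training set."""
    m = x.shape[0]                    # number of training examples
    f_wb = w * x + b                  # predictions for every example at once
    squared_errors = (f_wb - y) ** 2  # per-example squared differences
    return squared_errors.sum() / (2 * m)

# Example: three training points that lie exactly on the line y = x
x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([1.0, 2.0, 3.0])
print(compute_cost(x_train, y_train, w=1.0, b=0.0))  # 0.0 (perfect fit)
```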

Understanding the Parameters

The parameters w and b have a significant impact on the shape and position of the line in the graph. Different values of w and b lead to different lines with varying slopes and intercepts. By adjusting these parameters during training, we can optimize the line to better fit the data.

Intuition Behind the Cost Function

The cost function, denoted as J(w, b), measures how well the model’s predictions align with the true target values. For each example it computes the squared error between the predicted value f_{w,b}(x) and the actual target value y, then sums these squared errors across all training examples and scales the total by 1/2m. The smaller this value, the better the model fits the data.

Role of the Cost Function

The primary objective in training a linear regression model is to minimize the cost function. By finding the values of w and b that result in a small cost function, we achieve a model that accurately predicts the target values. Minimizing the cost function involves adjusting the parameters iteratively until convergence, using techniques such as gradient descent.
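
As an illustration of that iterative adjustment, here is a simplified batch gradient descent loop for the one-variable model; the learning rate `alpha`, the iteration count, and the zero initialization are arbitrary choices for this sketch:

```python
import numpy as np

def gradient_descent(x, y, alpha=0.1, num_iters=1000):
    """Repeatedly nudge w and b in the direction that lowers the cost J(w, b)."""
    m = x.shape[0]
    w, b = 0.0, 0.0                    # arbitrary starting guess
    for _ in range(num_iters):
        error = (w * x + b) - y        # f_wb(x_i) - y_i for every example
        dj_dw = (error * x).sum() / m  # partial derivative of J with respect to w
        dj_db = error.sum() / m        # partial derivative of J with respect to b
        w -= alpha * dj_dw             # step both parameters downhill
        b -= alpha * dj_db
    return w, b

x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([1.0, 2.0, 3.0])
print(gradient_descent(x_train, y_train))  # close to (1.0, 0.0) for this data
```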

Function f(x) and the Cost Function J(w,b)

Visualizing the Cost Function

To gain a better understanding, let’s visualize how the cost function changes with different values of w. We simplify the model by setting b to 0, resulting in the function f(x) = w * x. The cost function J(w) depends only on w and measures the squared difference between f(x) and y for each training example.

Example: Exploring the Cost Function

Suppose we have a training set with three points (1, 1), (2, 2), and (3, 3). We plot the function f(x) = w * x for different values of w and calculate the corresponding cost function J(w).

Function f(x) vs. Cost Function J(w) — Credit: Andrew Ng
  1. When w = 1: f(x) = x, the line passes through the origin and perfectly fits the data. The cost J(w) is 0, since f(x) equals y for every training example.
  2. Setting w = 0.5: f(x) = 0.5 * x, the line has a smaller slope and underestimates every target. The squared errors are (0.5 - 1)², (1 - 2)², and (1.5 - 3)², which sum to 3.5, so J(0.5) = 3.5 / (2 · 3) ≈ 0.58. The nonzero cost quantifies how much worse this line fits the data (the sketch after this list reproduces these numbers).
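
Here is a small sketch that reproduces these numbers, sweeping w with b fixed at 0 over the training set (1, 1), (2, 2), (3, 3); the helper name `cost_for_w` is illustrative:

```python
import numpy as np

x_train = np.array([1.0, 2.0, 3.0])
y_train = np.array([1.0, 2.0, 3.0])

def cost_for_w(w, x=x_train, y=y_train):
    """J(w) with b fixed at 0: squared errors summed and divided by 2m."""
    m = x.shape[0]
    return ((w * x - y) ** 2).sum() / (2 * m)

for w in [0.0, 0.5, 1.0, 1.5]:
    print(f"w = {w}: J(w) = {cost_for_w(w):.3f}")
# w = 0.0: J(w) = 2.333
# w = 0.5: J(w) = 0.583
# w = 1.0: J(w) = 0.000
# w = 1.5: J(w) = 0.583
```

Note that the cost is symmetric around w = 1: moving the slope too high hurts just as much as moving it too low, which is the bowl shape seen in the plot of J(w).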

Understanding the Relationship

The relationship between the function f(x) and the cost function J(w) in linear regression can be visualized by plotting them side by side. For each choice of w, f(x) is a particular line, and J(w) is the error that line makes on the training data. The goal is to find the parameter value w that minimizes J(w), which corresponds to the best-fitting line; this is typically done with optimization algorithms like gradient descent. The side-by-side plot makes it clear how changing w moves the line and, in turn, raises or lowers the cost.

References

  • “Supervised Machine Learning: Regression and Classification” by Andrew Ng.


#MachineLearning #LinearRegression #CostFunction #ModelTraining #DataScience
