Cost Function
A cost function measures how far your model's predictions are from the actual values in your training data: it quantifies the "cost", or error, of those predictions as a single number. A lower cost means your predictions are closer to the truth, so the goal of training in machine learning is to minimize this cost function.
In the context of linear regression, let’s use the Mean Squared Error (MSE) as an example of a cost function. The MSE cost function is defined as:
$$J(\theta) = \frac{1}{2m} \sum_{i=1}^{m} \left( h_\theta(x^{(i)}) - y^{(i)} \right)^2$$
Here:
- $J(\theta)$ is the cost function,
- $m$ is the number of data points,
- $h_\theta(x^{(i)})$ is the predicted value for the $i$-th example,
- $y^{(i)}$ is the actual (true) value for the $i$-th example.
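To make this concrete, here is a minimal sketch of the MSE cost in Python with NumPy (the function name `compute_cost` and the toy data are illustrative, not from any particular library):

```python
import numpy as np

def compute_cost(theta, X, y):
    """MSE cost J(theta) = (1 / 2m) * sum((h_theta(x) - y)^2).

    X: (m, n) feature matrix with a leading column of 1s for the intercept.
    y: (m,) vector of true values.
    theta: (n,) parameter vector; predictions are h_theta(x) = X @ theta.
    """
    m = len(y)
    errors = X @ theta - y            # h_theta(x^(i)) - y^(i) for each example
    return (errors ** 2).sum() / (2 * m)

# Two data points for the model y = theta_0 + theta_1 * x
X = np.array([[1.0, 1.0],
              [1.0, 2.0]])            # first column is the intercept term
y = np.array([2.0, 3.0])
print(compute_cost(np.array([0.5, 1.0]), X, y))  # (0.25 + 0.25) / 4 = 0.125
```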
The goal of training the model is to find the values of the parameters $\theta$ that minimize the cost function $J(\theta)$. This is typically done with an optimization algorithm such as gradient descent, which iteratively adjusts the parameters in the direction that reduces the cost until it reaches a minimum.
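Continuing the snippet above, here is a sketch of batch gradient descent for this cost; the learning rate `alpha` and the iteration count are illustrative defaults, not tuned values:

```python
def gradient_descent(theta, X, y, alpha=0.1, num_iters=1000):
    """Repeatedly step theta against the gradient of J(theta):
    theta := theta - alpha * (1/m) * X^T (X @ theta - y).
    """
    m = len(y)
    for _ in range(num_iters):
        errors = X @ theta - y            # prediction errors
        gradient = X.T @ errors / m       # dJ/dtheta (no factor of 2; see below)
        theta = theta - alpha * gradient  # step toward lower cost
    return theta

# y = 1 + x fits both points above exactly, so theta approaches [1.0, 1.0]
# and the cost approaches 0.
theta = gradient_descent(np.zeros(2), X, y)
print(theta, compute_cost(theta, X, y))
```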
Why 2m and not m?
Dividing by $2m$ rather than $m$ is purely a mathematical convenience. When you take the derivative of the cost function with respect to the model parameters, the power rule applied to the squared term brings down a factor of 2, and the $\frac{1}{2}$ cancels it, leaving a cleaner gradient. Scaling the cost by a positive constant does not change which parameters minimize it, so the learned model is the same either way.
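Written out for a single parameter $\theta_j$ (using the fact that, for linear regression, $\partial h_\theta(x^{(i)}) / \partial \theta_j = x_j^{(i)}$), the chain rule gives:

$$\frac{\partial J(\theta)}{\partial \theta_j} = \frac{1}{2m} \sum_{i=1}^{m} 2\left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)} = \frac{1}{m} \sum_{i=1}^{m} \left(h_\theta(x^{(i)}) - y^{(i)}\right) x_j^{(i)}$$

The 2 from the power rule cancels the $\frac{1}{2}$, which is exactly why the gradient in the sketch above carries no stray factor of 2.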