Would simply minimizing the sum not be equal to minimizing the average?
Arik Sosman

It would not make a difference in the optimization. Dividing by the positive constant 2m rescales the cost but does not move the location of its minimum, so the parameters that give the smallest average residual are exactly the parameters that give the smallest sum of residuals.
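To make that concrete, here is a minimal sketch. It assumes a squared-error cost and a one-parameter model y = w * x; the toy data, the function names, and the grid of candidate slopes are all made up for illustration and are not from the original story. It picks the best slope once by the plain sum and once by the sum divided by 2m.

```python
# Hypothetical toy data: y is roughly 3 * x (values chosen for illustration only).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 5.9, 9.2, 11.8]
m = len(xs)

def sum_cost(w):
    """Sum of squared residuals for the one-parameter model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys))

def avg_cost(w):
    """The same sum, divided by the constant 2m."""
    return sum_cost(w) / (2 * m)

# Candidate slopes 0.00, 0.01, ..., 5.00 (a crude grid search, not gradient descent).
candidates = [i / 100 for i in range(501)]

best_by_sum = min(candidates, key=sum_cost)
best_by_avg = min(candidates, key=avg_cost)

print(best_by_sum, best_by_avg)
```

Both searches land on the same slope, because dividing every cost value by the same positive number does not change which candidate is smallest.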

But I think we divide by 2m in the cost function to capture the intuition that we’re looking at how well the line performs on average. Averaging keeps the cost from growing just because the data set is larger, so we can compare models even when they are evaluated on data sets of different sizes.
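A rough sketch of that point, again assuming a squared-error cost and the model y = w * x (the data and the helper name costs are hypothetical): the larger data set below is just the small one repeated ten times, so the fit is equally good on both, yet only the averaged cost reflects that.

```python
def costs(xs, ys, w):
    """Return (sum of squared residuals, that sum divided by 2m) for y = w * x."""
    m = len(xs)
    total = sum((w * x - y) ** 2 for x, y in zip(xs, ys))
    return total, total / (2 * m)

# Hypothetical toy data, for illustration only.
small_x = [1.0, 2.0, 3.0, 4.0]
small_y = [3.1, 5.9, 9.2, 11.8]

# The same four points repeated ten times: same quality of fit, ten times the data.
large_x = small_x * 10
large_y = small_y * 10

small_sum, small_avg = costs(small_x, small_y, 3.0)
large_sum, large_avg = costs(large_x, large_y, 3.0)

print("sum:    ", small_sum, large_sum)   # grows with the number of points
print("average:", small_avg, large_avg)   # stays essentially the same
```

The summed cost is about ten times larger on the bigger data set even though the line fits it no worse, while the averaged cost is the same up to floating-point rounding.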
