Would simply minimizing the sum not be equal to minimizing the average?

Arik Sosman

It would not make a difference to the optimization. Dividing the sum of squared residuals by the positive constant 2m rescales the cost but does not move its minimum, so the parameters that give the smallest average (the sum divided by 2m) are exactly the parameters that give the smallest sum.
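Here is a quick check of that claim: a brute-force grid search over a line y = w*x + b, using the sum of squared residuals and then the (1/(2m))-scaled cost. The toy data, the grid range and step, and the helper names `sum_squared_residuals` and `mean_cost` are all made up for illustration. It is a sketch, not how the course actually fits the model.

```python
# Toy data (hypothetical): points lying roughly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

def sum_squared_residuals(w, b, xs, ys):
    """Sum of squared residuals for the line y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))

def mean_cost(w, b, xs, ys):
    """Squared-error cost J(w, b) = (1 / (2m)) * sum of squared residuals."""
    m = len(xs)
    return sum_squared_residuals(w, b, xs, ys) / (2 * m)

# Candidate parameters: w and b each from -5.0 to 5.0 in steps of 0.1
grid = [(i / 10, j / 10) for i in range(-50, 51) for j in range(-50, 51)]

# Pick the grid point that minimizes each version of the cost
best_by_sum = min(grid, key=lambda p: sum_squared_residuals(p[0], p[1], xs, ys))
best_by_mean = min(grid, key=lambda p: mean_cost(p[0], p[1], xs, ys))

print(best_by_sum, best_by_mean)  # → (2.0, 1.0) (2.0, 1.0)
```

Both searches land on the same grid point because the 1/(2m) factor only scales every cost value by the same positive number.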

But I think we include the averaging factor in the cost function to capture the intuition that we’re measuring how well the line performs on **average**. Averaging keeps the cost from growing with the number of training examples, so costs computed on data sets of different sizes are on the same scale, and this way we can compare different models.
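To illustrate the size-independence point, the sketch below repeats the same hypothetical toy data 10 times. The fit quality doesn't change, but the plain sum grows tenfold while the averaged cost stays put. The data, the fixed parameters (w = 2.0, b = 1.0), and the helper names are illustrative assumptions.

```python
def sum_squared_residuals(w, b, xs, ys):
    """Sum of squared residuals for the line y = w*x + b."""
    return sum((w * x + b - y) ** 2 for x, y in zip(xs, ys))

def mean_cost(w, b, xs, ys):
    """Squared-error cost J(w, b) = (1 / (2m)) * sum of squared residuals."""
    return sum_squared_residuals(w, b, xs, ys) / (2 * len(xs))

# Toy data (hypothetical)
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 7.1, 8.8]

# The same points repeated 10 times: ten times the data, identical fit quality
big_xs = xs * 10
big_ys = ys * 10

w, b = 2.0, 1.0
small_sum = sum_squared_residuals(w, b, xs, ys)
big_sum = sum_squared_residuals(w, b, big_xs, big_ys)
small_cost = mean_cost(w, b, xs, ys)
big_cost = mean_cost(w, b, big_xs, big_ys)

print(small_sum, big_sum)    # the sum grows with the data set size
print(small_cost, big_cost)  # the averaged cost does not
```

Up to floating-point rounding, the sums come out near 0.11 and 1.1, while both averaged costs are near 0.011.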