# Why do we subtract the slope * alpha in Gradient Descent?

If we are going in the direction of the steepest descent, why not add instead of subtract?

OK, I get that the derivative (gradient) points in the direction of steepest increase. But still, why would you subtract it?

## Because your goal is to MINIMIZE the loss function J(θ).

Here is an example with a simple scalar.
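The original illustration isn't reproduced here, but the idea can be sketched numerically. Assuming the cost is J(θ) = θ² (a hypothetical choice, not from the original), its derivative is 2θ. Repeatedly subtracting alpha times the derivative walks θ down toward the minimum at 0:

```python
# Minimal sketch, assuming J(theta) = theta**2 with derivative 2*theta.
def gradient_descent(theta, alpha=0.1, steps=50):
    for _ in range(steps):
        grad = 2 * theta              # dJ/dtheta at the current point
        theta = theta - alpha * grad  # subtract: step downhill
    return theta

print(gradient_descent(5.0))   # approaches 0, the minimum
print(gradient_descent(-5.0))  # also approaches 0, from the other side
```

Note that the sign works out in both directions: starting from a positive θ the derivative is positive and subtracting moves θ left; starting from a negative θ the derivative is negative and subtracting moves θ right. Either way, J decreases.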

As you can see from this example, when the derivative is positive, you need to subtract a fraction (alpha) of that derivative to move toward the minimum of the cost function; when the derivative is negative, subtracting it pushes you in the positive direction, which is again downhill. Conversely, in a maximization problem you would add alpha times the derivative (slope).
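To see the maximization case concretely, here is a hedged sketch (the function is an assumed example, not from the original): maximize J(θ) = −(θ − 2)², whose derivative is −2(θ − 2). Adding alpha times the derivative climbs toward the maximum at θ = 2:

```python
# Minimal sketch, assuming J(theta) = -(theta - 2)**2,
# with derivative dJ/dtheta = -2*(theta - 2).
def gradient_ascent(theta, alpha=0.1, steps=100):
    for _ in range(steps):
        grad = -2 * (theta - 2)
        theta = theta + alpha * grad  # add: step uphill for maximization
    return theta

print(gradient_ascent(-5.0))  # approaches 2, the maximum
```

Same machinery, opposite sign: the only difference between gradient descent and gradient ascent is whether you subtract or add the scaled derivative.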

OK, this is for the scalar case. What about the multi-dimensional case?

The same logic applies in higher dimensions. With more than one parameter, the gradient of the function is simply the vector of all its partial derivatives, and you subtract alpha times that whole vector. So basically nothing changes.
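The vector version can be sketched the same way. Assuming a hypothetical cost J(θ) = θ₀² + 2θ₁² (chosen for illustration, not from the original), the gradient is the vector [2θ₀, 4θ₁], and the update rule is identical, just applied componentwise:

```python
import numpy as np

# Minimal sketch, assuming J(theta) = theta[0]**2 + 2*theta[1]**2.
def grad(theta):
    # Gradient = vector of partial derivatives.
    return np.array([2 * theta[0], 4 * theta[1]])

theta = np.array([3.0, -2.0])
alpha = 0.1
for _ in range(100):
    theta = theta - alpha * grad(theta)  # same rule, now vector-valued

print(theta)  # approaches [0, 0], the minimum
```

Each component moves downhill along its own partial derivative, so the combined step points in the direction of steepest descent of the full multi-dimensional surface.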