The function can have many local minima and maxima, and working out which of them is the one you actually want can be computationally expensive.
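To make this concrete, here is a minimal sketch on a made-up one-dimensional function (the function `f`, the search interval, and the helper names are assumptions chosen for illustration, not taken from any real model). It scans for every point where the derivative changes sign, classifies each one with the second derivative, and only then can it say which minimum is the lowest.

```python
def f(x):
    # Hypothetical example function with two minima and one maximum.
    return x**4 - 2 * x**2 + 0.5 * x


def df(x):
    # First derivative of f.
    return 4 * x**3 - 4 * x + 0.5


def d2f(x):
    # Second derivative of f, used to tell minima from maxima.
    return 12 * x**2 - 4


def bisect(g, lo, hi, tol=1e-12):
    """Find a root of g in [lo, hi], assuming g changes sign on the interval."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if (g(lo) < 0) == (g(mid) < 0):
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def critical_points(lo=-2.0, hi=2.0, steps=400):
    """Scan [lo, hi] for sign changes of df and classify each critical point."""
    points = []
    width = (hi - lo) / steps
    for i in range(steps):
        a, b = lo + i * width, lo + (i + 1) * width
        if (df(a) < 0) != (df(b) < 0):
            x = bisect(df, a, b)
            kind = "minimum" if d2f(x) > 0 else "maximum"
            points.append((x, kind, f(x)))
    return points


for x, kind, value in critical_points():
    print(f"x = {x:.4f}  {kind}  f(x) = {value:.4f}")
```

Even in one dimension this takes hundreds of derivative evaluations and a comparison of every candidate, and the scan only finds what lies inside the interval it was told to search.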

The function is also many-dimensional (each weight gets its own dimension), so we need to find the points where all of those partial derivatives are zero at the same time. That is not trivial either.
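As a small illustration of why this is hard, the sketch below uses a hypothetical two-weight function (`loss` and `gradient` are illustrative names, not part of any library). It estimates every partial derivative numerically and shows a point where all of them are zero even though the function can still be made smaller from there, so a zero derivative alone does not identify a minimum.

```python
def loss(w):
    # Hypothetical two-weight function: curves up along w1, down along w2.
    w1, w2 = w
    return w1**2 - w2**2


def gradient(func, w, h=1e-6):
    """Central-difference estimate of every partial derivative of func at w."""
    grad = []
    for i in range(len(w)):
        up = list(w)
        up[i] += h
        down = list(w)
        down[i] -= h
        grad.append((func(up) - func(down)) / (2 * h))
    return grad


origin = [0.0, 0.0]

# Every partial derivative vanishes at the origin...
print(gradient(loss, origin))

# ...yet stepping along w2 still lowers the function, so it is not a minimum.
print(loss([0.0, 0.1]) < loss(origin))
```

Note that the gradient estimate needs two function evaluations per weight, so the cost of just checking a single candidate point grows with the number of dimensions.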