Miguel Otero Pedrido
Sep 1 · 1 min read

I don’t understand what you are trying to say here. You take the gradient of the loss function, differentiating with respect to the various parameters. The gradients in the earlier layers tend to be smaller, I get that. But getting the loss function from the gradients? I don’t get that at all.
