Data Scientist @Humonics Global Pvt. Ltd.
…[9] which dives deeper into this topic. So I guess differential learning rates have a new name now — discriminative fine-tuning. :)
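In case the idea is new to you, it can be sketched in a few lines of plain Python: the last layer (the head) gets the largest learning rate, and each earlier layer's rate is divided by a constant factor. The 2.6 divisor is the one suggested in the ULMFiT paper; the function name and defaults here are just my own illustration:

```python
def discriminative_lrs(n_layers, head_lr=1e-3, decay=2.6):
    """Per-layer learning rates: the head (last layer) trains fastest,
    and each earlier layer gets its rate divided by `decay`."""
    return [head_lr / decay ** (n_layers - 1 - i) for i in range(n_layers)]

# e.g. a 4-layer model: the earliest layer gets the smallest rate
rates = discriminative_lrs(4)
```

In PyTorch you would then feed these rates to the optimizer via per-parameter-group options, one group per layer.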
Another popular method is Stochastic Gradient Descent with Warm Restarts (SGDR), proposed by Loshchilov & Hutter [6]. This method uses the cosine function as the cyclic function and restarts the learning rate at its maximum at each cycle. The “warm” bit comes from the fact that…
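A minimal sketch of that schedule in plain Python (the parameter names are my own; `t_mult` plays the role of the paper's T_mult, stretching each successive cycle):

```python
import math

def sgdr_lr(step, lr_max=0.1, lr_min=0.001, cycle_len=10, t_mult=2):
    """Cosine-annealed learning rate with warm restarts.
    The rate decays from lr_max to lr_min over each cycle, then
    restarts at lr_max; each new cycle is t_mult times longer."""
    t, length = step, cycle_len
    while t >= length:        # locate the position inside the current cycle
        t -= length
        length *= t_mult
    return lr_min + 0.5 * (lr_max - lr_min) * (1 + math.cos(math.pi * t / length))

lr_start = sgdr_lr(0)    # start of the first cycle: lr_max
lr_restart = sgdr_lr(10) # end of cycle 1 → warm restart back to lr_max
```

PyTorch ships this schedule as `torch.optim.lr_scheduler.CosineAnnealingWarmRestarts`, so in practice you would reach for that rather than rolling your own.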