Hence, warmup (an initial period of training with a much lower learning rate) is needed for adaptive optimizers to offset the excessive variance of their adaptive learning rate early in training, when the optimizer has seen only a limited amount of training data.
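A minimal sketch of what such a warmup period can look like, assuming a simple linear ramp. The function name `warmup_lr` and the defaults `base_lr` and `warmup_steps` are hypothetical choices for illustration, not something the note above prescribes; other warmup shapes are also used in practice.

```python
def warmup_lr(step, base_lr=1e-3, warmup_steps=1000):
    """Return the learning rate to use at a given training step.

    The rate ramps linearly from base_lr / warmup_steps up to base_lr
    over the first `warmup_steps` steps, then stays at base_lr.
    All parameter values here are illustrative assumptions.
    """
    if warmup_steps <= 0:
        # No warmup requested: use the full learning rate right away.
        return base_lr
    # Fraction of the warmup completed, capped at 1.0 once warmup ends.
    scale = min(1.0, (step + 1) / warmup_steps)
    return base_lr * scale


if __name__ == "__main__":
    for step in (0, 99, 499, 999, 5000):
        print(step, warmup_lr(step))
```

During the first steps the optimizer takes much smaller updates, so noisy early estimates of its statistics have less influence; after `warmup_steps` the schedule hands over to the full learning rate.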
How to determine if a split is better? Take the weighted average of the scores of the two new nodes, weighting each node by the number of samples it contains.
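A minimal sketch of that weighted average, assuming the two new nodes come from a decision-tree split and that each node is scored by the variance of its target values (lower is better). Both the helper name `weighted_split_score` and the choice of variance as the score are assumptions made for illustration.

```python
def weighted_split_score(left, right):
    """Sample-weighted average of the variance of two child nodes.

    `left` and `right` are non-empty lists of target values that fall
    into each new node. Variance as the node score is an assumption.
    """
    def variance(values):
        mean = sum(values) / len(values)
        return sum((v - mean) ** 2 for v in values) / len(values)

    n_total = len(left) + len(right)
    # Each node contributes in proportion to how many samples it holds.
    return (len(left) * variance(left) + len(right) * variance(right)) / n_total


if __name__ == "__main__":
    # A split that separates the targets cleanly scores lower (better)
    # than one that leaves both nodes mixed.
    print(weighted_split_score([1.0, 1.0], [5.0, 5.0]))
    print(weighted_split_score([1.0, 5.0], [1.0, 5.0]))
```

Comparing this score across candidate splits, and against the score of the unsplit node, gives one way to decide whether a split is an improvement.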
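R² compares how good your model is with how good the naïve mean model is: R² = 1 - (squared error of your model) / (squared error of the naïve mean model), which equals one minus the ratio of the two RMSE values squared. A value of 1 means perfect predictions, 0 means no better than always predicting the mean, and negative values mean worse than the mean.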
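A minimal sketch of the comparison with the naïve mean model, using the standard definition R² = 1 - SS_res / SS_tot, where SS_res is the model's summed squared error and SS_tot is the summed squared error of always predicting the mean. The function name `r_squared` is a hypothetical helper, written in plain Python for clarity.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    SS_res: squared error of the model's predictions.
    SS_tot: squared error of the naive model that always predicts
            the mean of y_true (must be non-zero, i.e. y_true is
            not constant).
    """
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    y = [1.0, 2.0, 3.0]
    print(r_squared(y, [1.0, 2.0, 3.0]))  # perfect predictions
    print(r_squared(y, [2.0, 2.0, 2.0]))  # always predicting the mean
    print(r_squared(y, [3.0, 2.0, 1.0]))  # worse than the mean
```

Because both sums run over the same number of samples, the ratio SS_res / SS_tot is the same as the ratio of the two mean squared errors, that is, the ratio of the two RMSE values squared.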