What causes the big jumps in the CIFAR-10 and CIFAR-100 plots? They occur at a round number (40,000) and at exactly the same iteration for every run. Is 40,000 the number of samples you're training on, with the jump at the end of an epoch? I suspect not, as that would mean you're only looking at 1 or 2 epochs. Alternatively, did you run the training for 40,000 iterations and then restart it from a snapshot? Or did you change the learning rate?