It’s surprising to see such a low running_avg_loss after 20 hours of training. For me, the loss stayed at around 1, which is four orders of magnitude higher. I suspect something may be wrong with the training data. Did the conversion process the entire CNN dataset? What is the size of data/cnn-train.bin?
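To compare more than just the byte size, here is a minimal sketch that counts the records in the file. It assumes data/cnn-train.bin uses the length-prefixed layout written by the cnn-dailymail make_datafiles.py script: an 8-byte `struct.pack('q', ...)` length followed by that many bytes of a serialized tf.Example. If your conversion used a different format, this will not apply. The function name `count_records` is hypothetical. It raises an error if the last record is cut short, which would point to an interrupted conversion.

```python
import struct

def count_records(path):
    """Count length-prefixed records in a .bin data file.

    Assumption: each record is an 8-byte length (struct format 'q',
    native byte order) followed by that many payload bytes.
    """
    header_size = struct.calcsize("q")
    count = 0
    with open(path, "rb") as f:
        while True:
            len_bytes = f.read(header_size)
            if not len_bytes:
                break  # clean end of file
            if len(len_bytes) < header_size:
                raise ValueError("truncated length prefix after %d records" % count)
            (payload_len,) = struct.unpack("q", len_bytes)
            if payload_len < 0:
                raise ValueError("negative record length after %d records" % count)
            payload = f.read(payload_len)
            if len(payload) < payload_len:
                raise ValueError("truncated record after %d records" % count)
            count += 1
    return count
```

Usage: `count_records("data/cnn-train.bin")`, together with `os.path.getsize("data/cnn-train.bin")`. A record count well below the number of stories in the CNN training split would suggest that only part of the dataset was converted.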