Heyy Pavel,
Leena Bharambe

It’s surprising to see such a low running_avg_loss after 20 hours of training. For me, the loss stayed at about 1, which is 4 orders of magnitude higher. I suspect something may be wrong with the training data. Did it convert the entire CNN dataset? What is the size of data/cnn-train.bin?
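
If it helps, here is a minimal sketch for checking the size of that file from Python. The helper name `file_size_mb` is just something I made up for illustration, and the path assumes you run it from the project root; adjust it if your data lives elsewhere.

```python
import os


def file_size_mb(path):
    """Return the size of the file at `path` in mebibytes (hypothetical helper)."""
    return os.path.getsize(path) / (1024 * 1024)


if __name__ == "__main__":
    # Path taken from the question above; assumed to be relative to the working directory.
    path = "data/cnn-train.bin"
    if os.path.exists(path):
        print(f"{path}: {file_size_mb(path):.1f} MiB")
    else:
        print(f"{path} not found")
```

A file that is much smaller than you would expect for the full dataset would suggest the conversion stopped early.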
