When you run it in eval mode, it only evaluates the loss on the validation set. There must be a way to evaluate the ROUGE metric, since the TextSum authors report ROUGE scores for their model in the README, but I don't know how to do it. My guess is that you can run the code in decode mode, which generates output files that can then be used to compute ROUGE, based on this comment on the `DecodeIO` class in `seq2seq_attention_decode.py`: "Writes the decoded and references to RKV files for Rouge score. See nlp/common/utils/internal/rkv_parser.py for detail about rkv file."
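That `rkv_parser.py` tooling is internal to Google and not in the repo, so once you have the decoded summaries and their references you'd have to score them yourself. As an illustration (not the TextSum code path), here is a minimal standalone sketch of ROUGE-N recall, which is the core of the metric:

```python
from collections import Counter

def rouge_n_recall(reference, hypothesis, n=1):
    """ROUGE-N recall: overlapping n-grams / total reference n-grams."""
    def ngrams(tokens, n):
        return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))
    ref = ngrams(reference.split(), n)
    hyp = ngrams(hypothesis.split(), n)
    overlap = sum(min(count, hyp[gram]) for gram, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0

# Hypothetical decoded/reference pair for illustration:
ref = "police killed the gunman"
hyp = "the gunman was shot down by police"
print(rouge_n_recall(ref, hyp, n=1))  # 3 of 4 reference unigrams overlap -> 0.75
```

For a proper evaluation you'd probably want a maintained package (e.g. `pyrouge` wrapping the official ROUGE-1.5.5 Perl script) rather than rolling your own, since the official script adds stemming, stopword options, and bootstrap confidence intervals.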
Please let me know if you figure it out.