Uri Patish
The Curious Case of the Validation Loss Mismatch
Yesterday I stumbled upon Greg Yang’s (@TheGregYang) Twitter thread on how the cross-entropy loss blows up on held-out data, even though…
May 15, 2020