Thanks for this quick summary of the paper. I agree that a moderate amount of memorization is not necessarily a bad thing, though I suppose it depends on what you count as memorization. If memorization means learning a mapping from the training data to the labels that doesn’t generalize, then clearly it is a bad thing. But if the learned mapping does generalize to unseen data, wasn’t that the end goal in most cases? Perhaps framing it in terms of memorization is an academic red herring.