“Quantity has a quality all its own”
— Joseph Stalin
From what I’ve seen, Stalin’s quote applies equally to AI. I’d really like to see some evidence that a smaller, cleaner dataset yields better results.
Generally, with today’s algorithms, more training data means a better real-world model. One way to improve model performance on tasks like image recognition is to generate more, messier training data through techniques like jittering, that is, adding randomly perturbed copies of the existing examples.
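To make the jittering idea concrete, here is a rough sketch in plain Python. It is only an illustration under my own assumptions: images are small grayscale grids stored as lists of rows, and the names `jitter`, `augment`, `max_shift`, `noise`, and `copies` are made up for this example rather than taken from any library. Real pipelines would normally use an image library's augmentation tools instead.

```python
import random


def jitter(image, max_shift=2, noise=10, rng=None):
    """Return a randomly perturbed copy of a grayscale image.

    image: list of rows, each a list of ints in 0..255 (an assumed format).
    """
    if rng is None:
        rng = random.Random()
    height, width = len(image), len(image[0])
    # Pick one random translation for the whole image.
    dy = rng.randint(-max_shift, max_shift)
    dx = rng.randint(-max_shift, max_shift)
    out = []
    for y in range(height):
        row = []
        for x in range(width):
            # Sample from the shifted position, clamping at the borders.
            sy = min(max(y + dy, 0), height - 1)
            sx = min(max(x + dx, 0), width - 1)
            # Add per-pixel brightness noise and keep the value in range.
            value = image[sy][sx] + rng.randint(-noise, noise)
            row.append(min(max(value, 0), 255))
        out.append(row)
    return out


def augment(dataset, copies=4, seed=0):
    """Expand (image, label) pairs with `copies` jittered variants each."""
    rng = random.Random(seed)
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))  # keep the clean original
        for _ in range(copies):
            augmented.append((jitter(image, rng=rng), label))
    return augmented


if __name__ == "__main__":
    tiny = [([[0, 128, 255], [64, 32, 16], [8, 4, 2]], "example")]
    bigger = augment(tiny, copies=4)
    print(len(tiny), "->", len(bigger), "training examples")
```

Each original image stays in the set and gains `copies` noisy, slightly shifted variants with the same label, so the dataset grows in quantity while deliberately getting messier.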