How to correctly select a sample from a huge dataset in machine learning

Choosing a small, representative dataset from a large population can improve model training reliability

Gianluca Malato
Data Science Reporter
6 min read · Mar 28, 2019

Photo by Lukas from Pexels

In machine learning, we often need to train a model on a very large dataset with thousands or even millions of records. The larger a dataset, the higher its…

Gianluca Malato
Data Science Reporter

Theoretical Physicist, Data Scientist and fiction author. I teach Data Science, statistics and SQL on YourDataTeacher.com. E-mail: gianluca@gianlucamalato.it