How to correctly select a sample from a huge dataset in machine learning

Choosing a small, representative dataset from a large population can improve model training reliability

Gianluca Malato
Data Science Reporter
6 min read · Mar 28, 2019


Photo by Lukas from Pexels

In machine learning, we often need to train a model on a very large dataset of thousands or even millions of records. The larger the dataset, the higher its…



Gianluca Malato

Theoretical Physicist, Data Scientist and fiction author. I teach Data Science, statistics and SQL on E-mail: