Handling ‘Medium’ Data with Dask — Explore SA Gawler Challenge

Jack Maughan
1 min read · Feb 28, 2020


In my previous notebooks we’ve looked at data cleaning, feature engineering and machine learning on geological datasets. All of these example datasets can be considered small in terms of ‘big data’, but more informative data analysis (and better ML models) requires working with larger volumes of data. However, as we incorporate more and more data into our workflows, we run into a problem: computing power. This notebook looks at how to deal with that issue, using the Python package Dask to perform large data processing on a little laptop, and offers a couple of helpful Python hints along the way. Link below!
