Published in Hacking Analytics

Scaling with Pandas beyond the millions (of records)

Photo by billow926 on Unsplash

Typically, Pandas finds its sweet spot in low- to medium-sized datasets of up to a few million rows. Beyond this, distributed frameworks such as Spark or Dask are usually preferred. It is, however, possible to scale Pandas well beyond this point.

The typical issue when scaling Pandas is its memory utilization. Pandas leverages data stores…
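
To make the memory concern concrete, here is a minimal sketch of how one might measure a DataFrame's footprint and reduce it by choosing narrower dtypes. The column names, row count, and the helpers `build_frame` and `shrink` are hypothetical, made up for illustration, and this is not necessarily the approach the article goes on to describe.

```python
import numpy as np
import pandas as pd


def build_frame(n_rows: int = 100_000) -> pd.DataFrame:
    """Build a synthetic frame (hypothetical columns) with default dtypes."""
    rng = np.random.default_rng(0)
    return pd.DataFrame({
        "user_id": rng.integers(0, 50_000, size=n_rows),   # 64-bit integers
        "score": rng.random(n_rows),                        # 64-bit floats
        "country": rng.choice(["BR", "US", "DE", "IN"], size=n_rows),  # strings
    })


def shrink(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with narrower dtypes where that is safe here."""
    out = df.copy()
    # Non-negative IDs fit in a smaller unsigned integer type.
    out["user_id"] = pd.to_numeric(out["user_id"], downcast="unsigned")
    # float32 halves the storage but loses precision; only do this if acceptable.
    out["score"] = out["score"].astype("float32")
    # A low-cardinality string column is stored far more compactly as a category.
    out["country"] = out["country"].astype("category")
    return out


df = build_frame()
small = shrink(df)

# deep=True also counts the memory held by Python string objects.
before = df.memory_usage(deep=True).sum()
after = small.memory_usage(deep=True).sum()
print(f"before: {before / 1e6:.2f} MB, after: {after / 1e6:.2f} MB")
```

How much is saved depends entirely on the data: the gain is largest for repetitive string columns and for integers with a small value range.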
