Published in Hacking Analytics

Scaling with Pandas beyond the millions (of records)

Photo by billow926 on Unsplash

Pandas typically finds its sweet spot with small to medium-sized datasets of up to a few million rows. Beyond this, distributed frameworks such as Spark or Dask are usually preferred. It is, however, possible to scale Pandas well beyond this point.

The typical issue when scaling with Pandas is dealing with its memory utilization. Pandas leverages data stores…
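To make the memory concern concrete, here is a minimal sketch of how a DataFrame's footprint can be measured and reduced by choosing narrower dtypes. It is not necessarily the approach the article goes on to describe. The helper names `memory_mb` and `shrink`, the sample columns, and the 50% uniqueness threshold are all assumptions made for illustration.

```python
import numpy as np
import pandas as pd


def memory_mb(df: pd.DataFrame) -> float:
    """Total memory footprint in megabytes, including string contents."""
    return df.memory_usage(deep=True).sum() / 1024 ** 2


def shrink(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy using narrower dtypes where the values allow it.

    Hypothetical helper: integers are downcast to the smallest integer
    type that holds them, and low-cardinality text columns become
    categoricals. Floats are left alone, since narrowing them loses precision.
    """
    out = df.copy()
    for col in out.columns:
        if pd.api.types.is_integer_dtype(out[col]):
            out[col] = pd.to_numeric(out[col], downcast="integer")
        elif pd.api.types.is_string_dtype(out[col]):
            # Assumed threshold: convert only if fewer than half the values are unique.
            if out[col].nunique() < 0.5 * len(out):
                out[col] = out[col].astype("category")
    return out


# Synthetic sample data, purely illustrative.
rng = np.random.default_rng(0)
n = 100_000
df = pd.DataFrame({
    "user_id": rng.integers(0, 10_000, size=n),
    "amount": rng.random(n) * 100,
    "country": rng.choice(["NL", "FR", "DE", "CH"], size=n),
})

small = shrink(df)
print(f"before: {memory_mb(df):.2f} MB, after: {memory_mb(small):.2f} MB")
print(small.dtypes)
```

The same idea applies at load time: passing an explicit `dtype` mapping to `pd.read_csv` avoids ever materializing the wider default types.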

Julien Kervizic


Living at the interstice of business, data and technology | Head of Data at iptiQ by SwissRe | previously at Facebook, Amazon | julienkervizic@gmail.com
