Scaling with Pandas beyond the millions (of records)
Typically, Pandas finds its sweet spot in low- to medium-sized datasets of up to a few million rows. Beyond this, distributed frameworks such as Spark or Dask are usually preferred. It is, however, possible to scale Pandas well beyond this point.
The typical issue when scaling Pandas is its memory utilization. Pandas leverages data stores…