Stop Using Pandas for Large Datasets — Terality is a 20 Times Faster Alternative

Pandas is painfully slow when processing large datasets. Is Terality the solution you’ve been looking for?

Dario Radečić
Published in Geek Culture
7 min read · Nov 16, 2021


Photo by SpaceX on Unsplash

If I had to pick one Python library I couldn’t live without, it would be Pandas. Nothing else comes close. Does that mean Pandas is without flaws? Well, no — it’s terribly slow for processing large datasets. Also, working with larger-than-memory datasets is a challenge of its own. There are ways around this, and we’ll explore one such option today.

It’s called Terality, a blazing-fast freemium service that moves computation off your machine. It has a dedicated Python library whose API mirrors Pandas, so the learning curve is practically non-existent.

This is not a sponsored article, but I reached out to Terality after exploring how the service works, and they were kind enough to enroll me in their 2 TB plan free of charge. Their always-free plan comes with 500 GB of data processing bandwidth. Is it enough? Continue reading to find out.
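To put those quotas in perspective, here is a rough back-of-the-envelope sketch. It assumes a hypothetical billing model in which every full pass over a dataset counts the dataset's size against the quota; the `full_passes` helper and the example dataset sizes are made up for illustration and are not taken from Terality's documentation.

```python
# Hypothetical billing model (an assumption, not Terality's documented rules):
# every full pass over a dataset counts its size against the plan's quota.
FREE_PLAN_GB = 500       # free plan quota mentioned above
TWO_TB_PLAN_GB = 2_000   # the 2 TB plan, taking 1 TB = 1,000 GB


def full_passes(quota_gb: float, dataset_gb: float) -> int:
    """Number of complete passes over the dataset that fit in the quota."""
    if dataset_gb <= 0:
        raise ValueError("dataset_gb must be positive")
    return int(quota_gb // dataset_gb)


# Illustrative dataset sizes in GB
for size_gb in (1, 5, 50):
    print(
        f"{size_gb:>3} GB dataset: "
        f"{full_passes(FREE_PLAN_GB, size_gb)} passes on the free plan, "
        f"{full_passes(TWO_TB_PLAN_GB, size_gb)} on the 2 TB plan"
    )
```

Under that assumption, a workflow that reads, merges, and aggregates the same large dataset several times would eat through the free quota much faster than the raw file size suggests.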

What is Terality and how to set up an account

As mentioned earlier, Terality is a fast, hosted data processing engine that works just like Pandas. Its API is identical to the Pandas API, as you can see from the image below:

Image 1 — Terality website (image by author)
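In practice, an identical API means that switching should mostly come down to changing the import. The snippet below is a minimal sketch of that idea, not code from Terality's docs: it assumes the package is importable as `terality` (aliased to `te` here), and it falls back to plain Pandas so it still runs on a machine without Terality installed. The toy data is made up for illustration.

```python
# Assumption: the Terality package is importable as `terality`.
# If it is not installed, fall back to Pandas under the same alias
# so the rest of the code runs unchanged.
try:
    import terality as te
except ImportError:
    import pandas as te

# Toy data, invented for this example
df = te.DataFrame(
    {
        "city": ["Zagreb", "Split", "Zagreb"],
        "sales": [10, 20, 30],
    }
)

# Exactly the same call chain you would write with Pandas
totals = df.groupby("city")["sales"].sum()
print(totals)
```

Everything after the import is ordinary Pandas code, which is the whole point of a drop-in replacement.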

The account registration process is straightforward: just sign in with your Google or GitHub credentials. You’ll be enrolled in the free 500 GB plan, but you can always upgrade if you need more. After registering, you’ll see a dashboard page:

Image 2 — Terality dashboard (image by author)

The top of the dashboard shows how much data processing you have left. You’d be surprised by how quickly the usage goes up, but more on that later. The Quickstart section shows how to install and configure Terality. You need an API key to get it running.
