Is SFrame better than Pandas?

Jay Doshi
Published in Analytics Vidhya
Nov 30, 2021

Pandas is one of the most widely used and best-known Python packages. Its name is derived from “panel data”, and it makes working with tabular data easy.

It is the go-to package for most developers, whether they are building critical production systems or writing proof-of-concept scripts.

But it has certain limitations. The biggest is that a Pandas DataFrame is an in-memory data structure, which means you usually cannot work with data larger than your machine's main memory (RAM).
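
To make the in-memory point concrete, here is a minimal sketch that compares a file's size on disk with the memory Pandas uses once the file is loaded. The file name and columns are hypothetical, and the import is guarded so the script still runs where Pandas is not installed.

```python
import os
import tempfile

try:
    import pandas as pd
except ImportError:  # keep the sketch runnable even without pandas
    pd = None

# Write a small, hypothetical CSV so the example is self-contained.
path = os.path.join(tempfile.mkdtemp(), "sample.csv")
with open(path, "w") as f:
    f.write("id,value\n")
    for i in range(1000):
        f.write(f"{i},{i * 0.5}\n")

size_on_disk = os.path.getsize(path)

if pd is not None:
    df = pd.read_csv(path)  # every row is loaded into RAM at once
    in_memory = int(df.memory_usage(deep=True).sum())
    print(f"on disk: {size_on_disk} bytes, in memory: {in_memory} bytes")
```

With a file of a few kilobytes this is harmless, but the same `read_csv` call on a file larger than the available RAM would typically fail or slow the machine to a crawl.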

It may not seem like a huge roadblock at first, but we live in the age of big data and machine learning. Vast amounts of data are being collected and fed into analytical and learning systems to generate insights, create recommendations, and much more.

So it is critical that we can work with data sizes well beyond what our RAM can accommodate. This is where SFrame comes in!

SFrame, short for “Scalable Frame”, is part of a larger ecosystem called “Turicreate”. Before moving on to how to use SFrames, let’s define a few things:

  • Turicreate: an open-source tool set for creating Core ML models for tasks such as image classification, object detection, style transfer, recommendations, and more.
  • SFrame: a tabular, column-mutable data frame object that can scale to accommodate big data. SFrame data is stored column-wise in SArrays.
  • SArray: the array type that holds a single column. Every column in an SFrame is an SArray.
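
The relationship between the two structures can be sketched as follows. The column names and values are made up for illustration, and the code assumes `turicreate` is installed (the import is guarded so the script still runs without it).

```python
try:
    import turicreate as tc
except ImportError:  # the sketch still runs, it just skips the SFrame part
    tc = None

# Hypothetical data: one Python list per column.
columns = {
    "city": ["Pune", "Mumbai", "Delhi"],
    "temp_c": [31.0, 33.5, 29.0],
}

if tc is not None:
    sf = tc.SFrame(columns)            # tabular object built from the columns
    temps = sf["temp_c"]               # selecting one column gives an SArray
    print(isinstance(temps, tc.SArray))
    # "Column-mutable": a new column can be added to an existing SFrame.
    sf["temp_f"] = temps * 9 / 5 + 32
    print(sf.column_names())
```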

Now that we understand conceptually what Turicreate and SFrame are, let us look at an example.

Let’s install the Turicreate ecosystem to use SFrame.

pip install -U turicreate

Once the setup is complete, let’s get started by importing the dataset.
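
One way this step might look is shown below. It is a minimal sketch rather than a definitive recipe: `trips.csv` and its columns are hypothetical stand-ins for your own dataset, and the import is guarded so the script still runs where `turicreate` is not available.

```python
import csv
import os
import tempfile

try:
    import turicreate as tc
except ImportError:  # the sketch still runs, it just skips the SFrame part
    tc = None

# Hypothetical stand-in for a large CSV file on disk.
csv_path = os.path.join(tempfile.mkdtemp(), "trips.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["trip_id", "distance_km"])
    writer.writerows([[1, 2.5], [2, 7.1], [3, 0.9]])

if tc is not None:
    sf = tc.SFrame.read_csv(csv_path)  # column types are inferred
    print(sf.head(2))                  # peek at the first rows
    print(sf.num_rows(), "rows,", sf.num_columns(), "columns")
```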

Here the file can be significantly larger than what your computer's RAM can accommodate. An added benefit of using Turicreate and SFrame is the strong visualization capabilities they provide.
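
As a rough illustration of those visualization features, the sketch below wraps the `show()` methods that SFrame and SArray provide in a small helper. The helper name `visualize` and the sample data are hypothetical, and the call is left commented out because it opens a plot and is best run in an interactive session.

```python
try:
    import turicreate as tc
except ImportError:  # the sketch still runs, it just skips the SFrame part
    tc = None


def visualize(sf, column):
    """Open the built-in plots for a whole table and for one of its columns."""
    sf.show()          # summary plot covering every column
    sf[column].show()  # plot for a single column (an SArray)


if tc is not None:
    ratings = tc.SFrame({"stars": [5, 4, 4, 3, 5, 1]})  # made-up data
    # visualize(ratings, "stars")  # uncomment in an interactive session
```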

To sum up, asking whether SFrame is better than Pandas across the board is like asking which language is the best to code in. The answer is that it depends on the use case.

If you’re working with files significantly larger than what can fit in your RAM, then SFrame is the better choice. Also, the wide range of data visualization capabilities that Turicreate provides gets us a step closer to working with and understanding big data.

Hope this article helps!
