Spark vs Pandas, part 4 — Recommendations
Why neither Spark nor Pandas is better than the other. Or: Always choose the right tool for the right job.
Originally I wanted to write a single article for a fair comparison of Pandas and Spark, but it continued to grow until I decided to split it up. This is the fourth and final part of the small series.
- Spark vs Pandas, part 1 — Pandas
- Spark vs Pandas, part 2 — Spark
- Spark vs Pandas, part 3 — Languages
- Spark vs Pandas, part 4 — Recommendations
What to Expect
This last part of the series gives you some advice on how to choose between the two technologies for implementing a given task.
When to prefer Pandas over Spark
After a detailed analysis of the two contenders, Pandas and Spark, we can now summarize the strengths and weaknesses of both and indicate when to use which.
Let’s start with Pandas.
Strengths
Pandas is simple to use, and you can find lots of valuable information and online resources. Pandas performs all its operations reasonably quickly, as long as the amount of data is not too large. It is…