Python Programming: Manipulating tabular data efficiently
How to Speed up Pandas by 100x
With great power comes great responsibility.
4 min read · Sep 6, 2022
Pandas is a Python data analysis library for working with tabular data of the kind stored in spreadsheets and databases. It provides a vast set of functions for manipulating and transforming structured data held in dataframes. In this blog post, we shall discuss 3 simple tricks for speeding up Pandas operations.
1. Stop using iterrows()
- Data manipulation often requires iterating over dataframe rows. iterrows() is often the go-to option for such use cases. However, it is notoriously slow and can easily be swapped for itertuples().
- Consider a simple (read: trivial) problem of adding two columns of a Pandas dataframe.
- Now, let us apply the function simple_sum to every row of the dataframe using iterrows() and measure the time needed to finish the task.
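
A minimal sketch of that measurement might look like the following. The dataframe and the body of simple_sum are not shown in the text above, so the column names a and b, the 10,000-row size, and the two-argument definition of simple_sum are all assumptions made for illustration. The itertuples() loop is included only for comparison.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical data: column names "a" and "b" and the row count are assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.integers(0, 100, size=10_000),
    "b": rng.integers(0, 100, size=10_000),
})


def simple_sum(x, y):
    """Assumed definition: return the sum of two values."""
    return x + y


# Row-wise loop with iterrows(): every row is built as a pandas Series.
start = time.perf_counter()
sums_iterrows = [simple_sum(row["a"], row["b"]) for _, row in df.iterrows()]
iterrows_seconds = time.perf_counter() - start

# Same loop with itertuples(): every row is a lightweight namedtuple.
start = time.perf_counter()
sums_itertuples = [simple_sum(row.a, row.b) for row in df.itertuples(index=False)]
itertuples_seconds = time.perf_counter() - start

print(f"iterrows:   {iterrows_seconds:.4f} s")
print(f"itertuples: {itertuples_seconds:.4f} s")
```

Exact timings depend on the machine and the Pandas version, so no figures are quoted here. itertuples() is generally the faster of the two because it yields namedtuples instead of constructing a Series for every row.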