Python Programming: Manipulating tabular data efficiently

How to Speed up Pandas by 100x

With great power comes great responsibility.

Pritish Jadhav
Published in Geek Culture
4 min read · Sep 6, 2022


Pandas is a Python data analysis library for working with tabular data, such as data stored in spreadsheets and databases. It provides a broad set of functions for manipulating and transforming structured data, aka DataFrames. In this blog post, we shall discuss 3 simple tricks for speeding up Pandas operations.

1. Stop using iterrows():

  • Data manipulation often requires iterating over dataframe rows.
  • iterrows() is often the go-to option for such use cases. However, it is notoriously slow and can easily be replaced with itertuples().
  • Consider a simple (read: trivial) problem of adding two columns of a Pandas dataframe.
  • Now, let us apply the function simple_sum to every row of the dataframe using iterrows() and measure the time needed to finish the task.
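
The original code for this step did not survive in this copy of the post, so here is a minimal sketch of what the comparison could look like. The column names a and b, the row count, and the body of simple_sum are assumptions made for illustration, not the author's exact code.

```python
import time

import numpy as np
import pandas as pd

# Hypothetical data: the column names "a" and "b" and the row count are assumptions.
rng = np.random.default_rng(0)
df = pd.DataFrame({
    "a": rng.integers(0, 100, size=10_000),
    "b": rng.integers(0, 100, size=10_000),
})


def simple_sum(x, y):
    """Add two values; stands in for the row-wise function named in the text."""
    return x + y


# iterrows() yields (index, Series) pairs, so a new Series is built for every row.
start = time.perf_counter()
iterrows_result = [simple_sum(row["a"], row["b"]) for _, row in df.iterrows()]
iterrows_seconds = time.perf_counter() - start

# itertuples() yields lightweight namedtuples, with columns read as attributes.
start = time.perf_counter()
itertuples_result = [simple_sum(row.a, row.b) for row in df.itertuples(index=False)]
itertuples_seconds = time.perf_counter() - start

print(f"iterrows:   {iterrows_seconds:.4f} s")
print(f"itertuples: {itertuples_seconds:.4f} s")
```

Both loops produce the same sums, so the only thing that changes is the time taken. Exact timings depend on the machine, the Pandas version, and the size of the dataframe, so treat the printed numbers as a rough comparison.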
