Why Pandas itertuples() Is Faster Than iterrows() and How To Make It Even Faster

Published in The Startup
9 min read · Oct 19, 2019


Introduction

In this article, I will explain why pandas’ itertuples() method is faster than iterrows(). More importantly, I will share the tools and techniques I used to uncover the source of the bottleneck in iterrows(). By the end of this article, you will be equipped with the basic tools to profile and optimize your own Python code.

The code to reproduce the results described in this article is available here. I assume the reader has a decent amount of experience writing Python code for production use.
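To make the comparison concrete, here is a minimal sketch that times both methods on a small synthetic DataFrame. It is not the benchmark from the linked code: the column names `a` and `b`, the row count, and the helper functions are assumptions chosen for illustration, and the absolute timings will vary by machine and pandas version.

```python
import time

import pandas as pd


def sum_with_iterrows(df):
    """Sum a * b over all rows using iterrows() (hypothetical helper)."""
    total = 0
    for _, row in df.iterrows():  # each row is materialized as a pandas Series
        total += row["a"] * row["b"]
    return total


def sum_with_itertuples(df):
    """Sum a * b over all rows using itertuples() (hypothetical helper)."""
    total = 0
    for row in df.itertuples(index=False):  # each row is a namedtuple
        total += row.a * row.b
    return total


def time_call(func, df):
    """Return the function's result and its wall-clock duration in seconds."""
    start = time.perf_counter()
    result = func(df)
    return result, time.perf_counter() - start


# Synthetic data: the column names and size are assumptions for this sketch.
df = pd.DataFrame({"a": range(10_000), "b": range(10_000)})

rows_result, rows_seconds = time_call(sum_with_iterrows, df)
tuples_result, tuples_seconds = time_call(sum_with_itertuples, df)

print(f"iterrows:   {rows_seconds:.4f}s")
print(f"itertuples: {tuples_seconds:.4f}s")
```

Both loops compute the same value, so any difference in the printed timings comes from how each method produces its rows rather than from the arithmetic.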

Motivation

Imagine you are in this scenario:

You are a data scientist tasked with building a web API that, given a batch of images, classifies whether each picture contains a cat. You decide to use Django to build the API component and, to keep things simple, embed the image classifier code in the same codebase. You spend a couple of weeks working on this project, only to find that your web app is too slow for production use. You ask a colleague who is a software engineer for advice. That colleague tells you that Python is slow and that for anything API-related, Go is the tool of choice.

Do you rewrite everything in Go (including learning a new web framework) or do…
