Follow This Approach to Run 31x FASTER Loops in Python!

Learn about the most efficient method of looping over a pandas DataFrame

Anmol Tomar
CodeX

Introduction

Loops come naturally to us. They are among the first concepts we learn in any programming language, and they are easy to read. So in Python too, whenever we need to iterate through the rows of a dataset, we intuitively reach for a loop.

But when the dataset is large, looping through a DataFrame takes a long time. So should we avoid loops altogether, or are there hacks to overcome this challenge?

Fortunately, there are some hacks!

In this blog, we will look at different ways of iterating through a large pandas DataFrame, along with the run time of each. By the end, you will know which looping techniques work best for bigger datasets.

Create our Dataset

We will be using a DataFrame df with 5 million rows and 4 columns. Each cell holds a random integer from 0 to 49 (the upper bound of 50 passed to np.random.randint is exclusive).

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0, 50, size=(5000000…
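The snippet above is cut off in this copy, so here is a minimal, self-contained sketch of the same idea. It makes a few assumptions that are not from the original: the column names a, b, c, d are hypothetical, the row count is reduced from 5 million so the example finishes quickly, and iterrows is used only as an illustrative baseline for measuring run time (this excerpt does not show which methods the comparison goes on to cover).

```python
import time

import numpy as np
import pandas as pd

# Assumption: reduced from the 5 million rows described in the text
# so that this sketch runs in a few seconds.
N_ROWS = 50_000
# Assumption: hypothetical column names; the source does not show them.
COLUMNS = ["a", "b", "c", "d"]

np.random.seed(0)  # seeded so reruns produce the same data
# randint excludes the upper bound, so every value falls in 0..49
df = pd.DataFrame(np.random.randint(0, 50, size=(N_ROWS, 4)), columns=COLUMNS)

# Time a plain row-by-row loop as a baseline measurement.
start = time.perf_counter()
total = 0
for _, row in df.iterrows():
    total += row["a"] + row["b"]
elapsed = time.perf_counter() - start

print(df.shape)
print(f"iterrows over {N_ROWS:,} rows took {elapsed:.2f} s")
```

Raising N_ROWS back to 5_000_000 should reproduce the dataset size described above, at the cost of a much longer loop. The measured time will vary by machine, so treat it as a rough baseline rather than a benchmark.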
