Achieve performance improvements of 99.8%+ in 1 line of code

This is the article you have been looking “For”

Stacey McLennan-Waldal
CodeX
3 min read · Aug 5, 2021

Is your go-to coding solution a for-loop? Do you have slow run times and long code blocks? Do you know there’s a better way, but you don’t have time to learn it or even to think about it?

“Don’t fix what’s not broken” can work for a long time.

But what if I told you there is at least one better way, and it is so simple you cannot afford to miss it?

Read on.

Caveat: I’m the first to admit that I am a beginner+ coder, so this article may not be for you if the code below makes you cringe. But I think there are a lot of people out there just like me, especially with the massive pivot to tech and the momentum of women taking on tech roles. I searched Google for a long time trying to find a faster way to solve my problem, and I hope this extremely valuable tidbit helps someone who is in the position I was in!

Here’s how I was originally approaching the problem. My DataFrame had this shape (rows, columns):

>> (30000, 150)

Not a very large DataFrame, but not tiny either. It was just big enough to cause slow run times when I ran an if/then/else check in a for-loop.

The loop took 216 seconds overall to execute. That might not sound like much, but this notebook had many code blocks to execute, many iterations to debug, and many cycles of feedback and improvement. Every second counted, especially as it got later in the day and my computer started to slow down, and this was by far the slowest code block to execute.
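A loop of the kind described here might look like the sketch below. This is not my original code: the tiny DataFrame, the column names `status` and `price`, and the helper `update_with_loop` are all hypothetical stand-ins used only to show the pattern.

```python
import pandas as pd

# Hypothetical stand-in data; the real DataFrame was 30,000 rows x 150 columns.
df = pd.DataFrame({
    "status": ["new", "old", "new", "old"],
    "price": [10.0, 20.0, 30.0, 40.0],
})

def update_with_loop(frame, col, condition, target, change):
    """Visit every row; where frame[col] equals condition, overwrite frame[target]."""
    out = frame.copy()  # leave the caller's DataFrame untouched
    for idx in out.index:
        if out.loc[idx, col] == condition:
            out.loc[idx, target] = change
    return out

looped = update_with_loop(df, "status", "new", "price", 0.0)
print(looped)
```

Each pass through the loop does its own label lookup and assignment, so the cost grows with every row. That per-row overhead is what adds up on a DataFrame with tens of thousands of rows.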

🚀 Vectorized operations

Then I was enlightened 💡 with this method instead:

💥 df.loc[df['col'] == condition, 'col to change'] = change 💥

This took 0.307 seconds overall to execute. That is a performance improvement of 99.86% (1 − 0.307/216 ≈ 0.9986), or roughly 700 times faster! I mean, I realize I was doing it the super clunky way before, but if no one had ever bothered to tell me about this…

how could I ever get better?
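To make the `df.loc` pattern concrete, here is a minimal sketch on a toy DataFrame. The column names `status` and `price`, the values, and the "else" rule of doubling the price are hypothetical and chosen only for illustration; they are not from my notebook.

```python
import pandas as pd

# Hypothetical stand-in data for illustration.
df = pd.DataFrame({
    "status": ["new", "old", "new", "old"],
    "price": [10.0, 20.0, 30.0, 40.0],
})

vectorized = df.copy()

# Boolean mask: True for every row that meets the condition.
mask = vectorized["status"] == "new"

# "if" branch: update all matching rows in a single assignment.
vectorized.loc[mask, "price"] = 0.0

# "else" branch: ~mask selects the remaining rows.
vectorized.loc[~mask, "price"] = vectorized.loc[~mask, "price"] * 2

print(vectorized)
```

The comparison builds the mask for the whole column at once, and `.loc` applies the assignment to every selected row in one step, so there is no Python-level loop over rows.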

Limitation

“‘Premature optimization is the root of all evil.’ Programmers may incorrectly predict where in their code a bottleneck will appear, spending hours trying to fully vectorize an operation that would result in a relatively insignificant improvement in runtime.

There’s nothing wrong with for-loops sprinkled here and there. Often, it can be more productive to think instead about optimizing the flow and structure of the entire script at a higher level of abstraction.” — Look Ma, No For-Loops: Array Programming With NumPy
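One way to act on this advice is to time a block before deciding whether it is worth rewriting. The sketch below uses `time.perf_counter` from the Python standard library; the helper name `timed` and the toy workloads are hypothetical, and the timings will differ from machine to machine.

```python
import time

def timed(func, *args, **kwargs):
    """Run func once and return (result, elapsed_seconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    elapsed = time.perf_counter() - start
    return result, elapsed

def slow_sum(n):
    """Toy workload: add up 0..n-1 with an explicit loop."""
    total = 0
    for i in range(n):
        total += i
    return total

loop_result, loop_seconds = timed(slow_sum, 100_000)
builtin_result, builtin_seconds = timed(sum, range(100_000))

print(f"loop: {loop_seconds:.4f}s, built-in: {builtin_seconds:.4f}s")
```

If the measured difference is small compared with the rest of the notebook, the for-loop can probably stay as it is.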

Takeaway

For me there are 2 lessons here:

  1. Vectorized operations are a better way to manipulate data in a pandas DataFrame than running an if/then/else check inside a for-loop.
  2. Call to action! More senior data scientists and developers/coders — do not take for granted how much small moments of mentorship can improve a junior’s performance and how much it can mean to them. If you have knowledge and hacks — share them! 💜

Data Scientist, E-Commerce Business Owner, Engineer, Advocate. Momma of 2. A friend :) https://stacey-waldal.medium.com/membership