Optimize Python code with generators

Chetan Mishra
Apr 21, 2019

Have you ever had to write a method that takes an iterable container (a list, a set, etc.), performs some operations over it, and returns another iterable container as output? If so, Python generators might be the first optimization tool you should look into.

What are Python generators?
Generators are iterators, but you can iterate over them only once. That's because they do not store all of their values in memory; they generate the values on the fly. You use them by iterating over them, either with a for loop or by passing them to any function or construct that iterates. Most of the time, generators are implemented as functions. However, they do not return a value, they yield it.
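For instance, here is a minimal sketch of that one-shot behavior (count_up_to is a hypothetical name used only for this illustration):

def count_up_to(n):
    # Calling this function returns a generator object without
    # running the body; values are produced lazily on demand.
    i = 1
    while i <= n:
        yield i
        i += 1


gen = count_up_to(3)
print(list(gen))  # [1, 2, 3]
print(list(gen))  # [] -- the generator is exhausted after one pass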

Let's see an example of how to use them.
Suppose you have a simple function to take a list of numbers and return the cube of all of these numbers like this:

def get_cubes(numbers):
    cubes = []
    for number in numbers:
        cubes.append(number * number * number)
    return cubes


if __name__ == '__main__':
    cubes = get_cubes([1, 2, 3, 4, 5])
    print(type(cubes))
    for cube in cubes:
        print(cube)

This gives you the following output:

<class 'list'>
1
8
27
64
125

Notice that the returned object belongs to the list class.
You could rewrite the same method using a generator like this:

def get_cubes(numbers):
    for number in numbers:
        yield number * number * number


if __name__ == '__main__':
    cubes = get_cubes([1, 2, 3, 4, 5])
    print(type(cubes))
    for cube in cubes:
        print(cube)

Now the output is this:

<class 'generator'>
1
8
27
64
125

This time the returned object is of the generator class.
Notice that we did not have to make any changes to the code calling the method, even though the return type is different! That's because both of these objects are iterable.
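Because both are iterable, anything that consumes an iterable accepts either version unchanged; a quick sketch using the generator version:

def get_cubes(numbers):
    for number in numbers:
        yield number * number * number


# Built-ins that iterate work the same with the generator as with a list:
print(sum(get_cubes([1, 2, 3])))   # 36
print(max(get_cubes([1, 2, 3])))   # 27
print(list(get_cubes([1, 2, 3])))  # [1, 8, 27]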

If the calling code and the result are the same in both cases, then what is the advantage of using generators?
The advantage lies in the fact that generators don't store all of their results in memory; they generate them on the fly, so memory is only used when you ask for a result and not before.
This doesn't just save us memory, it also saves time.
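You can see this laziness directly with the built-in next(), which pulls one value at a time:

def get_cubes(numbers):
    for number in numbers:
        yield number * number * number


gen = get_cubes([1, 2, 3])
print(next(gen))  # 1 -- computed only now, on demand
print(next(gen))  # 8 -- the remaining values have not been calculated yet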
Let’s verify this practically.

Here is the same program with added code to measure the runtime and memory footprint of the method call.
We'll also use the range function to feed a large sequence of numbers to the get_cubes method, so that time_diff and mem_diff come out to significant values.
I'll be using the built-in time library to measure runtime and the open-source tool memory_profiler (pip install memory-profiler) to get the memory footprint.

Scenario 1 - Without generators

import memory_profiler
import time


def get_cubes(numbers):
    cubes = []
    for number in numbers:
        cubes.append(number * number * number)
    return cubes


if __name__ == '__main__':
    # memory before the method call
    m1 = memory_profiler.memory_usage()
    # start time
    t1 = time.perf_counter()

    cubes = get_cubes(range(10000000))

    # end time
    t2 = time.perf_counter()
    # memory after the method call
    m2 = memory_profiler.memory_usage()
    time_diff = t2 - t1
    mem_diff = m2[0] - m1[0]
    print(f"It took {time_diff} Secs and {mem_diff} MiB to execute this method")

The output is:

It took 2.294268 Secs and 408.66015625 MiB to execute this method

Scenario 2 - Using generators

import memory_profiler
import time


def get_cubes(numbers):
    for number in numbers:
        yield number * number * number


if __name__ == '__main__':
    # memory before the method call
    m1 = memory_profiler.memory_usage()
    # start time
    t1 = time.perf_counter()

    cubes = get_cubes(range(10000000))

    # end time
    t2 = time.perf_counter()
    # memory after the method call
    m2 = memory_profiler.memory_usage()
    time_diff = t2 - t1
    mem_diff = m2[0] - m1[0]
    print(f"It took {time_diff} Secs and {mem_diff} MiB to execute this method")

The output is:

It took 6.1000000000005494e-05 Secs and 0.01953125 MiB to execute this method

As you can see, in Scenario 2 no significant memory is taken and the execution time is negligible, while Scenario 1 needed 2.29 seconds and over 400 MiB of memory just to build the result.
Even if you start iterating over the result of the method in Scenario 2, only the execution time will increase; the memory usage will remain almost the same.

So if we add code to consume the result with a for loop:

cubes = get_cubes(range(10000000))
for c in cubes:
    pass  # do something with each cube here

The result in Scenario 1 (without generators) is:

It took 2.9292 Secs and 477.7421875 MiB to execute this method

The result in Scenario 2 (using generators) is:

It took 2.563005 Secs and 0.01953125 MiB to execute this method
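As a closing note, the same lazy behavior is also available inline via a generator expression, without defining a separate function:

# A generator expression builds the same kind of lazy generator object:
cubes = (number * number * number for number in range(10000000))
print(type(cubes))  # <class 'generator'>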
