Python Primer — Generators and Iterators

Marius Safta
Published in Cluj School of AI
Jun 5, 2019

Hello World,

In the first five parts of this Python Primer we went through important Python concepts.

The final post in the introductory series is an overview of Generators and Iterators, two very important concepts when working with large amounts of data.

Generators vs Lists

Let’s say we have a function that returns a list of the first n numbers squared. Lists and functions should already be familiar, but if not, see the previous posts about functions and list comprehensions. The function would look like this:

def squared_nrs(n):
    return [nr**2 for nr in range(n)]

We can print a list and iterate through it with the for statement.

my_list = squared_nrs(10)
print(my_list)
>> [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
for nr in my_list:
    print(nr)
>> 0
1
4
9
16
25
36
49
64
81

So far so good, we’ve seen all this before.

But what if we were to input a bigger number, say 10000? How much time would the function need to execute? Luckily we have a way to find this out quickly, using timeit in the Jupyter Notebook, like so:
%timeit function_name()

%timeit squared_nrs(10000)

12.9 ms ± 476 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

Results vary a lot, depending on what machine is used. We have an average measurement of 12.9 milliseconds across 7 runs of 100 loop executions for each run.

Now let’s create a generator and see how it compares to a list…

To create a generator, we simply write a function like we normally would, except instead of the return keyword, the yield keyword will be used.

def squared_nrs_gen(n):
    for nr in range(n):
        yield nr**2

Now iterate through it.

my_generator = squared_nrs_gen(10)
for nr in my_generator:
    print(nr)
>> 0
1
4
9
16
25
36
49
64
81

The same results are shown, as expected. With timeit, we measure how long the call to the generator function takes.

%timeit squared_nrs_gen(10000)

1.44 µs ± 53.2 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

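We have 1.44 microseconds vs 12.9 milliseconds, that is 1.44 vs 12900 microseconds. The two measurements are not like for like, though. Calling squared_nrs_gen(10000) only creates the generator object and computes nothing yet, while squared_nrs(10000) builds the whole list. The generator does its work lazily, one value at a time, when it is iterated over, and looping over all 10000 values typically takes about as long as building the list. The real gain is that the work is deferred and no list has to be stored.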
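Calling a generator function only creates the generator object, so a fuller picture also times a complete pass over it. Below is a rough sketch using the standard timeit module instead of the notebook magic. The helper name consume and the repetition count are assumptions made for illustration, and the numbers will differ from machine to machine.

```python
import timeit

def squared_nrs(n):
    return [nr**2 for nr in range(n)]

def squared_nrs_gen(n):
    for nr in range(n):
        yield nr**2

def consume(n):
    # Hypothetical helper: iterate over every value so the generator
    # actually computes all the squares.
    for _ in squared_nrs_gen(n):
        pass

runs = 20  # arbitrary repetition count, chosen only for illustration

build_list = timeit.timeit(lambda: squared_nrs(10000), number=runs)
create_gen = timeit.timeit(lambda: squared_nrs_gen(10000), number=runs)
consume_gen = timeit.timeit(lambda: consume(10000), number=runs)

print(f"build the list:       {build_list / runs * 1e6:.1f} us per call")
print(f"create the generator: {create_gen / runs * 1e6:.1f} us per call")
print(f"consume generator:    {consume_gen / runs * 1e6:.1f} us per call")
```

Creating the generator should be by far the cheapest of the three, since no squares are computed at that point.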

How generators work

In the previous sections, we saw generators are functions that use the yield keyword instead of return and produce their values lazily, one at a time, as they are iterated over.

Not only that, but the memory occupied by a generator is tiny compared to the memory requirement of a huge list. To see that, we use the getsizeof function of the sys module. Let’s use it on our previous function and generator.

import sys
print(sys.getsizeof(squared_nrs(10000)))
>> 87624
print(sys.getsizeof(squared_nrs_gen(10000)))
>> 88

This is another huge difference: 88 bytes for the generator vs 87624 bytes for the list. So why not use generators all the time? Well, generators and lists are different things and each should be used when appropriate.
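To see how the two sizes scale, here is a small sketch that repeats the measurement for several values of n (the values are arbitrary). Exact byte counts depend on the Python version, and sys.getsizeof on a list counts only the list object itself, not the integers it refers to.

```python
import sys

def squared_nrs(n):
    return [nr**2 for nr in range(n)]

def squared_nrs_gen(n):
    for nr in range(n):
        yield nr**2

list_sizes = []
gen_sizes = []
for n in (10, 1000, 100000):
    list_sizes.append(sys.getsizeof(squared_nrs(n)))
    gen_sizes.append(sys.getsizeof(squared_nrs_gen(n)))
    print(f"n={n}: list {list_sizes[-1]} bytes, generator {gen_sizes[-1]} bytes")
```

The list grows with n, while the generator object stays the same size no matter how many values it will eventually produce.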

Generator functions don’t compute a value, return it and finish, like regular functions that use return. Instead, calling one returns a generator object that implements the iteration protocol. Generators stop and resume their execution as they are being iterated over. This means they don’t have to calculate and keep all values in memory, but only hold the current value. And with the iteration protocol defined, generators are able to calculate the next value on request. You can’t directly access values by index however, like you can with a list, and a generator can only be iterated over once. So if you need indexed or repeated access to the values, lists are the way to go.
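The stop-and-resume behaviour can be made visible by recording what happens and in which order. This is only an illustrative sketch: traced_gen and the events list are hypothetical names introduced here, not part of the original notebook.

```python
events = []  # records the order in which things happen

def traced_gen(n):
    for nr in range(n):
        events.append(f"computing {nr}**2")
        yield nr**2  # execution pauses here until the next value is requested

gen = traced_gen(3)
events.append("generator created")  # nothing has been computed at this point

for value in gen:
    events.append(f"received {value}")

print("\n".join(events))
```

The "computing" and "received" entries alternate, which shows that each square is calculated only when the loop asks for it.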

Let’s see a quick example. First we need a generator.

def simple_gen():
    for nr in range(3):
        yield nr**2

Then we assign the generator to a variable.

my_gen = simple_gen()

To iterate through it we use the next function like so: next(my_gen)

print(next(my_gen))
>> 0
print(next(my_gen))
>> 1
print(next(my_gen))
>> 4
print(next(my_gen))

After going through all the values, the next function raises a StopIteration exception. This exception signals that all the values have been exhausted.

This exception is automatically handled in for loops, so we never see it when using for.
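As a rough illustration of what a for loop does behind the scenes, the loop can be written by hand with iter, next and an explicit check for StopIteration. This is a simplified sketch of the idea, not the interpreter's actual implementation.

```python
def simple_gen():
    for nr in range(3):
        yield nr**2

iterator = iter(simple_gen())  # for a generator, iter() returns the generator itself
results = []

while True:
    try:
        value = next(iterator)
    except StopIteration:
        break  # no values left, so the loop ends quietly
    results.append(value)

print(results)
```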

Iterators

In a previous post we saw we can iterate over strings using the for loop.

word = "Hello"
for letter in word:
    print(letter)
>> H
e
l
l
o

Strings are iterable objects, but are not iterators themselves. We can’t use the next function to iterate over them.

next(word)

The call raises TypeError: 'str' object is not an iterator, which clearly states that strings are not iterators. Luckily, we can deal with this by using the iter function.

word_iterator = iter(word)
print(next(word_iterator))
>> H
print(next(word_iterator))
>> e
print(next(word_iterator))
>> l
print(next(word_iterator))
>> l
print(next(word_iterator))
>> o
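The difference between an iterable and an iterator can also be checked directly. The sketch below uses the Iterable and Iterator abstract base classes from the standard collections.abc module; the checks dictionary is just a convenient way to collect the answers.

```python
from collections.abc import Iterable, Iterator

def simple_gen():
    for nr in range(3):
        yield nr**2

word = "Hello"
my_gen = simple_gen()

checks = {
    "string is iterable": isinstance(word, Iterable),
    "string is an iterator": isinstance(word, Iterator),
    "iter(string) is an iterator": isinstance(iter(word), Iterator),
    "generator is an iterator": isinstance(my_gen, Iterator),
    "iter(generator) is the generator": iter(my_gen) is my_gen,
}

for description, answer in checks.items():
    print(f"{description}: {answer}")
```

A string is iterable but needs iter to produce an iterator, while a generator already is one, which is why next works on it directly.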

Conclusion

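In this post we saw that generators are far more memory efficient than regular lists when we only need to iterate over values, because they compute each value on demand instead of storing them all. The range() function we often use in for loops is lazy for the same reason, although in Python 3 it returns a range object (a sequence that supports indexing and len()) rather than a generator.

Generator functions are created by using yield instead of return. For more information on generators, follow the links below: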

https://realpython.com/introduction-to-python-generators/

https://stackoverflow.com/questions/1756096/understanding-generators-in-python

As always, the Jupyter Notebook can be found on the Github repo: https://github.com/clujsoai/python

This post concludes the Python Primer series (sort of). The next post will cover resources to further your Python knowledge, after which we’ll delve into more data-sciency topics, starting with numpy.

Your opinions, feedback or (constructive) criticism are welcome in the discussions below or at @mariussafta

Join our Facebook and Meetup groups and keep an eye out for discussions and future meetups.
