How to Use Generators Instead of Returning Lists in Python

Understanding the usage of generators and yield in Python

--

This article details how to create generator functions and why we would want to use them in the first place.

The simplest choice for functions that produce a sequence of results is to return a list of items. For example, let’s say we need to find the index at which every word in a string starts. Here, we accumulate results in a list using the append method and return it at the end of the function.

def index_words(text):
    result = []
    if text:
        result.append(0)  # the first word always starts at index 0
    for index, letter in enumerate(text):
        if letter == ' ':
            result.append(index + 1)  # a new word starts right after each space
    return result

This works as expected for some sample input.

string = "Welcome Back Sarah"
result = index_words(string)
print(result)
>>>
[0, 8, 13]

There are two problems with the index_words function, though.

The first problem is that the code is a bit dense and noisy. Each time a new result is found, we call the append method.

A better way to write this function is using a generator.

Generator Functions

Generators are functions that can be paused and resumed on the fly, returning an object that can be iterated over. Unlike lists, they are lazy: they produce items one at a time and only when asked for. This makes them much more memory efficient when dealing with large datasets.

To create a generator, we define a function as we normally would but use the yield statement instead of return; the presence of yield tells the interpreter to treat the function as a generator function, so calling it returns an iterator:

def countdown(num):
    print('Starting')
    while num > 0:
        yield num
        num -= 1

The yield statement pauses the function and saves the local state so that it can be resumed right where it left off.

What happens when we call the above function?

>>> val = countdown(3)
>>> val
<generator object countdown at 0x10213aee8>

Calling the function does not execute it. We know this because the string Starting did not print. Instead, the function returns a generator object which is used to control execution.

Generator objects execute when next() is called:

>>> next(val)
Starting
3

When next() is called for the first time, execution begins at the start of the function body and continues until the next yield statement, where the value to the right of yield is returned. Each subsequent call to next() resumes right after the yield statement and runs until another yield is reached. If the function finishes without reaching another yield (in our case, when the while loop exits because num is no longer greater than 0), a StopIteration exception is raised:

>>> next(val)
2
>>> next(val)
1
>>> next(val)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
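
In everyday code we rarely call next() by hand. A for loop (or a function like list() or sum()) calls next() for us and treats StopIteration as the signal that the sequence is finished, so the exception never surfaces. A minimal sketch, reusing the countdown generator from above:

for value in countdown(3):
    print(value)
>>>
Starting
3
2
1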

Now, going back to our main example of index_words, we define a generator function that produces the same result as before:

def index_words_iter(text):
    if text:
        yield 0
    for index, letter in enumerate(text):
        if letter == ' ':
            yield index + 1

It’s significantly easier to read because all interactions with the result list have been eliminated; results are passed to yield expressions instead. The iterator returned by the generator call can easily be converted to a list by passing it to the list built-in function.

>>> indexes = index_words_iter("Welcome Back Sarah")
>>> indexes
<generator object index_words_iter at 0x107252dd0>
>>> for index in indexes:
...     print(index, end=" ")
...
0 8 13
>>> result = list(index_words_iter("Welcome Back Sarah"))
>>> result
[0, 8, 13]
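
One detail worth noting, and the reason the last line above builds the list from a fresh call to index_words_iter: a generator can only be consumed once. After it has been exhausted, iterating it again produces nothing:

>>> indexes = index_words_iter("Welcome Back Sarah")
>>> list(indexes)
[0, 8, 13]
>>> list(indexes)
[]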

The second problem with index_words is that it requires all results to be stored in the list before being returned. For huge inputs, this can cause our program to run out of memory and crash. In contrast, a generator version of this function can easily be adapted to take inputs of arbitrary length.

Here, we define a generator that streams input from a file one line at a time and yields one result per word. The working memory for this function is bounded by the maximum length of one line of input.

def index_file(handle):
    offset = 0
    for line in handle:
        if line:
            yield offset
        for letter in line:
            offset += 1
            if letter == ' ':
                yield offset

Iterating over this generator produces the same kind of result as before.

foo.txt:
Welcome Back Sarah
Blah Blah Blah

with open('foo.txt', 'r') as f:
    index = index_file(f)
    print(list(index))
>>>
[0, 8, 13, 19, 24, 29]
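
To put a rough number on the memory difference mentioned earlier, we can compare the objects returned by the two versions of index_words using sys.getsizeof. This is only a sketch: getsizeof measures the container itself rather than every element, and the exact figures depend on the CPython build, but the contrast is still telling:

import sys

big_text = "word " * 1_000_000       # a large input, purely for illustration

as_list = index_words(big_text)      # materializes every index up front
as_gen = index_words_iter(big_text)  # will produce indexes one at a time

print(sys.getsizeof(as_list))  # several megabytes for the list object
print(sys.getsizeof(as_gen))   # on the order of a hundred bytes for the generator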

Conclusions:

  • Generators allow us to ask for values as and when we need them, which makes our applications more memory efficient and a perfect fit for infinite streams of data (a short sketch of such a stream follows below). They can also be used to factor the processing out of loops, resulting in cleaner, decoupled code.
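
As a final illustration of the infinite-stream point above, here is a minimal sketch of a generator that never terminates on its own; the caller decides how much of it to consume (the evens name is just for illustration):

from itertools import islice

def evens():
    # an endless stream of even numbers; each value is computed only on demand
    n = 0
    while True:
        yield n
        n += 2

print(list(islice(evens(), 5)))
>>>
[0, 2, 4, 6, 8]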
