Intermediate Python Knowledge
Generators in Python — 5 Things to Know
Understand what generators are and how to use them
1. What is a generator?
To understand Python generators, let's start with the following diagram, which gives us the bigger picture by showing the relationships between generators, iterators, and iterables.

When we deal with sequences of values, we often reach for built-in data types, such as lists, tuples, and dictionaries. These data types have distinct characteristics and serve different purposes, but they have one thing in common: they are all iterables. Being an iterable means that we can go over the object's elements one by one to perform specific operations. The most straightforward way to understand iterables is through the most commonly used iteration technique, the for loop, as shown below.
# The general format of the for loop
for item in iterable:
    # do something

>>> # A simple example of the for loop
>>> for number in [2, 3, 4]:
...     print(f"Number: {number}")
...
Number: 2
Number: 3
Number: 4
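Lists are not the only iterables. As a quick sketch of the point above, the same for loop works on tuples, strings, and dictionaries alike:

```python
# Any iterable can drive a for loop, not just lists.
collected = []

for item in (2, 3, 4):            # tuple
    collected.append(item)

for char in "ox":                 # string
    collected.append(char)

for key in {"a": 1, "b": 2}:      # dict iterates over its keys
    collected.append(key)

print(collected)  # [2, 3, 4, 'o', 'x', 'a', 'b']
```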
A special type of iterable is termed an iterator. Iterators are Python objects that produce one data value at a time using the __next__() method. We can create iterators using the factory function iter(). Let's see the example below to understand how iterators work. Please note that when an iterator exhausts its elements (i.e., there are no more values to render), it raises StopIteration, as shown in the example when we call the __next__() method for the third time.
>>> # Make an iterator from a string
>>> letter_iterator = iter('ox')
>>> letter_iterator.__next__()
'o'
>>> letter_iterator.__next__()
'x'
>>> letter_iterator.__next__()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
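Calling __next__() directly works, but the idiomatic spelling is the built-in next() function, which also accepts an optional default value to return instead of raising StopIteration. A small sketch:

```python
letter_iterator = iter('ox')

print(next(letter_iterator))          # 'o'
print(next(letter_iterator))          # 'x'

# With a default, exhaustion returns the sentinel instead of raising
print(next(letter_iterator, 'done'))  # 'done'
```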
If you want to learn more about iterators and iterables, you can refer to my previous article on this topic.
As a subclass of iterators, generators are a handy mechanism for producing a sequence of values. Why did I say handy? Unlike custom iterators, which require implementing the __iter__() and __next__() methods (covered in the linked article above), custom generators are much easier to create.
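To make that contrast concrete, here is a minimal sketch (the names are my own) of a class-based iterator next to a generator function producing the same sequence; notice the generator needs none of the protocol boilerplate:

```python
class CountdownIterator:
    """Class-based iterator: both protocol methods written by hand."""
    def __init__(self, start):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current <= 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value


def countdown(start):
    """The same sequence as a generator function: just yield."""
    while start > 0:
        yield start
        start -= 1


print(list(CountdownIterator(3)))  # [3, 2, 1]
print(list(countdown(3)))          # [3, 2, 1]
```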
Let’s see what a generator looks like with the following trivial example.
>>> # Define a generator function
>>> def get_number_generator():
...     yield 1
...     yield 2
...
>>> # Create a generator
>>> number_generator = get_number_generator()
>>>
>>> # Get the elements from the generator one by one
>>> number_generator.__next__()
1
>>> number_generator.__next__()
2
>>> number_generator.__next__()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
In the above code snippet, as you may have noticed, there are some differences between generators and the regular iterators we saw above. First, we use a function to create the generator, as opposed to using the iter() function. Second, the function uses the yield keyword to return the desired value each time the __next__() method is called, which leads to the discussion of our second question below.
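Since generators are themselves iterators, in practice you rarely call __next__() by hand; a for loop drives the generator for you and absorbs the final StopIteration. A quick sketch:

```python
def get_number_generator():
    yield 1
    yield 2

# The for loop calls __next__() behind the scenes and stops
# cleanly when the generator raises StopIteration.
results = []
for number in get_number_generator():
    results.append(number)

print(results)  # [1, 2]
```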
2. What’s the yield keyword?
With typical Python functions, the program runs from the first line and continues until a return statement, where the desired value is rendered. Certainly, some functions don't explicitly return any value, in which case an implicit None value is returned. See the trivial example below.
>>> # Define a function that doesn't explicitly return any value
>>> def get_no_returns():
...     a = 1
...
>>> returned_value = get_no_returns()
>>> returned_value is None
True
In essence, the yield keyword in a generator function does two things:
- It returns the yielded value to the caller, handing back control of the execution.
- It saves the state of the function so that execution can resume from the same point the next time the generator is called to yield a value.
Let’s see what these two aspects mean with the following example.
>>> # Define a generator function
>>> def get_number_generator():
...     print('Before yielding number 1')
...     yield 1
...     print('Before yielding number 2')
...     yield 2
...     print('After yielding both numbers')
...
>>> # Create a generator
>>> number_generator = get_number_generator()
>>>
>>> # Get the elements from the generator one by one
>>> number_generator.__next__()
Before yielding number 1
1
>>> number_generator.__next__()
Before yielding number 2
2
>>> number_generator.__next__()
After yielding both numbers
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
As shown in the code snippet, we called the __next__() method three times in total. The first call printed the statement before yield 1 and then returned the first generated value, 1. When we called __next__() for the second time, we didn't see the "Before yielding number 1" printout; instead, execution resumed at the line after yield 1. In other words, the generator saves the state of the function after each value is yielded.
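This state-saving behavior is what makes generators a natural fit for unbounded sequences. A classic sketch is a Fibonacci generator: the local variables survive between yields because the function's frame is suspended rather than destroyed, so the sequence can continue as long as the caller keeps asking:

```python
def fibonacci():
    # a and b persist across yields because the frame is
    # suspended, not destroyed, each time a value is handed out.
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

fib = fibonacci()
first_eight = [next(fib) for _ in range(8)]
print(first_eight)  # [0, 1, 1, 2, 3, 5, 8, 13]
```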
3. What’s a generator expression?
If you're not entirely new to Python, you must have heard of a technique called list comprehension, which is a concise way to construct a list object. It has the following basic syntax and usage.
>>> # Create a list using list comprehension
>>> squared_numbers = [x*x for x in range(10)]
>>> squared_numbers
[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
A generator expression, also called a generator comprehension, is a concise way to construct a generator. The syntax is very similar to that of list comprehensions, except that generator expressions use parentheses instead of square brackets. Let's see an example below. Everything should be straightforward, right?
>>> # Generator expressions to create a generator
>>> squared_numbers_gen = (x*x for x in range(10))
>>> type(squared_numbers_gen)
<class 'generator'>
>>>
>>> # Get the elements
>>> squared_numbers_gen.__next__()
0
>>> squared_numbers_gen.__next__()
1
Compared to the generator function introduced in the previous section, generator expressions are much more concise, creating generators with just one line of code.
However, it should be noted that generator functions allow us to create more complicated generators, while generator expressions are mostly used when simpler generators suffice. See the following article for additional information on list comprehensions and generator expressions.
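One handy consequence: when a generator expression is the sole argument to a function call, the extra parentheses can be dropped, so aggregations stay both concise and memory-friendly. For example:

```python
# The surrounding call's parentheses double as the genexp's own
total = sum(x*x for x in range(10))
largest = max(x*x for x in range(10))

print(total)    # 285
print(largest)  # 81
```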
4. Why do we use generators?
We have learned how to create a generator using a function and an expression, but why do we bother using generators?
If you recall, the most important feature of a generator as an iterator is that it renders a value only when it's requested. Related to this feature is the concept of lazy evaluation, or laziness, which means that specific operations are not executed until the need arises. In the case of generators, they don't bother generating any elements until someone calls on them to do so.
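A quick sketch makes the laziness visible: the work inside a generator expression only runs when a value is actually requested.

```python
def expensive(x):
    print(f"computing {x}")
    return x * x

# Nothing is printed here: no computation has happened yet.
lazy_squares = (expensive(x) for x in range(3))

# Each request triggers exactly one computation.
print(next(lazy_squares))  # prints "computing 0", then 0
print(next(lazy_squares))  # prints "computing 1", then 1
```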
By yielding their elements one at a time, generators are very memory efficient. For example, if we want to operate on an enormously large sequence of integers, we need much more memory with a list object than with a generator. The code below shows this difference: a list object consisting of 1 billion integers requires over 8 GB of memory, while a generator object that can render the same number of integers requires only 96 bytes. Why is the difference so significant? A generator only needs to track its current state; unlike a list, it doesn't load all its elements upfront.
>>> number_count = 1_000_000_000
>>>
>>> # Create a list
>>> numbers_list = [x*x for x in range(number_count)]
>>> numbers_list.__sizeof__()
8058558856
>>>
>>> # Create a generator
>>> numbers_gen = (x*x for x in range(number_count))
>>> numbers_gen.__sizeof__()
96
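Be aware that actually building the billion-element list above will genuinely consume those 8 GB of RAM. A safer way to observe the same effect is sys.getsizeof at a smaller scale, where the list's footprint grows with the element count while the generator's stays constant:

```python
import sys

n = 1_000_000

numbers_list = [x*x for x in range(n)]
numbers_gen = (x*x for x in range(n))

# The list stores all one million references up front;
# the generator only stores its current state.
print(sys.getsizeof(numbers_list) > 1_000_000)  # True
print(sys.getsizeof(numbers_gen) < 1_000)       # True
```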
5. What’s a practical example?
As a neuroscientist, some of my research involves collecting electroencephalogram (EEG) data. As you may know, EEG data is a continuous recording of brain wave activity with a high temporal resolution (e.g., 1000 Hz, or 1000 data values per second). For demonstration purposes, let's use the following code to generate some hypothetical data saved to a file. The file is space-delimited, with 1000 rows and 10000 data values per row (10 seconds of recording at 1000 Hz).
>>> import random
>>>
>>> # Create a file with simulated data
>>> eeg_fn = "EEG_data.txt"
>>>
>>> with open(eeg_fn, "w") as eeg_file:
...     for _ in range(1000):
...         line_data = [str(random.gauss(0, 1)) for _ in range(10000)]
...         line_str = " ".join(line_data) + '\n'
...         _ = eeg_file.write(line_str)
...
>>>
If you have any questions about reading and writing files in Python, you can refer to my previous article to learn more about these topics.
This file's size is about 2 GB, which isn't memory-friendly if we try to load all the data at once. Thus, we need a memory-efficient way to process this file, which is exactly a good use case for generators.
The pre-processing step at hand is to reduce the sampling rate from 1000 Hz to 250 Hz so that we have a smaller dataset for subsequent data processing steps. To do that, we calculate the mean of every four consecutive data values. Let's see in the code below how we can take advantage of generators to get the job done.
>>> # Define the needed variables
>>> n = 4
>>> updated_fn = "updated_EEG_data.txt"
>>>
>>> # Use the open() method to create line generator for the files
>>> with open(eeg_fn, "r") as eeg_file, open(updated_fn, "w") as updated:
...     # Use the generator in an iteration
...     for line_data in eeg_file:
...         # Create numbers from the line string data
...         numbers = [float(x) for x in line_data.split(" ")]
...         calculated_str = list()
...         # Calculate means for four-number segments
...         for i in range(0, len(numbers), n):
...             calculated_str.append(str(sum(numbers[i:i+n])/n))
...
...         # Create the line string data for writing
...         line_str = " ".join(calculated_str) + '\n'
...         # Write the data
...         _ = updated.write(line_str)
...
>>>
In the above code, I've included comments to explain what each operation does. One critical piece to emphasize is that using the open() function to open the EEG file creates a file object, which behaves like a generator in that it lazily yields one line of data as a string at a time. If you check the file size after this data-processing step, you'll find that the updated EEG file has been reduced to about 500 MB, consistent with the reduced sampling rate of 250 Hz, one-fourth of the original recording.
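The averaging step itself can also be packaged as a generator, which keeps the whole pipeline lazy and reusable. A sketch (the helper name chunk_means is my own, not from the article):

```python
def chunk_means(values, n):
    """Yield the mean of every n consecutive values, lazily."""
    chunk = []
    for value in values:
        chunk.append(value)
        if len(chunk) == n:
            yield sum(chunk) / n
            chunk = []
    if chunk:  # a final, shorter chunk is averaged over its own length
        yield sum(chunk) / len(chunk)

downsampled = list(chunk_means([1.0, 3.0, 5.0, 7.0, 2.0, 4.0, 6.0, 8.0], 4))
print(downsampled)  # [4.0, 5.0]
```

Because chunk_means is itself a generator, it can consume the line-by-line file object without ever holding a full row's worth of averages in memory before writing.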
Before You Go
In this article, we reviewed several key concepts/questions related to the proper use of generators. Some important take-aways are recapped here.
- Generators are a subclass of iterators, and both are iterables.
- All three kinds of objects (i.e., generators, iterators, iterables) can be used in iterations.
- Generator functions use the yield keyword to render values one at a time.
- Generator expressions are a concise way to create generators, with the following syntax: (expression for x in iterable).
- Because of their lazy evaluation nature, generators are very memory-efficient, making them particularly useful when we deal with data of a very large size.