Iterators and Generators in Python

Chirayu Tripathi
Jan 31, 2022


What generators and iterators are in Python, and how to use them efficiently.


Let’s begin with iterators first.

Have you ever wondered how looping through a list with “for element in list” works? Iterators are behind that functionality. To understand iterators and generators, we must first understand what an iterator and an iterable are.

An iterator is an object that manages an iteration through a series of values. For an iterator object i, every call to the built-in function next(i) produces the next element of the underlying series. When no elements remain, a StopIteration exception is raised to signal that the end of the series has been reached. We can create an iterator from an iterable object obj by calling iter(obj).

An iterable is an object that an iterator can iterate over, such as a list or a tuple. Now let’s see the syntax.

data = [1,2,3]
i = iter(data)
print(next(i))
print(next(i))
print(next(i))

The for-loop syntax in Python, “for i in data”, simply carries out the above process under the hood. It creates an iterator object for the given iterable and then repeatedly calls next(iterator) until it catches a StopIteration exception.
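The loop's behaviour can be spelled out by hand. The following is a rough sketch of what a for loop over data does, not the interpreter's actual implementation; the names collected and iterator are only illustrative.

```python
data = [1, 2, 3]

# Roughly what "for item in data: collected.append(item)" does
collected = []
iterator = iter(data)          # ask the iterable for an iterator
while True:
    try:
        item = next(iterator)  # fetch the next element
    except StopIteration:      # raised once the series is exhausted
        break
    collected.append(item)     # the loop body

print(collected)  # [1, 2, 3]
```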

Some of you might be wondering what happens if an element is appended to the list before all the next calls have run. A list iterator reads from the live list rather than from a copy, so the new element is simply added to the list and is returned by the last next call.

data = [1,2,3,4,5]
i = iter(data)
print(next(i))
print(next(i))
data.append('hi')
print(next(i))
print(next(i))
print(next(i))
print(next(i))
#output
1
2
3
4
5
hi

Now let us implement iterator within a class.
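A minimal sketch of such a class, using the name evennum that the discussion below refers to. The attribute names and the choice to treat the argument as an exclusive upper bound are assumptions made for illustration.

```python
class evennum:
    """Iterator over the even numbers 0, 2, 4, ... below `limit` (assumed semantics)."""

    def __init__(self, limit):
        self.limit = limit    # exclusive upper bound (assumption)
        self.current = 0      # next even number to hand out

    def __iter__(self):
        # An iterator returns itself from __iter__
        return self

    def __next__(self):
        if self.current >= self.limit:
            raise StopIteration   # tells the for loop to stop
        value = self.current
        self.current += 2
        return value


for num in evennum(10):
    print(num)   # prints 0, 2, 4, 6, 8 on separate lines
```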

You might be wondering why __iter__ and __next__ have double underscores before and after their names. Both are dunder (double underscore) methods, also called magic methods. Python calls these specially named methods behind the scenes, for example when we use iter(), next(), or a for loop, and a class can define them to support the same operations as the built-in types, which is a neat form of polymorphism. I will be writing another article on this topic, but for now let us focus on iterators.

In an iterator class such as evennum, the __iter__ method returns self, the very object on which __iter__ was called. We return self because the object is its own iterator: it already has a __next__ method, so iter() can hand it straight to the for loop. Each call to __next__ returns the next even number. The for loop “for num in evennum(10):” therefore works with our class unchanged: it calls our __iter__ once, then calls our __next__ on every iteration to get the next even number, and stops as soon as it catches the StopIteration exception raised by __next__.

Now let’s see generators.

Generators are often described as the most convenient way to create iterators in Python. A generator is written like an ordinary function, but instead of handing back a single result with return, it uses yield statements to produce each element of the series. Consider the example below.

def factors_return(n):
    results = []
    for k in range(1, n + 1):
        if n % k == 0:
            results.append(k)
    return results

def factors_yield(n):
    for k in range(1, n + 1):
        if n % k == 0:
            yield k

In the above code, you can see the keyword yield. Its presence in a function body is what tells Python that the function is a generator rather than a normal function. The function factors_return(n) returns a list containing all the factors, whereas factors_yield(n) produces the factors one at a time, each handed over by a yield statement.

If we write “for factor in factors_yield(100):”, an instance of our generator is created, and on each iteration of the loop Python runs the body of factors_yield() until a yield statement supplies the next value. At that point the procedure is suspended, to be resumed only when another value is requested. In other words, yield interrupts the procedure and sends a value back to the caller while keeping enough state (local variables and the current position) to carry on from just after the last yield. When control reaches the natural end of the procedure, a StopIteration exception is raised automatically.

In contrast, factors_return(n) builds the complete list before returning anything. The caller must wait for the whole loop to finish and every result is held in memory at once, which becomes a real cost when a function produces many results. For a large input such as 1000000000000000000000, the generator can start handing out factors immediately, whereas the list version returns nothing until the entire range has been scanned.
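The suspend-and-resume behaviour is easiest to see by driving the generator by hand with next() instead of a for loop. This is a small sketch that repeats factors_yield from above; the variable name gen is only illustrative.

```python
def factors_yield(n):
    for k in range(1, n + 1):
        if n % k == 0:
            yield k   # pause here and hand k to the caller

gen = factors_yield(6)   # creates the generator; no code in the body has run yet
print(next(gen))         # 1
print(next(gen))         # 2
print(next(gen))         # 3
print(next(gen))         # 6

try:
    next(gen)            # the body has finished, so there is nothing left
except StopIteration:
    print("done")        # done
```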

Why should you use Iterators and Generators?

The benefit of iterators and generators is lazy evaluation: unlike a function that builds and returns a complete list, they compute each result only when it is requested. The entire series need not reside in memory at once, which avoids memory-related issues, and a generator can even produce an effectively infinite series of values. Consider the example below.

# fibonacci using generator
def fibonacci_generator(n):
    a = 0
    b = 1
    for i in range(n):
        yield a
        future = a + b
        a = b
        b = future

for i in fibonacci_generator(10000):
    print(i)

# fibonacci using list
def fibonacci_list(n):
    a = 0
    b = 1
    lis = []
    count = 0
    while count < n:
        lis.append(a)
        future = a + b
        a = b
        b = future
        count += 1
    return lis

print(fibonacci_list(10000))

In the above code, the function fibonacci_list creates a list object and stores every Fibonacci number in it. The list keeps growing as new elements are appended, occupying more and more space, which can lead to memory issues once n is large enough. In contrast, fibonacci_generator does not store the Fibonacci numbers in a list; each one is yielded to the for loop as it is needed, so only the current pair of values is kept in memory. With an endless loop in place of range(n), the same approach could produce Fibonacci numbers indefinitely.
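An unbounded version can be sketched as follows. The name fibonacci_forever is hypothetical, and itertools.islice from the standard library is used here only to take a finite slice of the endless series.

```python
from itertools import islice

def fibonacci_forever():
    a, b = 0, 1
    while True:            # never ends on its own
        yield a
        a, b = b, a + b

# Take only the first ten values; the rest are never computed
first_ten = list(islice(fibonacci_forever(), 10))
print(first_ten)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34]
```

Because values are computed on demand, the caller decides how many to consume; looping over fibonacci_forever() without a limit or a break would run indefinitely.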

Generators are also widely used for loading data in Machine Learning and Deep Learning problems, because an entire dataset, for instance one containing thousands of images, often cannot be loaded into memory at once. Generators come to the rescue. Consider this small code snippet from an image captioning project that uses deep learning.

def data_generator(descriptions, features, tokenizer, max_length):
    # create_sequences is defined elsewhere in the project
    while True:  # loop over the dataset indefinitely
        for key, description_list in descriptions.items():
            feature = features[key][0]
            input_image, input_sequence, output_word = create_sequences(tokenizer, max_length, description_list, feature)
            yield ([input_image, input_sequence], output_word)
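The same pattern can be tried without any deep learning library. Below is a simplified, self-contained sketch; batch_generator and the sample file names are hypothetical and are not part of the project above.

```python
def batch_generator(samples, batch_size):
    """Yield lists of at most batch_size items from samples, forever."""
    while True:  # the consumer decides when to stop asking
        for start in range(0, len(samples), batch_size):
            yield samples[start:start + batch_size]

batches = batch_generator(["img0", "img1", "img2", "img3", "img4"], batch_size=2)
print(next(batches))  # ['img0', 'img1']
print(next(batches))  # ['img2', 'img3']
print(next(batches))  # ['img4']
print(next(batches))  # ['img0', 'img1']  (starts over)
```

Only one batch exists in memory at a time, which is the property that makes this style useful for large datasets.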

Summary

In this article, we covered the basics of iterators and generators and how to implement them. We also covered why you should use them.
