Python Iterators and Iterables
An analysis of what they are, how they work and how we use them.
Introduction
A Python Iterator is defined as
An object representing a stream of data.
In a nutshell, an Iterator is an object that can be iterated, which means we can access the elements one by one until no elements are left.
For example, a list is a sequential collection of data accessed using a loop.
my_list = ['a', 'b', 'c']
for letter in my_list:
    print(letter)
An object that we can loop over is called an Iterable. Iterators and Iterables are two central Python concepts that are often confused because they are closely related.
Here are the definitions of Python Iterators and Iterables that we will analyse:
Definition: An iterable is an object which we can iterate over.
Definition: An Iterator is the object that performs the iteration. In other words, it is the object that hands us the elements of an iterable one at a time.
These two definitions may sound a bit confusing, so I’ll explain the details to have a clear picture.
From a technical point of view, Python defines an Iterator as an object that implements the iterator protocol, which consists of two methods:
__next__: this returns the next item of the container.
__iter__: this returns the Iterator itself.
Repeated calls to the iterator’s __next__() method (or passing it to the built-in function next()) return successive items in the stream.
When no more data are available a StopIteration exception is raised instead.
At this point, the iterator object is exhausted and any further calls to its __next__() method just raise StopIteration again.
In other words, __next__() keeps returning successive items until the data runs out; from that point on, every further call raises StopIteration.
Python built-in Iterables
In Python, there are built-in iterable data structures such as:
- Lists
- Tuples
- Strings
- Dictionaries
According to the definition, an Iterable is an object we can iterate over.
We iterate over an Iterable using the for ... in statement.
Let’s see how this works by looping over the elements of a few of them.
list_of_numbers = [1, 2, 3, 4]
my_tuple = ('first', 'second', 'third')
my_str = 'hello world'
for i in list_of_numbers:
    print(i)
#Output
#1
#2
#3
#4
for i in my_tuple:
    print(i)
#Output
#first
#second
#third
for i in my_str:
    print(i)
#Output
#h
#e
#l
#l
#o
#(a space)
#w
#o
#r
#l
#d
Let’s take a deeper look at the Iterable Lists using the dir function to show all the methods and attributes in our list_of_numbers
dir(list_of_numbers)
['__add__', '__class__', '__contains__', '__delattr__', '__delitem__',
'__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
'__getitem__', '__gt__', '__hash__', '__iadd__', '__imul__', '__init__',
'__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__',
'__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__reversed__',
'__rmul__', '__setattr__', '__setitem__', '__sizeof__', '__str__',
'__subclasshook__', 'append', 'clear', 'copy', 'count', 'extend',
'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
We see that, among others, there is the __iter__() method, which is used to iterate over the elements of the list list_of_numbers. We can take a look at it by doing
list_of_numbers.__iter__()
<list_iterator object at 0x7fc9700cc220>
We can enrich our previous definition: an object that implements the __iter__() method is an Iterable, because we can loop over it.
The same is valid for my_tuple and my_str
dir(my_tuple)
['__add__', '__class__', '__contains__', '__delattr__', '__dir__',
'__doc__', '__eq__', '__format__', '__ge__', '__getattribute__',
'__getitem__', '__getnewargs__', '__gt__', '__hash__', '__init__',
'__init_subclass__', '__iter__', '__le__', '__len__', '__lt__', '__mul__',
'__ne__', '__new__', '__reduce__', '__reduce_ex__', '__repr__', '__rmul__',
'__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'count', 'index']
dir(my_str)
['__add__', '__class__', '__contains__', '__delattr__', '__dir__', '__doc__',
'__eq__', '__format__', '__ge__', '__getattribute__', '__getitem__', '__getnewargs__',
'__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__',
'__len__', '__lt__', '__mod__', '__mul__', '__ne__', '__new__', '__reduce__',
'__reduce_ex__', '__repr__', '__rmod__', '__rmul__', '__setattr__', '__sizeof__',
'__str__', '__subclasshook__', 'capitalize', 'casefold', 'center', 'count',
'encode', 'endswith', 'expandtabs', 'find', 'format', 'format_map', 'index',
'isalnum', 'isalpha', 'isascii', 'isdecimal', 'isdigit', 'isidentifier',
'islower', 'isnumeric', 'isprintable', 'isspace', 'istitle', 'isupper',
'join', 'ljust', 'lower', 'lstrip', 'maketrans', 'partition', 'replace',
'rfind', 'rindex', 'rjust', 'rpartition', 'rsplit', 'rstrip', 'split',
'splitlines', 'startswith', 'strip', 'swapcase', 'title', 'translate',
'upper', 'zfill']
However, the __next__() method is not implemented in my_str, my_tuple and list_of_numbers. So, according to our definitions, they are not Iterators.
list_of_numbers.__next__()
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
AttributeError: 'list' object has no attribute '__next__'
next(list_of_numbers)
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
TypeError: 'list' object is not an iterator
As we see, the list object is not an Iterator because it doesn’t implement the iterator protocol, and we will understand the reason in a while.
Nevertheless, we can do some magic here starting from what we know:
list_of_numbers has the __iter__() method, which is what lets us loop through all its elements.
elem = list_of_numbers.__iter__()
The __iter__() method returns an Iterator object, and we know a Python Iterator implements the method __next__()
elem
<list_iterator object at 0x106a36880>
type(elem)
<class 'list_iterator'>
So, if elem is a Python Iterator, we can call the __next__() method until the StopIteration exception is raised.
elem.__next__()
1
elem.__next__()
2
elem.__next__()
3
elem.__next__()
4
elem.__next__()
Traceback (most recent call last):
File "<input>", line 1, in <module>
StopIteration
Here the penny drops! The iterator protocol is verified.
One important thing to remember is that since there’s no method to go back, we can only go forward with an Iterator.
There is no way to access previous elements, and the only way to reset an Iterator is to create a new one, as I will show in a while.
Of course, we usually do not use iterators this way, but it gives us a view of how things work internally.
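To connect the manual calls above with everyday code, here is a minimal sketch of roughly what a for loop does with an Iterable. The function name manual_for_loop is made up for this illustration; it only uses the built-in iter() and next() functions.

```python
def manual_for_loop(iterable):
    """Collect items the way a for loop would, using iter() and next() directly."""
    collected = []
    iterator = iter(iterable)          # step 1: ask the iterable for an iterator
    while True:
        try:
            item = next(iterator)      # step 2: fetch the next item
        except StopIteration:
            break                      # step 3: stop quietly once it is exhausted
        collected.append(item)         # this line plays the role of the loop body
    return collected


print(manual_for_loop([1, 2, 3, 4]))   # [1, 2, 3, 4]
```

The for statement hides the StopIteration exception from us, which is why we never see it in normal loops.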
What are the differences between Iterables and Iterators?
Let's do a quick recap of what we have learned so far.
We have seen that Iterators and Iterables can be different objects, although they do not have to be.
We already know that an object is an Iterator if it implements both the __iter__() and the __next__() methods.
We have verified that Lists implement the __iter__() method, so Lists are Iterables, but we have also shown that Lists are not Iterators.
next(list_of_numbers)
Traceback (most recent call last):
File "<input>", line 1, in <module>
TypeError: 'list' object is not an iterator
next(my_tuple)
Traceback (most recent call last):
File "<input>", line 1, in <module>
TypeError: 'tuple' object is not an iterator
next(my_str)
Traceback (most recent call last):
File "<input>", line 1, in <module>
TypeError: 'str' object is not an iterator
So, Lists, Tuples and Strings are Iterables but not Iterators per se, although we have seen how to obtain an Iterator from list_of_numbers.
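The same distinction can be checked without triggering errors. The sketch below uses the abstract base classes Iterable and Iterator from the standard collections.abc module; list_of_numbers is redefined here only so that the snippet runs on its own.

```python
from collections.abc import Iterable, Iterator

list_of_numbers = [1, 2, 3, 4]
list_iterator = iter(list_of_numbers)

print(isinstance(list_of_numbers, Iterable))  # True
print(isinstance(list_of_numbers, Iterator))  # False
print(isinstance(list_iterator, Iterable))    # True
print(isinstance(list_iterator, Iterator))    # True
```

Note that isinstance(obj, Iterable) only looks for an __iter__() method, so calling iter(obj) remains the most reliable way to find out whether an object can be iterated.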
The concepts of Iterables and Iterators are separate because Iterators keep track of the internal position of the elements. After all, an Iterator needs to maintain information on which element to return next.
It’s important to understand that:
If Iterables were to maintain a state, we would only be allowed to use one loop at a time. Otherwise, the other loops would interfere with the state generated by the first loop.
Because the state lives in the Iterator instead, this limitation disappears: every loop asks the Iterable for a new Iterator object, which keeps its own position.
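A small sketch can make this concrete. The names letters, first and second are chosen for this illustration; the point is that two Iterators created from the same list advance independently.

```python
letters = ['a', 'b', 'c']

first = iter(letters)
second = iter(letters)

print(next(first))   # a
print(next(first))   # b
print(next(second))  # a  (second is not affected by first)

# Nested loops over the same list work for the same reason:
# each for clause gets its own iterator.
pairs = [(x, y) for x in letters for y in letters]
print(len(pairs))    # 9
```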
A Python Iterator is also an Iterable object, but not every iterable is an iterator.
We already know that a list is an Iterable object, not an Iterator.
How we use Iterators
The following are some of the most common ways of using Iterators
Iterator in a for-loop with Strings and Lists
my_str = 'I like Python'
for letter in my_str:
    print(letter)
my_list = list(my_str)
for letter in my_list:
    print(letter)
#Output (printed once by each loop)
#I
#(a space)
#l
#i
#k
#e
#(a space)
#P
#y
#t
#h
#o
#n
Iterator in a list comprehension
[letter for letter in my_list]
['I', ' ', 'l', 'i', 'k', 'e', ' ', 'P', 'y', 't', 'h', 'o', 'n']
Iterator with dictionary keys and values
>>> my_dict = {"brand": "motoguzzi", "model": "850T5", "year": 1987}
>>> for key in my_dict:
...     print(key)
...
brand
model
year
>>> for key in my_dict.keys():
...     print(key)
...
brand
model
year
>>> for val in my_dict.values():
...     print(val)
...
motoguzzi
850T5
1987
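Keys and values can also be iterated together. The following sketch uses the standard dict.items() method, which yields (key, value) tuples; the variable name pairs is only used for this illustration.

```python
my_dict = {"brand": "motoguzzi", "model": "850T5", "year": 1987}

# items() yields one (key, value) tuple per entry, in insertion order
for key, val in my_dict.items():
    print(key, val)

pairs = list(my_dict.items())
```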
Iterator with Context-Manager
We can easily access and print the content of a file inside a Context-Manager because the open() function returns an Iterable object that yields the file one line at a time
with open('my_bikes.txt') as bikes:
    for bike in bikes:
        print(bike, end='')  # each line already ends with a newline
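The file example can be tried without an existing my_bikes.txt. The sketch below is only an illustration: it writes a throwaway file with the standard tempfile module (the bike names in it are made up), then shows that the file object is its own Iterator and that lines are read one at a time.

```python
import os
import tempfile

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("w", suffix=".txt", delete=False) as tmp:
    tmp.write("V7\n850T5\nLe Mans\n")
    path = tmp.name

with open(path) as bikes:
    same_object = iter(bikes) is bikes                   # a file object is its own iterator
    bike_names = [line.rstrip("\n") for line in bikes]   # one line is read per step

os.remove(path)
print(same_object)  # True
print(bike_names)   # ['V7', '850T5', 'Le Mans']
```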
The iter built-in function
Python has a built-in iter() function to get an Iterator and a next() function to retrieve its elements one at a time.
Iterators can be easily created from a sequence using the iter() built-in function.
We have already learned that an Iterable is an object we can iterate over and that it produces an Iterator when passed to the iter() function.
🔖 An iterator can be created from an iterable by using the function iter().
Under the hood, the object’s class needs either a method __iter__, which returns an Iterator, or a __getitem__ method with sequential indexes starting with 0.
my_iterator = iter(list_of_numbers)
type(my_iterator)
<class 'list_iterator'>
while True:  # keep calling next() until StopIteration ends the loop
    print(next(my_iterator))
1
2
3
4
Traceback (most recent call last):
File "<input>", line 2, in <module>
StopIteration
Calling iter(obj) is the same as calling obj.__iter__()
Calling next(obj) is the same as calling obj.__next__()
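The __getitem__ route mentioned above is less well known, so here is a minimal sketch of it. The Squares class is hypothetical and defines no __iter__() at all; iter() falls back to calling __getitem__ with 0, 1, 2 and so on until an IndexError is raised.

```python
class Squares:
    """Hypothetical sequence-like class with __getitem__ only (no __iter__)."""

    def __init__(self, size):
        self.size = size

    def __getitem__(self, index):
        if index >= self.size:
            raise IndexError(index)   # tells iter() that the sequence is over
        return index * index


squares = Squares(4)
print(list(squares))          # [0, 1, 4, 9]
print(type(iter(squares)))    # <class 'iterator'>
```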
Let's see another example. The syntax should now be familiar,
>>> x = iter(["motoguzzi","ferrari","maserati"])
>>> print(next(x))
motoguzzi
>>> print(next(x))
ferrari
>>> print(next(x))
maserati
>>> print(next(x))
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
StopIteration
>>>
When there are no more items to process, we see that a StopIteration exception is raised.
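If the exception is not wanted, the built-in next() accepts a second argument that is returned instead of raising StopIteration. The sketch below assumes None never appears in the data, so it can safely act as the end marker.

```python
x = iter(["motoguzzi", "ferrari", "maserati"])

brands = []
while True:
    brand = next(x, None)   # returns None instead of raising StopIteration
    if brand is None:
        break
    brands.append(brand)

print(brands)            # ['motoguzzi', 'ferrari', 'maserati']
print(next(x, "done"))   # done
```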
There is also a special kind of Iterator, called a generator, that we will see later.
How to build our Iterator
A simple way to picture an Iterator is as an object that hands out data on demand. We have seen that such an object has to implement the __iter__() and __next__() methods.
The __next__() method is the one that produces data.
An important note: an Iterable class does not have to define __next__() itself, as long as its __iter__() returns an object that does.
In an Iterator, the __iter__() method returns the iterator object itself. So every Iterator is also an Iterable and may be used in most places where other Iterables are accepted.
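This property is easy to check with a built-in list iterator. In the short sketch below, the names numbers, it and rest are chosen for illustration only.

```python
numbers = [1, 2, 3, 4]
it = iter(numbers)

print(iter(it) is it)            # True: an iterator returns itself
print(iter(numbers) is numbers)  # False: a list returns a separate iterator

next(it)          # consume the first item
rest = list(it)   # list() accepts the iterator because it is also iterable
print(rest)       # [2, 3, 4]
print(list(it))   # [] because the iterator is now exhausted
```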
Let's build our Iterator, a custom class that computes a Fibonacci series.
class FibonacciIterator:
    def __init__(self, max: int):
        self.max = max
        self.a = 0
        self.b = 1
        self.counter = self.a
    def __iter__(self):
        """
        This method returns the iterator itself
        :return:
        """
        return self
    def __next__(self):
        """
        This method returns the next element.
        :return:
        """
        # The check uses the previously returned value, so the last
        # element produced is the first one greater than max.
        if self.counter > self.max:
            raise StopIteration
        else:
            self.a, self.b = self.b, self.a + self.b
            self.counter = self.a
            return self.counter
series = FibonacciIterator(20)
for num in series:
    print(num)
else:
    print("stop")
When we run it, we get
1
1
2
3
5
8
13
21
stop
One important thing here is that stop is printed by the else branch, which runs once the loop has ended because StopIteration was raised, so our Iterator is now exhausted.
If we loop over it twice, we'll see that no more elements are available the second time.
for num in series:
    print(num)
else:
    print("stop")
for num in series:
    print(num)
else:
    print("stop")
We see that after the first stop is printed, there are no more items to iterate.
1
1
2
3
5
8
13
21
stop
stop
This is because the Iterator keeps its state. An Iterator needs to maintain information on which element to return next.
If we want to go through the items again, we can create a new Iterator whenever we need one.
for _ in range(10):
    for num in FibonacciIterator(20):  # a brand-new Iterator on every pass
        print(num)
In this case, we are initialising the Iterator at each loop.
Another way of achieving it is to refactor our class, separating the iteration state from the iterable object. Let's see whether we can do it and why it might be helpful.
First of all, it's important to mention that the __iter__() can also return a new instance of another class that implements the __next__() method, meaning that we can have a class that implements only the __iter__() method and have the __next__() method defined in a separate class.
Iterators must have an __iter__() method that returns the iterator object itself, so every iterator is also iterable and may be used in most places where other iterables are accepted.
The separate class is itself an Iterator: it holds the iteration state, while the outer class only acts as an Iterable that hands out a fresh Iterator on request.
Let's refactor our initial example by adding the DoTheMath Iterator class.
class DoTheMath:
    def __init__(self, max: int):
        self.max = max
        self.a = 0
        self.b = 1
        self.counter = self.a
    def __iter__(self):
        # An iterator should also return itself from __iter__()
        return self
    def __next__(self):
        """
        This method returns the next element.
        :return:
        """
        if self.counter > self.max:
            raise StopIteration
        else:
            self.a, self.b = self.b, self.a + self.b
            self.counter = self.a
            return self.counter
class FibonacciIterator:
    def __init__(self, max: int):
        self.max = max
    def __iter__(self):
        """
        This method returns a new iterator on every call
        :return:
        """
        return DoTheMath(self.max)
We can run it to see that the series can now be looped over again and again.
series = FibonacciIterator(20)
for num in series:
    print(num)
else:
    print("stop")
for num in series:
    print(num)
else:
    print("stop")
1
1
2
3
5
8
13
21
stop
1
1
2
3
5
8
13
21
stop
or
series = FibonacciIterator(20)
for _ in range(10):
    for num in series:  # each pass calls __iter__() and gets a new DoTheMath
        print(num)
We now have a reusable object because we have separated the iteration state from the iterable object. Every loop receives a fresh Iterator, so the series itself can't be exhausted.
We can get a similar result with our initial FibonacciIterator class by storing its values in a list, which can then be iterated as many times as we like.
fibonacci_values = [num for num in FibonacciIterator(10)]
run_this = iter(fibonacci_values)
Here iter() calls the __iter__() method of the list, so each call returns a fresh list iterator.
next() then calls the __next__() method of that list iterator.
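The split between an Iterable and its Iterator can also be written more compactly. The sketch below is an alternative, not the approach used above: the hypothetical FibonacciSeries class turns __iter__() into a generator function, so Python builds a new Iterator object on every call and no separate class with __next__() is needed.

```python
class FibonacciSeries:
    """Hypothetical reusable iterable; __iter__ is a generator function."""

    def __init__(self, limit):
        self.limit = limit

    def __iter__(self):
        a, b = 0, 1
        while a <= self.limit:   # same stop rule as the classes above
            a, b = b, a + b
            yield a


series = FibonacciSeries(20)
print(list(series))   # [1, 1, 2, 3, 5, 8, 13, 21]
print(list(series))   # [1, 1, 2, 3, 5, 8, 13, 21] again, nothing was exhausted
```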
Conclusion
An Iterator is an object that can be iterated, which means we can access the elements one by one until no elements are left. The StopIteration exception is raised when there are no more elements to access.
The tool we use to iterate is the for ... in loop, so Iterators work naturally with loops.
Every object that implements the iterator protocol can be treated as an Iterator.
Another thing worth mentioning is that Iterators are also efficient in terms of resources because only one element is handled at a time. This is why an Iterator can provide an infinite sequence of elements without running out of memory.
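As a rough illustration of this point, the hypothetical EndlessFibonacci class below never raises StopIteration, yet it only ever stores two numbers. The standard itertools.islice function is used to take a finite slice of it.

```python
from itertools import islice


class EndlessFibonacci:
    """Hypothetical iterator that never raises StopIteration."""

    def __init__(self):
        self.a = 0
        self.b = 1

    def __iter__(self):
        return self

    def __next__(self):
        # Only the two most recent values are kept in memory.
        self.a, self.b = self.b, self.a + self.b
        return self.a


first_ten = list(islice(EndlessFibonacci(), 10))
print(first_ten)  # [1, 1, 2, 3, 5, 8, 13, 21, 34, 55]
```

A plain for loop over EndlessFibonacci() would never finish, so an explicit limit such as islice or a break is needed.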
An iterable is an object which we can iterate over. We use for-in to iterate an iterable.
An Iterator is an Iterable, but not all Iterables are Iterators. We have seen how a list implements the __iter__() method but does not implement the __next__().
The important thing to understand is that Iterators maintain states, meaning they know the location of the next element, if any, to return.
Iterables do not maintain state, and that’s good because, if Iterables were to retain state, we would only be allowed to use one loop at a time. Otherwise, the other loops would interfere with the state generated at the first loop.
On the other hand, an Iterator can only be consumed once, but we can always ask the Iterable for a new Iterator object, or build Iterables that hand out a fresh Iterator for every loop.