Python — Iteration Protocol — Part 1

Sunilkathuria
Published in In Computing World
7 min read · Jun 12, 2024

Notice the code snippet below.

For both strn of type string and a_list of type list, it is possible to iterate through every element using a for loop. It is also possible to access a particular element of these two variables.
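
The snippet the paragraph refers to is not reproduced here. A minimal sketch of what it describes, assuming example values for strn and a_list (the names come from the text, the values are made up), could look like this:

```python
strn = "Python"
a_list = [10, 20, 30]

# Both objects can be walked with a for loop...
for ch in strn:
    print(ch)

for item in a_list:
    print(item)

# ...and both support access to a particular element by index.
print(strn[0])     # first character of the string
print(a_list[1])   # second element of the list
```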
This works because both strings and lists are sequences. Other sequence types in Python include ranges, tuples, bytes, and bytearrays.
Sequence types in Python implement a sequence protocol that allows you to iterate through elements using a for loop or address a particular item using an index (making them indexable). Every sequence is also an iterable. In addition, sequences report their length through len() and support slicing and the membership operators ‘in’ and ‘not in’. A sequence type represents an ordered collection of elements.

Note: All sequences are iterable, but not all iterables are sequences. This article focuses on the ‘iterable’ part of the sequence types.

This article will teach us to implement iterable functionality using the sequence and iterator protocols on user-defined data types.
This means we should be able to iterate through elements of a user-defined object using a for-loop and refer to a particular element using an index.

For this, we will implement several dunder methods: __iter__() and __next__() for the iteration protocol, and __len__() and __getitem__() for the sequence protocol.

Dunder methods: These are also known as magic or special methods. Their names start and end with double underscores, hence the name (“dunder” is short for “double underscore”). Dunder methods let user-defined objects work with the language’s built-in constructs, such as the for loop, indexing, and len().

Sequence Protocol

To implement the sequence protocol in a user-defined data type, we implement two dunder methods, __len__() and __getitem__(). For this, we will create a class “MyBookCollection.” This class maintains its books in a linked list. It implements a method add_book to add books to the collection and the two dunder methods, __len__() and __getitem__(), which make the collection iterable and indexable. A class Book represents each book.

Note: Negative indexing and slicing are not implemented to keep the code easy to understand.

Refer to the code below.
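
The original listing is not reproduced here. The following is a minimal sketch of the design described above, not the author's exact code. The Book fields (title, author, next_book) and the private attribute names are assumptions made for illustration.

```python
class Book:
    """A single book; the fields are assumed for illustration."""

    def __init__(self, title, author):
        self.title = title
        self.author = author
        self.next_book = None  # link to the next node in the linked list

    def __repr__(self):
        return f"Book({self.title!r}, {self.author!r})"


class MyBookCollection:
    """Books stored as a singly linked list."""

    def __init__(self):
        self._head = None
        self._tail = None
        self._count = 0

    def add_book(self, book):
        # Append at the tail so books keep their insertion order.
        if self._head is None:
            self._head = book
        else:
            self._tail.next_book = book
        self._tail = book
        self._count += 1

    def __len__(self):
        return self._count

    def __getitem__(self, index):
        # Negative indexes and slices are deliberately not supported.
        if not isinstance(index, int):
            raise TypeError("index must be an integer")
        if not 0 <= index < self._count:
            raise IndexError("book index out of range")
        node = self._head
        for _ in range(index):
            node = node.next_book
        return node


books = MyBookCollection()
books.add_book(Book("Python Tricks", "Dan Bader"))
books.add_book(Book("Effective Python", "Brett Slatkin"))

for book in books:   # works through repeated __getitem__ calls
    print(book)

print(books[1])      # direct access by index
print(len(books))
```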

Note: The implementation of the class Book does not change. Hence, we will not repeat this code in the article.

With the above implementation, we can now use a for loop to iterate through the books in the book collection. Because the class does not define __iter__(), Python falls back to calling __getitem__() with the indexes 0, 1, 2, and so on, until the method raises IndexError. The for loop treats that IndexError as the end of the sequence and stops; any other exception propagates to the caller.
An index can be used to refer to a particular book in the collection. Here, too, Python internally calls the method __getitem__().
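
To make these internal calls visible, here is a small sketch using a hypothetical class Traced (not part of the article's code) that records every index Python requests:

```python
class Traced:
    """Hypothetical class with only __getitem__, used to trace the calls."""

    def __init__(self, items):
        self._items = list(items)
        self.requested = []  # every index Python asks for

    def __getitem__(self, index):
        self.requested.append(index)
        return self._items[index]  # the list raises IndexError past the end


t = Traced(["a", "b", "c"])

seen = []
for item in t:
    seen.append(item)

print(seen)         # ['a', 'b', 'c']
print(t.requested)  # [0, 1, 2, 3]
```

The final request for index 3 is the one that raises IndexError and ends the loop.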

The intent

The approach we used above works, but it is not the recommended way to implement the sequence protocol, for two reasons. Then, what is the correct approach?
First, the implementation does not show the class’s intent to be a sequence type, although it provides the required functionality (a for loop and indexing both work). Second, nothing enforces the protocol: we can skip the implementation of __len__(), and iteration and indexing still work.

To make this intent visible and to ensure that a user-defined object adheres to the defined protocol, the class should inherit from the Sequence abstract base class. This class is available in the module collections.abc.

Info: The Sequence interface ensures that an object intending to behave like a sequence implements the methods __len__() and __getitem__(). A subclass that omits either of them cannot be instantiated.
Built-in sequences like lists, tuples, and strings are registered against this interface. It ensures that built-in types and user-defined sequence-like objects follow a common protocol.

Following is the new implementation of the class BookCollection that implements a Sequence interface.
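
The original listing is not reproduced here. A minimal sketch of such an implementation follows. The trimmed-down Book class and the Incomplete class are assumptions added for illustration; Incomplete exists only to show the check that the abstract base class performs.

```python
from collections.abc import Sequence


class Book:
    def __init__(self, title):
        self.title = title
        self.next_book = None  # link to the next node


class BookCollection(Sequence):
    """Declares its intent by inheriting from Sequence."""

    def __init__(self):
        self._head = self._tail = None
        self._count = 0

    def add_book(self, book):
        if self._head is None:
            self._head = book
        else:
            self._tail.next_book = book
        self._tail = book
        self._count += 1

    def __len__(self):
        return self._count

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("index must be an integer")
        if not 0 <= index < self._count:
            raise IndexError("book index out of range")
        node = self._head
        for _ in range(index):
            node = node.next_book
        return node


class Incomplete(Sequence):
    """Omits __len__ on purpose."""

    def __getitem__(self, index):
        raise IndexError(index)


books = BookCollection()
first = Book("Python Tricks")
books.add_book(first)
books.add_book(Book("Effective Python"))

print(isinstance(books, Sequence))   # True
print(first in books)                # True; __contains__ comes from Sequence
print([b.title for b in books])

try:
    Incomplete()
except TypeError as exc:
    print("cannot instantiate:", exc)
```

Inheriting from Sequence also supplies mixin methods such as __contains__(), index(), and count(), all built on the two methods we wrote.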

Iterators and Iterables

We start by understanding the relationship between an iterable and an iterator and iteration protocol.

Getting an Iterator

An iterable object contains the data elements (in our case, all the books) and returns an iterator over them on request. For this, an iterable implements the dunder method __iter__(). An iterable can also use this method to do the initial setup for iteration.

Iterating through elements

An iterator object is responsible for returning the elements of an iterable one by one, keeping track of the last returned element. When there are no more elements to return, the iterator raises the exception “StopIteration.” This functionality is implemented in the dunder method __next__().

This whole process is called “Iteration Protocol.”

Using Iteration Protocol

Python provides two functions for this functionality.
The first is iter(iterable). When this function is called on an iterable, Python internally calls the __iter__() method of the iterable object and returns an iterator.
The second is next(iterator). When this function is called on an iterator object, Python internally calls the __next__() method of the iterator. This returns the next available element or raises StopIteration when no elements remain.
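
As a short illustration, the two functions can be used by hand on a built-in list (the values are made up for the example):

```python
a_list = [10, 20, 30]

it = iter(a_list)          # Python calls a_list.__iter__()

values = []
values.append(next(it))    # Python calls it.__next__()
values.append(next(it))
values.append(next(it))
print(values)              # [10, 20, 30]

try:
    next(it)               # nothing left to return
    exhausted = False
except StopIteration:
    exhausted = True
print(exhausted)           # True
```

A for loop performs the same steps for us: it calls iter() once, calls next() repeatedly, and stops quietly when StopIteration is raised.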

Implementing Iteration Protocol

Iterable Iterators

In this case, an iterable (class BookCollection) implements the __iter__() and __next__() methods. When a class implements both methods, it is called an iterable iterator. The __iter__() method returns “self”.
This functionality is suitable when only one iterator object is needed to iterate through the elements, and it is possible to iterate only once through the elements.

Note: If we create more than one iterator object from an iterable iterator, these objects refer to the same iterable (it returns self) and interfere with each other while iterating through the elements.

Note: Since we have not implemented the Sequence protocol, printing book details using an index is not possible.

Refer to the code snippet of the class BookCollection below.
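
The original listing is not reproduced here. A minimal sketch of an iterable iterator follows, assuming that __iter__() resets a single shared cursor as its initial setup. Without that reset, the collection could be traversed only once. The attribute names and book titles are made up for illustration.

```python
class Book:
    def __init__(self, title):
        self.title = title
        self.next_book = None  # link to the next node


class BookCollection:
    """Iterable iterator: the collection is its own iterator."""

    def __init__(self):
        self._head = self._tail = None
        self._current = None  # the single, shared cursor

    def add_book(self, book):
        if self._head is None:
            self._head = book
        else:
            self._tail.next_book = book
        self._tail = book

    def __iter__(self):
        self._current = self._head  # initial setup for a new pass
        return self

    def __next__(self):
        if self._current is None:
            raise StopIteration
        book = self._current
        self._current = book.next_book
        return book


books = BookCollection()
for title in ("Book A", "Book B", "Book C"):
    books.add_book(Book(title))

titles = [book.title for book in books]
print(titles)

# Two "iterators" are really the same object sharing one cursor.
it1 = iter(books)
it2 = iter(books)
print(it1 is it2)         # True
first = next(it1).title
second = next(it2).title
print(first, second)      # Book A Book B
```

The last line shows the interference mentioned in the note: it2 does not start from the first book, because it1 has already moved the shared cursor.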

Iterables

In this implementation, the iterable (class BookCollection) and the iterator (a new class BooksIterator) are separate objects.
The __iter__() method returns a new iterator object (an object of class BooksIterator) that references the elements of the iterable (an object of class BookCollection).
Each iterator object keeps track of its own position. This approach is suitable when we intend to iterate through the elements multiple times, or from several places at once.

Note: We cannot rewind an iterator once the iteration is complete. To iterate again, we request a new iterator from the iterable.

Note: Since we have not implemented the Sequence protocol, printing book details using an index will not be possible.
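
A minimal sketch of this arrangement follows. It is an illustration under assumptions, not the author's exact code: the attribute names and book titles are made up, and BooksIterator also defines __iter__() returning itself, a common convention so that the iterator can be used in a for loop directly.

```python
class Book:
    def __init__(self, title):
        self.title = title
        self.next_book = None  # link to the next node


class BooksIterator:
    """Walks the linked list; each instance has its own cursor."""

    def __init__(self, head):
        self._current = head

    def __iter__(self):
        return self

    def __next__(self):
        if self._current is None:
            raise StopIteration
        book = self._current
        self._current = book.next_book
        return book


class BookCollection:
    """Iterable only: hands out a fresh iterator on every request."""

    def __init__(self):
        self._head = self._tail = None

    def add_book(self, book):
        if self._head is None:
            self._head = book
        else:
            self._tail.next_book = book
        self._tail = book

    def __iter__(self):
        return BooksIterator(self._head)


books = BookCollection()
for title in ("Book A", "Book B", "Book C"):
    books.add_book(Book(title))

it1 = iter(books)
it2 = iter(books)
print(it1 is it2)             # False
a = next(it1).title
b = next(it1).title
c = next(it2).title           # it2 is unaffected by it1
print(a, b, c)                # Book A Book B Book A

first_pass = [book.title for book in books]
second_pass = [book.title for book in books]
print(first_pass == second_pass)   # True
```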

User-defined object implements Sequence and iteration protocol

In this case, the class (BookCollection) implements both protocols, which means we have implemented the methods __len__() and __getitem__() for the sequence protocol, and __iter__() and __next__() for the iteration protocol. Printing all the books using a for loop and a particular book using an index is now possible.
When both are available, the for loop uses the iterator implementation: Python calls __iter__() first and falls back to __getitem__() only when a class does not define __iter__().
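
A minimal sketch of a class supporting both protocols follows. Several details are assumptions made for illustration: __next__() lives in a separate BooksIterator class, as in the previous section, and the getitem_calls counter exists only to show which method the for loop uses.

```python
from collections.abc import Sequence


class Book:
    def __init__(self, title):
        self.title = title
        self.next_book = None  # link to the next node


class BooksIterator:
    def __init__(self, head):
        self._current = head

    def __iter__(self):
        return self

    def __next__(self):
        if self._current is None:
            raise StopIteration
        book = self._current
        self._current = book.next_book
        return book


class BookCollection(Sequence):
    def __init__(self):
        self._head = self._tail = None
        self._count = 0
        self.getitem_calls = 0  # illustration only

    def add_book(self, book):
        if self._head is None:
            self._head = book
        else:
            self._tail.next_book = book
        self._tail = book
        self._count += 1

    def __len__(self):
        return self._count

    def __getitem__(self, index):
        self.getitem_calls += 1
        if not isinstance(index, int):
            raise TypeError("index must be an integer")
        if not 0 <= index < self._count:
            raise IndexError("book index out of range")
        node = self._head
        for _ in range(index):
            node = node.next_book
        return node

    def __iter__(self):
        # Overrides the __getitem__-based __iter__ inherited from Sequence.
        return BooksIterator(self._head)


books = BookCollection()
for title in ("Book A", "Book B", "Book C"):
    books.add_book(Book(title))

titles = []
for book in books:            # goes through __iter__ and __next__
    titles.append(book.title)
print(titles)
calls_after_loop = books.getitem_calls
print(calls_after_loop)       # 0

indexed = books[1].title      # goes through __getitem__
print(indexed, books.getitem_calls)   # Book B 1
```

For a linked list this also avoids the cost of walking from the head on every index lookup during iteration.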

Summary

  • We learned to implement a Sequence protocol using a Sequence interface. We must implement two dunder methods, __len__() and __getitem__().
  • We understood the meaning of iterables, iterators, and iteration protocol.
  • We implemented the iteration protocol through two dunder methods, __iter__() and __next__().
  • We learned about two Python functions, iter and next. iter calls the __iter__() method of an iterable, and next calls the __next__() method of an iterator.

Coming up

The next article will teach us about generators and the itertools module.

Reference

Sequence Types — list, tuple, range

GitHub-Source Code

Books — Python Tricks. A buffet of awesome Python features.
Effective Python: 59 Specific Ways to Write Better Python
