Generators in Python

Louis Li
Published in RedSo
Jun 30, 2017


We know Python has lists, and that we can create one with a one-liner in square brackets (a list comprehension), like:

>>> length = 3
>>>
>>> aList = [x for x in range(length)]
>>> type(aList)
<type 'list'>

If, however, we change the square brackets to round brackets, we get a generator expression, which creates a generator instead:

>>> aGenerator = (x for x in range(length))
>>> type(aGenerator)
<type 'generator'>
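
If you are following along on Python 3, the same comparison looks slightly different: type() now prints <class ...> rather than <type ...>. A minimal sketch, assuming Python 3 (the snake_case names are just for illustration):

```python
import types

length = 3

# Square brackets: a list comprehension builds every element right away.
a_list = [x for x in range(length)]

# Round brackets: a generator expression builds nothing until it is iterated.
a_generator = (x for x in range(length))

print(type(a_list))       # <class 'list'>
print(type(a_generator))  # <class 'generator'>
print(isinstance(a_generator, types.GeneratorType))  # True
```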

A generator is an object that can be iterated over just like a list:

>>> print 'Iterating aList:'
Iterating aList:
>>> for i in aList:
...     print i
...
0
1
2
>>> print 'Iterating aGenerator:'
Iterating aGenerator:
>>> for i in aGenerator:
...     print i
...
0
1
2
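
One difference the transcript above doesn’t show: a list can be iterated again and again, but a generator is used up after a single pass. A small sketch illustrating this, assuming Python 3:

```python
a_list = [x for x in range(3)]
a_generator = (x for x in range(3))

# A list can be iterated as many times as we like.
first_list_pass = [i for i in a_list]
second_list_pass = [i for i in a_list]

# A generator is exhausted after one full pass; a second pass finds nothing.
first_gen_pass = [i for i in a_generator]
second_gen_pass = [i for i in a_generator]

print(first_list_pass, second_list_pass)  # [0, 1, 2] [0, 1, 2]
print(first_gen_pass, second_gen_pass)    # [0, 1, 2] []
```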

There are several ways to write a generator in Python. The one above is a shortcut: a one-line generator expression. To better illustrate the difference between a list and a generator, let’s rewrite that generator as a function that uses yield:

>>> def myGenerator(n):
...     i = 0
...     while i < n:
...         yield i
...         i += 1
...
>>> aGenerator = myGenerator(length)
>>> type(aGenerator)
<type 'generator'>
>>> for i in aGenerator:
...     print i
...
0
1
2
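
To see how much work yield saves us, here is roughly what myGenerator would look like written by hand as an iterator class. This is a sketch assuming Python 3, where the method is called __next__() (Python 2 calls it next()); MyCounter is a hypothetical name, and my_generator is simply myGenerator in Python 3 style:

```python
class MyCounter:
    """Hypothetical hand-written iterator equivalent to myGenerator(n)."""

    def __init__(self, n):
        self.n = n
        self.i = 0  # the state that a generator keeps for us automatically

    def __iter__(self):
        # An iterator returns itself from __iter__.
        return self

    def __next__(self):
        # Python 3 name; in Python 2 this method is called next().
        if self.i >= self.n:
            raise StopIteration
        value = self.i
        self.i += 1
        return value


def my_generator(n):
    i = 0
    while i < n:
        yield i
        i += 1


print(list(MyCounter(3)))     # [0, 1, 2]
print(list(my_generator(3)))  # [0, 1, 2]
```

The generator function gets __iter__, __next__, the saved state and the StopIteration for free.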

It behaves exactly the same. Let’s add a print statement to the function to see what is actually happening inside it:

>>> def myGenerator(n):
...     i = 0
...     while i < n:
...         yield i
...         print 'increment i'
...         i += 1
...
>>> aGenerator = myGenerator(length)
>>> for i in aGenerator:
...     print i
...
0
increment i
1
increment i
2
increment i

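We can see that the code after yield (the print and the increment) only runs when the loop asks for the next value: the function pauses at yield, hands i to the loop body, and resumes from that exact spot in the next round. That is also why “increment i” is printed one last time after 2, when the loop asks for a value that no longer exists.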
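
The same experiment in Python 3, recording events in a list instead of printing, so the order in which the generator and the loop body take turns is spelled out. events is a hypothetical name used only for this sketch:

```python
events = []

def my_generator(n):
    i = 0
    while i < n:
        events.append(("yield", i))
        yield i
        # This line runs only when the caller asks for the next value.
        events.append(("increment", i))
        i += 1

for value in my_generator(3):
    events.append(("loop body", value))

for event in events:
    print(event)
```

Each ("increment", i) entry lands after the loop body has already used i, which is the pause-and-resume behaviour described above.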

And indeed, since aGenerator is an iterator, we can call its next() method ourselves to step through it one value at a time:

>>> aGenerator = myGenerator(length)
>>> aGenerator.next()
0
>>>
>>> aGenerator.next()
increment i
1
>>>
>>> aGenerator.next()
increment i
2
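
Note that on Python 3 the .next() method is gone (it was renamed __next__()), and the usual way to advance a generator is the built-in next() function, which also exists in Python 2.6 and later. A minimal sketch, assuming Python 3:

```python
def my_generator(n):
    i = 0
    while i < n:
        yield i
        i += 1

gen = my_generator(3)

# The built-in next() calls the generator's __next__() for us.
print(next(gen))  # 0
print(next(gen))  # 1
print(next(gen))  # 2

# With a second argument, next() returns that default instead of raising
# StopIteration once the generator is exhausted.
print(next(gen, "done"))  # done
```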

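When we loop over the generator with a for loop, as we did above, the for loop is calling next() on it for us, over and over, until a StopIteration exception is raised; the loop catches that exception and simply ends. If we keep calling next() by hand after the last value, we see the exception ourselves: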

>>> aGenerator.next()
increment i
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
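
To make that concrete, here is roughly what a for loop does under the hood, written out by hand in Python 3. manual_for is a hypothetical helper for illustration; a real for loop also handles break and else clauses, which this sketch ignores:

```python
def my_generator(n):
    i = 0
    while i < n:
        yield i
        i += 1

def manual_for(iterable, body):
    """Roughly what `for item in iterable: body(item)` does."""
    iterator = iter(iterable)  # for a generator, iter() returns the generator itself
    while True:
        try:
            item = next(iterator)
        except StopIteration:
            break  # the for loop swallows StopIteration and simply ends
        body(item)

seen = []
manual_for(my_generator(3), seen.append)
print(seen)  # [0, 1, 2]
```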

The key point here is the yield keyword. It tells the function to pause right there, hand the value to whoever asked for it, and continue from the same spot (with its local variables intact) only when it is asked for the next value.
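
On Python 3 we can actually watch the generator pause and resume, using inspect.getgeneratorstate() from the standard library (available since Python 3.2). A short sketch:

```python
import inspect

def my_generator(n):
    i = 0
    while i < n:
        yield i
        i += 1

gen = my_generator(2)
print(inspect.getgeneratorstate(gen))  # GEN_CREATED: the body has not started yet

next(gen)
print(inspect.getgeneratorstate(gen))  # GEN_SUSPENDED: paused at the yield

next(gen)
print(inspect.getgeneratorstate(gen))  # GEN_SUSPENDED again, now holding i == 1

next(gen, None)  # resumes, the while loop ends, the default absorbs StopIteration
print(inspect.getgeneratorstate(gen))  # GEN_CLOSED: the function has finished
```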

The concept of yield is not unique to Python; other languages use it too. The same idea even shows up in operating system kernels. In a multithreaded, time-shared OS, every thread is given a quantum of time to run before the processor switches to another thread. Sometimes a thread can voluntarily yield the rest of its time slice and let other threads run first.
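
The same idea can be played out with generators themselves. Below is a toy round-robin scheduler in Python 3 where each task does one step of work, then yields so the next task can run. It is only an analogy: task and run_round_robin are hypothetical names, and a real OS scheduler also preempts threads rather than waiting for them to yield:

```python
from collections import deque

def task(name, steps, log):
    """A hypothetical task that does one step of work, then yields control."""
    for step in range(steps):
        log.append(f"{name} step {step}")
        yield  # give up the rest of this "time slice" to other tasks

def run_round_robin(tasks):
    """Resume each task in turn until every task has finished."""
    queue = deque(tasks)
    while queue:
        current = queue.popleft()
        try:
            next(current)  # run the task until its next yield
        except StopIteration:
            continue  # task finished; drop it from the queue
        queue.append(current)  # not finished; put it back in line

log = []
run_round_robin([task("A", 2, log), task("B", 3, log)])
print(log)  # ['A step 0', 'B step 0', 'A step 1', 'B step 1', 'B step 2']
```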

So, what is the difference between a list and a generator in Python?
A list has to hold all of its elements in memory at the same time, while a generator only produces each value when it is asked for one. That saves memory, and the saving grows with the number of values.

Using the example above, if we set length to a big number, say 1 million, the list version has to keep all 1 million numbers in memory, while the generator only keeps track of where it is.
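
A rough way to see the difference on Python 3 is sys.getsizeof(). It only measures the container object itself (for the list, its internal array of references, not the integers they point to), so if anything it understates the list’s real footprint. A sketch:

```python
import sys

length = 1_000_000

a_list = [x for x in range(length)]
a_generator = (x for x in range(length))

# Size of the list object: grows with length (megabytes here).
print(sys.getsizeof(a_list))

# Size of the generator object: small, and the same whatever length is.
print(sys.getsizeof(a_generator))
```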

Yield in GAE’s ndb
yield is also used in ndb’s tasklets, where it lets us write asynchronous calls to the Datastore. Let’s discuss that next time.

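ps. In Python 2, range() returns a full list, while xrange() returns an xrange object that produces its numbers lazily, much like a generator (strictly speaking it is a sequence type, not a generator). So xrange() is usually preferred when looping over a large range.
pps. In Python 3, xrange() is gone and range() takes over its role: it returns a lazy range object, which again is not technically a generator.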
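
On Python 3, range() returns a lazy range object. It shares the memory benefit of a generator, but it is a sequence rather than a generator: it supports len(), indexing and membership tests, and it can be iterated more than once. A sketch:

```python
import types

r = range(10**9)  # created instantly; no billion-element list is built

print(len(r))    # 1000000000
print(r[5])      # 5
print(999 in r)  # True

# Unlike a generator, a range object can be iterated more than once...
small = range(3)
print(list(small), list(small))  # [0, 1, 2] [0, 1, 2]

# ...and it is not a generator at all.
print(isinstance(small, types.GeneratorType))  # False
```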


Partner@RedSo, tech lover, GCPUG HK founder. Recently fell in love with blockchain technology.