What is “yield” or “generator” in Python?

What is a “coroutine” in Python?

Harshal Parekh
Apr 12, 2020 · 9 min read

A lot of new Python learners have a hard time wrapping their heads around PEP 380. I am usually asked:

  1. What does the “yield” keyword do?
  2. What are the situations where “yield from” is useful?
  3. What is the classic use case?
  4. Why is it compared to micro-threads?

To understand what yield does, you must understand what generators are. And before you can understand generators, you must understand iterables.

Iterables

An iterable is an object that has an __iter__ method which returns an iterator, or which defines a __getitem__ method that can take sequential indexes starting from zero (and raises an IndexError when the indexes are no longer valid). So an iterable is an object that you can get an iterator from.

  • anything that can be looped over (e.g. you can loop over a string or a file), or
  • anything that can appear on the right side of a for-loop: for x in iterable: ..., or
  • anything you can pass to iter() to get back an iterator: iter(obj)
>>> mylist = [1, 2, 3]
>>> # mylist = [x*x for x in range(3)] # or a list comprehension
>>> for i in mylist:
...     print(i)
1
2
3
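
To make the second flavor concrete, here is a minimal sketch (the Squares class is made up for illustration) of an object that is iterable only because it defines __getitem__:

>>> class Squares:
...     def __getitem__(self, index):    # sequential indexes from zero
...         if index >= 3:
...             raise IndexError         # signals the end of iteration
...         return index * index
...
>>> for i in Squares():                  # no __iter__, yet still iterable
...     print(i)
0
1
4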

Generators

Generators are iterators, a kind of iterable you can only iterate over once. Generators do not store all the values in memory, they generate the values on the fly:

>>> mygenerator = (x*x for x in range(3))
>>> for i in mygenerator:
...     print(i)
0
1
4

It is just the same, except you used () instead of []. But you cannot run for i in mygenerator a second time, because a generator can be consumed only once: it calculates 0 and forgets about it, then calculates 1, and finally 4, one value at a time.
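
You can see the exhaustion for yourself: looping over the same generator a second time simply produces nothing.

>>> for i in mygenerator:    # already exhausted, prints nothing
...     print(i)
...
>>> sum(mygenerator)         # and there is nothing left to add up
0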

Yield

yield is a keyword that is used like return, except the function will return a generator.

>>> def func():                  # a function with yield
...     yield 'I am'             # is still a function
...     yield 'a generator!'
...
>>> gen = func()
>>> type(gen)                    # but calling it returns a generator
<class 'generator'>
>>> hasattr(gen, '__iter__')     # that's an iterable
True
>>> hasattr(gen, '__next__')     # and with .__next__ (.next in Python 2)
True                             # it implements the iterator protocol

This example is deliberately trivial, but yield is handy when you know your function will return a huge set of values that you will only need to read once.

To master yield, you must understand that when you call the function, the code you have written in the function body does not run; the function only returns the generator object. This is a bit tricky :-)

The first time the for loop calls the generator object created from your function, it runs the code in your function from the beginning until it hits yield, then returns the first value of the loop. Each subsequent call runs another iteration of the loop you have written and returns the next value. This continues until the generator is considered empty, which happens when the function body finishes without hitting yield. That can be because the loop has come to an end, or because an if/else condition is no longer satisfied.
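
A small sketch (the counter function is made up for illustration) makes this suspend-and-resume behavior visible:

>>> def counter():
...     print('body starts running')
...     yield 1
...     print('resumed between yields')
...     yield 2
...     print('resumed after the last yield')
...
>>> c = counter()    # nothing is printed: the body has not run yet
>>> next(c)
body starts running
1
>>> next(c)
resumed between yields
2
>>> next(c)          # the body finishes without hitting yield
resumed after the last yield
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration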

>>> list(gen)
['I am', 'a generator!']
>>> list(gen)
[]

You’ll have to create a new generator if you want to use its functionality again:

>>> list(func())
['I am', 'a generator!']
  • A function with yield, when called, returns a Generator.
  • Generators are iterators because they implement the iterator protocol, so you can iterate over them.

yield is only legal inside of a function definition, and the inclusion of yield in a function definition makes it return a generator.

yield provides an easy way of implementing the iterator protocol, defined by the following two methods: __iter__ and next (Python 2) or __next__ (Python 3). Both of those methods make an object an iterator that you can type-check against the Iterator abstract base class from the collections.abc module.
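
For example, assuming Python 3 and reusing func from above:

>>> from collections.abc import Iterable, Iterator
>>> gen = func()
>>> isinstance(gen, Iterator)          # generators pass the iterator check
True
>>> isinstance([1, 2, 3], Iterable)    # a list is iterable...
True
>>> isinstance([1, 2, 3], Iterator)    # ...but it is not its own iterator
False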

The yield keyword is reduced to two simple facts:

  1. If the compiler detects the yield keyword anywhere inside a function, that function no longer returns via the return statement. Instead, it immediately returns a lazy “pending list” object called a generator.
  2. A generator is iterable. What is an iterable? It’s anything like a list or set or range or dict-view, with a built-in protocol for visiting each element in a certain order.

In a nutshell: a generator is a lazy, incrementally-pending list, and yield statements allow you to use function notation to program the list values the generator should incrementally spit out.
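
In other words, function notation and generator-expression notation are two spellings of the same lazy sequence:

>>> def squares(n):    # function notation with yield...
...     for x in range(n):
...         yield x * x
...
>>> list(squares(3))
[0, 1, 4]
>>> list(x * x for x in range(3))    # ...matches the expression form
[0, 1, 4]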

Coroutines

yield forms an expression that allows data to be sent into the generator.

def bank_account(deposited, interest_rate):
    while True:
        calculated_interest = interest_rate * deposited
        received = yield calculated_interest
        if received:
            deposited += received


my_account = bank_account(1000, .05)
first_year_interest = next(my_account)
# 50.0
next_year_interest = my_account.send(first_year_interest + 1000)
# 102.5
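
Note that the initial next(my_account) is mandatory: a coroutine must first be advanced to its yield expression before it can receive anything, and calling send() with a non-None value on a just-started generator raises a TypeError.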

Cooperative Delegation to Sub-Coroutine

def money_manager(expected_rate):
    under_management = yield    # must receive deposited value
    while True:
        try:
            additional_investment = yield expected_rate * under_management
            if additional_investment:
                under_management += additional_investment
        except GeneratorExit:
            # cleanup on close() would go here; returning ends the coroutine
            return
        finally:
            pass                # per-iteration bookkeeping would go here


def investment_account(deposited, manager):
    """
    very simple model of an investment account
    that delegates to a manager
    """
    next(manager)               # must prime the manager coroutine first
    manager.send(deposited)
    while True:
        try:
            yield from manager
        except GeneratorExit:
            return manager.close()


my_manager = money_manager(.06)
my_account = investment_account(1000, my_manager)

first_year_return = next(my_account)
# 60.0
next_year_return = my_account.send(first_year_return + 1000)
# 123.6

The idea that yield from generator is equivalent to for value in generator: yield value does not even begin to do justice to what yield from is all about. Because, let’s face it, if all yield from did was expand that for loop, it would not have warranted new syntax, and a plain for loop would have worked just as well in Python 2.x.

What yield from does is it establishes a transparent bidirectional connection between the caller and the sub-generator:

  • The connection is “transparent” in the sense that it propagates everything correctly, not just the elements being generated (e.g. exceptions are propagated too).
  • The connection is “bidirectional” in the sense that data can be both sent to and received from the sub-generator. (See the sketch below.)
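
Here is a minimal sketch (the inner and outer generators are made up for illustration) of what “transparent” buys you: an exception thrown into the outer generator reaches the sub-generator’s except clause.

>>> def inner():
...     try:
...         yield 1
...     except ValueError:
...         yield 'caught in inner'
...
>>> def outer():
...     yield from inner()
...
>>> g = outer()
>>> next(g)
1
>>> g.throw(ValueError)    # propagates through outer into inner
'caught in inner'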

Why is it compared to micro-threads?

I think what this section in the PEP is talking about is that every generator does have its own isolated execution context. Together with the fact that execution is switched between the generator-iterator and the caller using yield and __next__(), respectively, this is similar to threads, where the operating system switches the executing thread from time to time, along with the execution context (stack, registers, ...).

The effect of this is also comparable: Both the generator-iterator and the caller progress in their execution state at the same time, their executions are interleaved. For example, if the generator does some kind of computation and the caller prints out the results, you’ll see the results as soon as they’re available. This is a form of concurrency.
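
A rough sketch of that interleaving (the compute function is made up for illustration):

>>> def compute():
...     for n in range(3):
...         # imagine an expensive calculation here
...         yield n * n
...
>>> for result in compute():    # caller and generator take turns running
...     print('got', result)    # each result prints as soon as it exists
got 0
got 1
got 4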

That analogy isn’t anything specific to yield from, though - it's rather a general property of generators in Python.

A Bug in Python “yield”:

Note: this was a bug in CPython’s handling of yield in comprehensions and generator expressions, fixed in Python 3.8, with a deprecation warning in Python 3.7. See the Python bug report and the What’s New entries for Python 3.7 and Python 3.8.

>>> [(yield i) for i in range(3)]
<generator object <listcomp> at 0x0245C148>
>>> list([(yield i) for i in range(3)])
[0, 1, 2]
>>> list((yield i) for i in range(3))
[0, None, 1, None, 2, None]

It seems odd:

  • that a list comprehension returns a generator and not a list
  • and that the generator expression converted to a list and the corresponding list comprehension contain different values.

Generator expressions, and set and dict comprehensions are compiled to (generator) function objects. In Python 3, list comprehensions get the same treatment; they are all, in essence, a new nested scope.

You can see this if you try to disassemble a generator expression:

>>> import dis
>>> dis.dis(compile("(i for i in range(3))", '', 'exec'))
  1           0 LOAD_CONST               0 (<code object <genexpr> at 0x10f7530c0, file "", line 1>)
              3 LOAD_CONST               1 ('<genexpr>')
              6 MAKE_FUNCTION            0
              9 LOAD_NAME                0 (range)
             12 LOAD_CONST               2 (3)
             15 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             18 GET_ITER
             19 CALL_FUNCTION            1 (1 positional, 0 keyword pair)
             22 POP_TOP
             23 LOAD_CONST               3 (None)
             26 RETURN_VALUE
>>> dis.dis(compile("(i for i in range(3))", '', 'exec').co_consts[0])
  1           0 LOAD_FAST                0 (.0)
        >>    3 FOR_ITER                11 (to 17)
              6 STORE_FAST               1 (i)
              9 LOAD_FAST                1 (i)
             12 YIELD_VALUE
             13 POP_TOP
             14 JUMP_ABSOLUTE            3
        >>   17 LOAD_CONST               0 (None)
             20 RETURN_VALUE

The above shows that a generator expression is compiled to a code object, loaded as a function (MAKE_FUNCTION creates the function object from the code object). The .co_consts[0] reference lets us see the code object generated for the expression, and it uses YIELD_VALUE just like a generator function would.

As such, the yield expression works in that context, as the compiler sees these as functions-in-disguise.

This is a bug; yield has no place in these expressions. The Python grammar before Python 3.8 allowed it (which is why the code is compilable), but the yield expression specification shows that using yield here should not actually work:

The yield expression is only used when defining a generator function and thus can only be used in the body of a function definition.

This has been confirmed to be a bug in issue 10544. The resolution is that using yield and yield from in comprehensions and generator expressions raises a SyntaxError in Python 3.8, while Python 3.7 raises a DeprecationWarning to ensure code stops using the construct. You’ll see the same warning in Python 2.7.15 and up if you use the -3 command-line switch, which enables Python 3 compatibility warnings.

Lazy Method for Reading Big File in Python

To write a lazy function, just use yield:

def read_in_chunks(file_object, chunk_size=1024):
    """
    Lazy function (generator) to read a file piece by piece.
    Default chunk size: 1k.
    """
    while True:
        data = file_object.read(chunk_size)
        if not data:
            break
        yield data


with open('really_big_file.dat') as f:
    for piece in read_in_chunks(f):
        process_data(piece)
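
As an aside, the same laziness can be had without writing a generator by using the two-argument form of the built-in iter(), which keeps calling a callable until it returns the sentinel (here the empty string that read() returns at end of file):

from functools import partial

with open('really_big_file.dat') as f:
    for piece in iter(partial(f.read, 1024), ''):
        process_data(piece)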

What can you use a Generator function for?

Generators give you lazy evaluation. You use them by iterating over them: either explicitly with a for loop, or implicitly by passing them to any function or construct that iterates.

Generators are good for calculating large sets of results (in particular calculations involving loops themselves) where you don’t know if you are going to need all results, or where you don’t want to allocate the memory for all results at the same time.

Another use for generators (that is really the same thing) is to replace callbacks with iteration. In some situations you want a function to do a lot of work and occasionally report back to the caller. Traditionally you would pass a callback function to the work-function, which would call it periodically. With the generator approach, the work-function (now a generator) knows nothing about any callback and merely yields whenever it wants to report something. The caller, instead of writing a separate callback and passing it to the work-function, does all the reporting in a little for loop around the generator.

For example, say you wrote a ‘filesystem search’ program. You could perform the search in its entirety, collect the results and then display them one at a time. All of the results would have to be collected before you showed the first, and all of the results would be in memory at the same time. Or you could display the results while you find them, which would be more memory efficient and much friendlier towards the user. The latter could be done by passing the result-printing function to the filesystem-search function, or it could be done by just making the search function a generator and iterating over the result.

If you want to see an example of the latter two approaches, see os.path.walk() (the old filesystem-walking function with callback) and os.walk() (the new filesystem-walking generator.) Of course, if you really wanted to collect all results in a list, the generator approach is trivial to convert to the big-list approach:

big_list = list(the_generator)
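
os.walk itself works this way: it is a generator you consume one directory at a time, so nothing is buffered up front.

import os

for dirpath, dirnames, filenames in os.walk('.'):
    # each directory is reported as soon as it is visited
    print(dirpath, 'contains', len(filenames), 'files')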

When a Generator function should not be used

Use a list instead of a generator when:

You need to access the data multiple times (i.e. cache the results instead of recomputing them):

for i in outer:            # used once, okay to be a generator
    for j in inner:        # used multiple times, reuse a list
        ...

You need random access (or any access other than forward sequential order):

for i in reversed(data): ...     # generators aren't reversible
s[i], s[j] = s[j], s[i]          # generators aren't indexable

You need to join strings (which requires two passes over the data):

s = ''.join(data)                # lists are faster than generators in this use case

You are using PyPy, which sometimes cannot optimize generator code as well as it can normal function calls and list manipulation.

I hope you enjoyed the story! Find more in my profile.
Read my previous story: Branch Prediction — Everything you need to know.
Visit my portfolio: https://harshal.one.
