Optimize your code with Python Generators

Saurabh Mishra
Published in Analytics Vidhya
5 min read · Mar 23, 2021
Photo by Chad Kirchoff on Unsplash

Have you ever run into a problem where your data structure runs out of memory trying to hold all of the intermediate data required for the next processing step, and fails to maintain an internal state between calls?

Does it sound familiar? 🤔

Welcome To The World Of Generators

Well, this kind of situation is pretty common in programming, and multiple solutions exist to work around it.

But to tackle these kinds of problems more elegantly, Python has a specialized kind of function/expression known as a generator.

The Python wiki defines it as follows:

“Generator functions allow you to declare a function that behaves like an iterator, i.e. it can be used in a for loop.”

So, a generator is nothing but a function/expression that is specialized to behave like an iterator. It means that a generator carries the properties of an iterator, i.e. the __iter__ and __next__ methods of a class. This is also called the generator pattern in software engineering.

Now, let’s create the generator pattern for calculating a factorial in the following way,
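The original snippet here is an image; the sketch below shows what such a class-based implementation could look like (the class and attribute names are my own):

```python
class Factorial:
    """Iterator implementing the generator pattern: yields 1!, 2!, ..., n!."""

    def __init__(self, n):
        self.n = n        # how many factorials to produce
        self.i = 0        # current index
        self.value = 1    # running product, equal to i!

    def __iter__(self):
        return self

    def __next__(self):
        if self.i >= self.n:
            raise StopIteration
        self.i += 1
        self.value *= self.i
        return self.value


print(list(Factorial(5)))  # [1, 2, 6, 24, 120]
```

Because the class implements both __iter__ and __next__, an instance can be used directly in a for loop.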

The above code gives us a nice structure to work with the generator pattern for factorials, but its implementation is a bit verbose. Going further, a question pops up:

Can we make it simpler?

Let’s look at another snippet with less code but a smarter implementation.
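This snippet is also an image in the original; a yield-based sketch along the same lines (the function name is an assumption) could be:

```python
def factorial(n):
    """Generator function: lazily yields 1!, 2!, ..., n!."""
    value = 1
    for i in range(1, n + 1):
        value *= i
        yield value   # pause here; resume on the next request


print(list(factorial(5)))  # [1, 2, 6, 24, 120]
```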

We see the code is drastically reduced, yet it still serves the same purpose in a much smarter way.

Another interesting point in the above snippet is yield. It behaves much like a return statement in a function, but with an important difference:

When the yield statement is encountered, Python returns whatever value yield specifies and pauses execution of the function. We can then ask for the next value with next(), and execution resumes from where the last yield was encountered.
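A tiny self-contained example (names are mine) makes this pause-and-resume behavior visible:

```python
def greeter():
    print("before first yield")
    yield "hello"
    print("resumed after first yield")
    yield "world"


gen = greeter()       # nothing runs yet
first = next(gen)     # prints "before first yield", returns "hello"
second = next(gen)    # prints "resumed after first yield", returns "world"
print(first, second)  # hello world
```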

Generators vs. Iterators

image source: https://nvie.com/posts/iterators-vs-generators/

As discussed above, we can sum up our understanding so far with the points below.

★ A generator is a function/expression with iterator capability (but the reverse is not true) that provides an implementation of the design known as the generator pattern.

There are two types of generators in Python: generator functions and generator expressions.

✓ A generator function is any function in which the keyword yield appears in its body.

✓ A generator expression is the generator equivalent of a list comprehension. Its syntax is really elegant for a limited use case.

A list comprehension can be converted into a generator expression by wrapping it in parentheses () instead of square brackets [].

Note the use of () instead of [] in the comprehension.
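For example, the same comprehension can be written both ways (variable names are mine):

```python
squares_list = [x * x for x in range(5)]   # list comprehension: built eagerly
squares_gen = (x * x for x in range(5))    # generator expression: evaluated lazily

print(squares_list)      # [0, 1, 4, 9, 16]
print(sum(squares_gen))  # 30; the values are produced one at a time
```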

★ It is memory efficient: it doesn’t store everything at once but produces values during iteration, maintaining its internal state between calls.
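A quick way to see this memory difference is sys.getsizeof (exact sizes vary by platform, so treat the magnitudes as indicative):

```python
import sys

eager = [x for x in range(1_000_000)]   # stores a million ints up front
lazy = (x for x in range(1_000_000))    # stores only its internal state

print(sys.getsizeof(eager))  # on the order of megabytes
print(sys.getsizeof(lazy))   # a few hundred bytes, independent of the range
```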

Use Case & Implementation

Amazon Simple Queue Service (SQS) is a messaging system used to hold generated messages for a defined period. Any consumer wishing to consume those held messages can connect to SQS and process them.

More information about AWS SQS can be found here.

Now, suppose that as a consumer I’m asked to consume messages from SQS and process them to produce a general summary, e.g. a sum of values grouped by type (the requirement will become clear once we see the sample data). What would our implementation steps be?

To do so, the steps below could be taken:

  • Connect to AWS SQS.
  • Consume the held messages and store them in some data structure for further processing.
  • Perform analytics on captured messages.

So far so good.

But did you see any problem with the second bullet point?

By accumulating all of the messages in a data structure (e.g. a list or array), these containers can blow up at a certain point if SQS holds a huge number of messages. Running out of memory is an obvious and common problem at this stage. Agree?

Here the significance of generators comes into the picture.

Step 1- raw messages

Let’s assume the sample data in SQS looks like this:

{'Messages': [{'MessageId': '762b5a79-29b2-72b8-f788-606ccf806629', 'ReceiptHandle': 'urgtrhwwtg', 'MD5OfBody': '91e9b5c6e0f9860130e56f575680744d', 'Body': '{"type": "pageview", "value": 1, "occurred_at": "2021-03-03 10:33:38"}', 'Attributes': {'SenderId': 'AIDAIT2UOQQY3AUEKVGXU', 'SentTimestamp': '1614764020782', 'ApproximateReceiveCount': '6', 'ApproximateFirstReceiveTimestamp': '1614784965208'}}], … up to n messages}

Step 2- parsed messages

Now, to perform our analysis we are interested in the Body part of each message above, i.e. strings like the ones below:

{"type": "pageview", "value": 1, "occurred_at": "2021-03-22 10:33:38"}, {"type": "pageview", "value": 1.5, "occurred_at": "2021-03-22 10:33:38"}, {"type": "doc-view", "value": 4.5, "occurred_at": "2021-03-22 10:33:38"} … up to n messages

If we keep accumulating the above messages into a list, that won’t be ideal: if our queue is large, this consumes a lot of memory. We might run out of memory and lose all the messages. But if we rewrite this as a generator and yield the messages as we receive them, the code becomes polished, optimized, and memory efficient.

Step 3- type and its sum

Once all messages are parsed, we need to get a sum for each type, and the expected output should look like this:

{"pageview": 2.5}, {"doc-view": 4.5} … up to k types (where k ≤ n)

For this use-case implementation, I will use the boto3 SDK. Its installation guide and configuration settings can be found here.

Let's see the implementation in code.
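The full implementation lives in the linked repo; the sketch below only shows the shape of the idea. The queue URL is a placeholder, boto3 must be installed and configured, and the field names follow the sample message above:

```python
import json
from collections import defaultdict

QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/events"  # placeholder


def consume_messages(queue_url):
    """Generator: yields one parsed message body at a time,
    so the whole queue is never held in memory."""
    import boto3  # AWS SDK for Python; imported lazily so the rest of the module works without it

    sqs = boto3.client("sqs")
    while True:
        response = sqs.receive_message(QueueUrl=queue_url, MaxNumberOfMessages=10)
        messages = response.get("Messages", [])
        if not messages:
            break  # queue drained
        for message in messages:
            yield json.loads(message["Body"])


def summarize(bodies):
    """Sum the 'value' field per 'type' over any iterable of message bodies."""
    totals = defaultdict(float)
    for body in bodies:
        totals[body["type"]] += body["value"]
    return dict(totals)


# summary = summarize(consume_messages(QUEUE_URL))
```

Because summarize accepts any iterable, the same function works on a small list of test messages or on the live generator, without ever materializing the queue.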

So, in this blog, we have learned about generators, their usage, and their implementation. To keep the explanation and implementation simple, only the bare minimum code is captured here. However, the full implementation, including AWS SQS message deletion (after consumption), sum and count stats for each type, logging (to the console and a TimedRotatingFileHandler, an important handler if your job runs for a long time), writing stats to an output file, a sample unit test, setup config, and a Dockerfile, is developed as part of a full sample project, which can be found here on my GitHub.

Thanks for reading ❤️
