Asynchronous processing on Google App Engine Python

Introduction

At Echo Mobile we make heavy use of App Engine’s task queue API for background processing. Frontend user requests must be served within 60 seconds, and they mostly involve reading or updating a small number of entities in our Cloud Datastore.

Some tasks need more than 60 seconds to process, and this is where the task queue API comes in handy. For convenience, App Engine also ships a wrapper library around the task queue API, called deferred, which makes the API even easier to use.

The deferred library lets you enqueue a Python function to run later, asynchronously. Since our platform runs on automatic scaling, our deferred tasks can run for a maximum of 10 minutes (see Scaling on App Engine).
We mostly use this library for long-running tasks such as report generation, bulk updates or deletions of entities, and traversing Datastore queries with large result sets.

Overcoming limitations of deferred

Pickle is limited

Internally, the deferred library uses pickle (part of the Python standard library) for serialization. Pickle is quite limited in the kinds of functions it can serialize, because it stores a function as a reference to its module and name rather than storing its code. In our experience it works consistently with module-level functions but fails on instance methods and closures. This forces you to structure any code you intend to run asynchronously as module-level functions, which is annoying if you use classes heavily.
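The limitation is easy to reproduce with the standard library alone. The following is a minimal sketch, assuming Python 3; can_pickle and make_adder are hypothetical helpers written for this illustration. On Python 2.7, which is what we ran on App Engine, bound instance methods fail in the same way as the closure and the lambda here.

```python
import json
import pickle


def can_pickle(obj):
    """Return True if the standard pickle module can serialize obj."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


def make_adder(n):
    # add() closes over n, so it cannot be looked up by module and name.
    def add(x):
        return x + n
    return add


print(can_pickle(json.dumps))     # True: importable module-level function
print(can_pickle(make_adder(1)))  # False: closure
print(can_pickle(lambda x: x))    # False: lambda has no importable name
```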

Introducing cloudpickle

After living with this problem for a while, we decided to experiment with cloudpickle, a third-party serialization library. Happily, we found that cloudpickle can serialize a wider range of functions than pickle. To use it, we introduced a wrapper function that let us bypass the limitations of deferred.
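A wrapper of this kind might look roughly like the sketch below. It is an illustration, not our production code: run_cloudpickled, defer_any and fake_defer are hypothetical names, and fake_defer stands in for deferred.defer so the example runs outside App Engine. The idea is that deferred only ever pickles a plain module-level function plus a byte string, while cloudpickle handles the function we actually care about.

```python
import cloudpickle  # third-party: pip install cloudpickle


def run_cloudpickled(payload):
    """Module-level entry point, so plain pickle can reference it by name."""
    func, args, kwargs = cloudpickle.loads(payload)
    return func(*args, **kwargs)


def defer_any(enqueue, func, *args, **kwargs):
    """Serialize func and its arguments with cloudpickle, then enqueue the wrapper."""
    payload = cloudpickle.dumps((func, args, kwargs))
    return enqueue(run_cloudpickled, payload)


class Test(object):
    def __init__(self, factor):
        self.factor = factor

    def my_unserializable_func(self, value):
        return self.factor * value


pending = []


def fake_defer(func, *args):
    # Hypothetical stand-in for deferred.defer: record the task instead of queuing it.
    pending.append((func, args))


defer_any(fake_defer, Test(2).my_unserializable_func, 21)

# Later, on the worker: run the recorded task.
func, args = pending.pop()
result = func(*args)
print(result)  # 42
```

On App Engine the first argument would be deferred.defer instead of fake_defer.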

When the task runs, the wrapper de-serializes the payload with cloudpickle and executes the wrapped function (in our case an instance method, Test.my_unserializable_func).

Tasks are too large

While cloudpickle enabled us to serialize a wider range of functions, we started getting TaskTooLargeError exceptions from the task queue API for some requests. This meant that the serialized function and its arguments together exceeded the 10 KB limit of the task queue API.

Strangely, when we read the source of the deferred library (Deferred library source), we realized that this should never happen, because the deferred library uses a wrapping mechanism similar to the one described above for serialized functions larger than 10 KB.

Internally, when the deferred library encounters a serialized function and arguments larger than 10 KB, it temporarily stores them in the Datastore, which produces an entity key. It then passes a wrapper function to the task queue with the entity key as an argument. When the task queue executes the wrapper function, the wrapper retrieves the serialized function, de-serializes it and executes it.

Introduce deferred library locally

To get past these limitations, we ended up copying the deferred library into our own codebase and replacing pickle with cloudpickle inside it, which solved both problems. We can now serialize a wider range of functions without a wrapper function and without worrying about exceeding the 10 KB limit imposed by the task queue API.