New Infinite Machine libraries for Python App Engine

Greyboi
The Infinite Machine
5 min read · May 22, 2018


This article needs an image. Perhaps this cyberman will help.

The Infinite Machine denotes a few concepts.

Firstly, it’s a programming paradigm for building apps on Google App Engine in Python: a functional approach to distributed programming that uses serialisation to move first-class functions from one distributed context to another.
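As a rough illustration of that idea, here’s a conceptual sketch (not code from the libraries): serialise a function and its arguments, ship the bytes elsewhere, and invoke the call on the other side. The `send_to_other_context`/`run_in_other_context` helpers are hypothetical stand-ins for an App Engine task queue payload, and plain `pickle` only handles module-level functions, whereas the real libraries use a serialiser that can also handle closures.

```python
import pickle

def greet(name):
    return "Hello, %s" % name

def send_to_other_context(func, *args, **kwargs):
    # In the real paradigm these bytes would become a task queue payload.
    return pickle.dumps((func, args, kwargs))

def run_in_other_context(payload):
    # On the receiving side: deserialise and invoke the call.
    func, args, kwargs = pickle.loads(payload)
    return func(*args, **kwargs)

payload = send_to_other_context(greet, "world")
print(run_in_other_context(payload))  # Hello, world
```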

Secondly, it’s the practical libraries and techniques that I personally publish and use for doing this. It started with @task, and has moved on from there.

And lastly of course, it is this publication of articles about the first two points.

Here, I’ve got an update on the second point, the code. I’m introducing a new set of Python packages, available on PyPI, for doing Infinite Machine programming.

Here’s the short list of what’s available so far:

Fully / mostly documented

  • im-util: Some utilities for other Infinite Machine packages. Used by most of them. Notably includes make_flash(), which can take a first class function and its arguments and produce a deterministic hash.
  • im-task: The new home for @task. Should be imported via a framework-specific package, i.e. im-task-flask or im-task-webapp2.
  • im-task-flask: A framework-specific package which provides flask support for im-task.
  • im-task-webapp2: A framework-specific package which provides webapp2 support for im-task.
  • im-critsec: Provides @critsec, which is like @task, but also runs non-reentrantly for calls to the same function with the same arguments. This is a form of debouncing; it’ll ensure that the function is called at least once after it is invoked, but if you invoke it, say, 1000 times, it may only run once. Unlike @debouncedtask, it is guaranteed to be non-reentrant, while allowing recursive calls to reinvoke the function immediately on exit. It uses memcache to minimise contention issues under high load.
  • im-future: Provides @future, which is an implementation of Distributed Hierarchical Futures. The most basic usage is as a replacement for @task that actually returns a result and allows for success/failure handlers. But, it can also be used in a hierarchy (where the decorated function creates more futures that create more futures and so on, in a parent-child hierarchical relationship), which allows you to write lightweight alternatives to algorithms like mapreduce. The hierarchical features are not documented yet, but you can look at im-futurendbsharded for an example. Plus, there’s a mechanism for reporting progress, even in giant hierarchies. Note: why not just use @future all the time instead of @task? Because it’s far more heavyweight.
  • im-futuretest: This is a framework for writing distributed tests using @futures; because they have success/failure, @futures are a great way to define a test. This package includes a test function decorator, @register_test, and a gui for running and monitoring tests that you can just drop into your application. To use this package, include one of the framework packages; im-futuretest-flask or im-futuretest-webapp2. Note also that the UI provided in this package contains a browser for @futures, which is useful in its own right.
  • im-futuretest-flask: This is a package for wiring up im-futuretest for flask.
  • im-futuretest-webapp2: This is a package for wiring up im-futuretest for webapp2.
  • im-qsb: This package provides a set of functions for constructing Search API query strings. Rather than dynamically concatenating strings together, use this package to construct a JSON description of the query called a QSpec, then use the function render_query_string to render a QSpec down to a string. Proper quoting & escaping is taken care of for you. This isn’t really related to “The Infinite Machine”, but it’s useful if you’re using App Engine and the Search API.
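To make the make_flash() idea concrete, here’s a minimal sketch of how a deterministic hash of a function call could be built. `make_flash_sketch` is a hypothetical stand-in, not im-util’s actual code; the real implementation may hash different things (e.g. the function’s code object) and uses its own serialiser rather than plain `pickle`.

```python
import hashlib
import pickle

def make_flash_sketch(func, *args, **kwargs):
    # Identify a call by its function plus serialised arguments; hashing
    # that blob yields a deterministic key for the call. Sorting kwargs
    # makes keyword order irrelevant.
    blob = pickle.dumps((func.__module__, func.__name__, args,
                         sorted(kwargs.items())))
    return hashlib.sha256(blob).hexdigest()

def add(a, b):
    return a + b

print(make_flash_sketch(add, 1, 2) == make_flash_sketch(add, 1, 2))  # True
print(make_flash_sketch(add, 1, 2) == make_flash_sketch(add, 2, 3))  # False
```

Later packages (im-debouncedtask, im-memcacher, im-gcscacher) use this kind of call hash, the “flash”, as a deduplication or cache key.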

Poorly / not yet documented

  • im-debouncedtask: Provides @debouncedtask, an alternative to @task that debounces calls based on the flash of the function call (i.e. two calls with the same function and arguments will get debounced, but two different calls will not).
  • im-memcacher: Provides @memcacher, a function decorator that uses the flash to identify calls to the same function + args, and caches the results so that subsequent calls get the cached value rather than running again. Best for pure functions, but usable in other cases. It’s great for memoizing distributed algorithms.
  • im-gcscacher: Provides @gcscacher, which is like @memcacher, but caches to Google Cloud Storage. Great for larger data, and caches that you’d like to not expire. Note, obviously this is slower than @memcacher! I’ve had excellent success combining this with pygithub to read from github at runtime, cache the resulting content into GCS, and bust that cache based on github webhooks.
  • im-ndbsharded: Provides ndbshardedpagemap and ndbshardedmap. These are two simple variations on applying adaptive sharding to ndb, using @task under the covers. Great if you need to visit every object of a given type in ndb (or a subset using a query) and you want it to work whether there are a million objects or just 3. Use these as a replacement for ndb’s map functions, which will fail on very large data (they run in the context of a single, time-limited task).
  • im-gcsfilesharded: Provides gcsfileshardedpagemap and gcsfileshardedmap, which are analogous to the functions in im-ndbsharded, but use adaptive sharding to visit all the lines in a text file in Google Cloud Storage. The file is broken into lines (delimited by end of line markers) and each line is handled as a string.
  • im-futurendbsharded: Provides futurendbshardedpagemap and futurendbshardedmap, which are analogous to the functions from im-ndbsharded, but use @future instead of @task. This gives you a way to track the progress of the adaptive sharded job, get success/failure, and to generally combine these algorithms with other @future based code.
  • im-futuregcsfilesharded: Provides futuregcsfileshardedpagemap and futuregcsfileshardedmap, which are analogous to the functions from im-gcsfilesharded, but use @future instead of @task. This gives you a way to track the progress of the adaptive sharded job, get success/failure, and to generally combine these algorithms with other @future based code.
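Several of the packages above hinge on using the flash as a cache key. Here’s a hedged sketch of the @memcacher pattern, with an in-process dict standing in for memcache; `memoize_by_flash` and its flash construction are hypothetical illustrations, not the package’s actual API.

```python
import functools
import hashlib
import pickle

_cache = {}                # stands in for memcache in this sketch
call_count = {"n": 0}      # lets us observe how often the body runs

def memoize_by_flash(func):
    # Sketch of the @memcacher idea: key the cache on a deterministic
    # hash ("flash") of the function and its arguments, so repeated
    # identical calls return the cached result instead of re-running.
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        blob = pickle.dumps((func.__name__, args, sorted(kwargs.items())))
        flash = hashlib.sha256(blob).hexdigest()
        if flash not in _cache:
            _cache[flash] = func(*args, **kwargs)
        return _cache[flash]
    return wrapper

@memoize_by_flash
def slow_square(x):
    call_count["n"] += 1
    return x * x

print(slow_square(4), slow_square(4), call_count["n"])  # 16 16 1
```

The same keying idea, swapping the dict for memcache or a Google Cloud Storage object, gives you the @memcacher and @gcscacher behaviours described above.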

Phew, that’s a lot! I’ll keep adding to this list of packages, and I’ll also try to document each of the packages further.

Some of you are using appenginetaskutils at the moment for @task at least. You can keep doing that, but I’ve stopped maintaining that package. It was sucking in too many dependencies, and generally becoming a mighty adhesion lump. I’ve split it into the packages above, and expanded on the offering.

btw, sorry it’s been a while. Going back and breaking up the appenginetaskutils package into many tiny pieces was a decent amount of work, and you know, working in a startup can get busy ;-) Thanks for your patience.
