Google App Engine as The Infinite Machine

Greyboi
The Infinite Machine
9 min read · Feb 9, 2017


When we write application software for a single computer, we use a model more or less like this:

A machine runs multiple processes.

A process includes shared code, running in one or more threads, coordinating via shared memory & process level synchronisation primitives.

Multiple processes run at the same time. These can have different code, and don’t share memory. They can communicate through persistent storage and OS level synchronisation primitives (message passing, many others).
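That single-machine model can be sketched in a few lines of Python; nothing App Engine specific here, just threads coordinating over shared memory with a process-level lock:

```python
import threading

counter = 0              # shared memory, visible to every thread
lock = threading.Lock()  # a process-level synchronisation primitive

def worker():
    global counter
    for _ in range(1000):
        with lock:       # critical section: one thread at a time
            counter += 1

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(counter)  # 4000, deterministically, thanks to the lock
```

Without the lock, the four threads would interleave their read-modify-write cycles and the final count would be unpredictable; this is exactly the coordination problem that reappears, much harder, in distributed systems.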

Writing for single (possibly multi-processor) machines is relatively simple, has good language support, and is what we mostly think of when we think of programming. But sometimes you need to go bigger than what one machine can give you.

The next level out is a distributed architecture, using multiple machines. This gives you something you can’t get with a single machine: theoretically infinite scaling (networking issues notwithstanding; thanks, network engineers!).

But there is a huge drawback: coordinating these machines is total crazytown. You have to care about things like node (machine) failures, unreliable networks, unreliable everything really, an incomplete view of the system from any point, no reliable synchronisation mechanisms, and so on.

Building software for a distributed computing environment is not like writing a single-machine application; it’s more like writing many separate applications, and then managing the complex operational environment involved in getting that multi-program, multi-machine system to actually run and keep running.

One way to get above this difficult infrastructure level is by using cloud infrastructure services from Amazon, Google, Microsoft, etc. But you tend to be still worrying about machine images or at least containers, scaling up and down, load balancing, blah blah blah.

What would be ideal would be if we could write software for a distributed system, which has all the power of infinite scaling, but feels like writing for a single machine.

Enter Google App Engine…

Google App Engine

App Engine provides a higher level platform, hiding a lot of the evil details of a distributed architecture.

The App Engine model is something like this (simplified):

The top level construct is an App. This is:

  • a collection of Services which are a collection of Versions
  • — Each Version has a codebase
  • — Each Version has a collection of running instances (roughly machines)
  • a shared collection of Task Queues
  • a shared Datastore
  • a shared Memcache
  • a bunch of other services
[Diagram: the App structure above, but with “code” and “instances” shown in each Version box]

Writing in Python, I can build multiple codebases and deploy them to an App.

If you’ve used App Engine a bit, though, you’ll know that it still feels like a clunky distributed system, not like writing code for a local machine. You have to care too much about distributed system concerns. Code communicates in clunky ways (eg: serialising data, posting it to the task queue, special web handlers processing POSTs for those queues, deserialising the context, …)
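The clunky round trip looks roughly like this. It’s a sketch: `enqueue` and `increment_handler` are stand-ins I’ve made up for `taskqueue.add` and the special POST handler, so the serialise/deserialise ceremony is visible without the actual App Engine APIs:

```python
import json

def enqueue(queue, url, payload):
    # Stand-in for taskqueue.add(url=..., payload=...): App Engine would
    # POST this payload to a special web handler registered at `url`.
    queue.append((url, payload))

def increment_handler(payload):
    # Stand-in for the POST handler behind the task queue URL: it has to
    # deserialise its context by hand before it can do any work.
    ctx = json.loads(payload)
    return ctx["base"] + ctx["amount"]

queue = []
enqueue(queue, "/_tasks/increment", json.dumps({"base": 40, "amount": 2}))

url, payload = queue.pop(0)
result = increment_handler(payload)
print(result)  # 42
```

All that machinery, just to run one function somewhere else with two arguments.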

I contend that App Engine can look higher level than this; it can be much more like an infinite version of your desktop machine. But you’ve got to squint a bit.

So let’s squint…

The Squinty Local Machine Model

In this model, I’m presenting a set of concepts. These are metaphors, mappings between a feature from local machines, and a practical analogue of that feature in App Engine’s distributed system.

This model is a little confusing, because App Engine is made of single machines, and all these single machine features are present in App Engine. For example, App Engine has threads on single machines. All I can recommend is to be aware of this, and try not to confuse the single-machine feature used as a metaphor with the same literal feature on the literal single machines inside App Engine.

Concept 1: A single machine’s process is an App Engine App.

A process on a single machine is a shared space, existing over time, primarily the pairing of program counters and memory. It is responsible for managing the running program, and the memory available to that program.

The App level for App Engine is the same. It’s a space of running code, with shared services available to all that code (including the datastore and the memcache) which that running code works with, coordinates, and is responsible for.

Concept 2: A single machine’s thread is an App Engine Task.

A thread running inside a process on a single machine can access everything in that process. Multiple threads running at the same time can cause violence to each other by blithely changing shared memory used by other threads, so synchronisation primitives are necessary; you need concepts like critical sections and locks (and semaphores).

Multiple App Engine Tasks running in a single App can access everything in the App. Similarly to threads in a single machine process, Tasks can stomp all over each others’ workspace (datastore, memory) if they don’t coordinate correctly.

Tasks are not quite threads, however. We need to keep these differences in mind:

  • A thread can run indefinitely, bounded by the lifetime of the process, whereas a Task is strictly limited to a finite window (10 minutes in most cases).
  • A thread will run exactly once. A Task will usually run once, but may also run zero times, or more than once.
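That second difference means a Task’s side effects should be idempotent: safe to repeat. A minimal sketch of one way to get that, guarding the work with a marker keyed by task id (an in-memory set here; in a real App the marker would live in the Datastore):

```python
completed = set()  # in a real App this marker would live in the Datastore

def run_task(task_id, work, state):
    # A Task may be delivered zero, one, or several times; guarding the
    # side effect with an idempotency marker makes repeats harmless.
    if task_id in completed:
        return
    work(state)
    completed.add(task_id)

state = {"count": 0}

def work(s):
    s["count"] += 1

run_task("task-1", work, state)
run_task("task-1", work, state)  # duplicate delivery: no double effect
print(state["count"])  # 1
```

The zero-times case has no such trick; if a Task absolutely must happen, you need some way to notice it didn’t, and re-enqueue it.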

Concept 3: A single machine process’s memory is an App Engine App’s Datastore

On a single machine, threads come and go inside the lifetime of a process.

During that lifetime, memory is allocated to the process (which may come and go behind the scenes due to OS skulduggery, but as far as the process is concerned it’s fixed).

Threads can read from and write to the shared memory. If Thread A starts, writes to memory and exits, then later Thread B starts and reads the same address in memory, it’ll see the change that Thread A made. Memory looked at within the lifespan of a process is persistent.
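A tiny Python illustration of that persistence: Thread A has exited before Thread B even starts, yet B still sees A’s write, because the process’s memory outlives both threads:

```python
import threading

shared = {}  # the process's memory, visible to every thread

def thread_a():
    shared["x"] = 42  # Thread A writes, then exits

result = []

def thread_b():
    result.append(shared["x"])  # Thread B later reads the same "address"

a = threading.Thread(target=thread_a)
a.start()
a.join()  # Thread A is long gone...

b = threading.Thread(target=thread_b)
b.start()
b.join()
print(result[0])  # 42: ...but its write persists for the process's lifetime
```

Swap “thread” for “Task” and “memory” for “Datastore” and this is exactly the property the analogy needs.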

If we view an App as analogous to a machine’s process, with the lifespan of the App running from when it is first created, to when it is ultimately deleted, then any persistent store is a candidate for being a functional analogue of shared memory in a single process.

However, those services need some extra properties. They will need to be persistent (which rules out the memcache). They will need to be as strongly consistent and reliable as memory is inside a process, which rules out basically anything I can think of except the Datastore.

The Datastore isn’t strongly consistent by default; it’s eventually consistent, and that’s important for scalability. But unlike any other scalable data store I can think of, the Datastore supports transactions, which allow you to use it in a strongly consistent fashion when you really need to. On the other hand, technologies like relational databases can provide strong consistency, but don’t scale.

Concept 4: A single machine’s process level synchronisation primitives are Datastore Transactions

In a single process, for threads to coordinate, you need synchronisation primitives. These tend to be made out of shared memory and test-and-set instructions iirc, but are surfaced at a high level as mutexes, semaphores, critical sections and the like.

It gets pretty squinty here, but the only thing we’ve got available in App Engine is Datastore Transactions. These are a lot like a critical section; we can combine several datastore operations into an atomic unit, and have them successfully applied (commit) or not (rollback); no partial results.

It’s not the same as a critical section though. An in-process critical section crucially includes a thread blocking mechanism; a queue that blocked threads go onto, to be woken up later when they get their turn at the critical section. Transactions, on the other hand, run on top of each other, but will fail and retry at commit time if the underlying consistent view of the bit of the datastore that they are touching has changed since they started. Combined with finite retries, and finite Task run times (see Tasks above), we lose the guarantee that any particular critical section will ever “execute” (ie: commit), and we also lose the defense against starvation that a blocking & queueing mechanism provides.

While there are drawbacks, there is enough power in datastore transactions to use them as synchronisation primitives. You just have to think really carefully about them!
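To make those retry semantics concrete, here’s a toy versioned cell (my own stand-in, not the Datastore API): a commit succeeds only if nothing else committed since our read, mimicking how a transaction fails at commit time on a conflict, and a finite retry loop may simply give up:

```python
class Cell:
    # Toy versioned store: a commit succeeds only if the version is
    # unchanged since our read, like a Datastore transaction failing at
    # commit time when its consistent view has moved underneath it.
    def __init__(self, value):
        self.value, self.version = value, 0

    def read(self):
        return self.value, self.version

    def commit(self, new_value, version):
        if version != self.version:
            return False  # someone else committed since we read
        self.value, self.version = new_value, self.version + 1
        return True

def transactional_increment(cell, retries=3):
    # Read, compute, try to commit; retry on conflict. With finite retries
    # (and finite Task lifetimes), this "critical section" is never
    # guaranteed to commit: no blocking, no queueing, no fairness.
    for _ in range(retries):
        value, version = cell.read()
        if cell.commit(value + 1, version):
            return True
    return False

cell = Cell(0)
assert transactional_increment(cell)
print(cell.value)  # 1
```

Notice what’s missing compared to a mutex: nobody blocks, nobody queues, and under heavy contention some caller can lose every race.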

App Engine as The Infinite Machine

If we apply the Squinty Local Machine Model above, we can begin to get a programming model that is close to local machine development, but which horizontally scales to practical infinity (where practical infinity is bounded by the size of your wallet). It’s as if you had a machine which had an infinite number of processors (with a magic architecture where you can actually do that). Your process could just keep kicking off more and more threads, each thread getting its own processor.

It’s still going to be clunky unless you have some high level language support for actually programming this machine.

In the series of articles I’m writing for this publication, I’ll be presenting constructs for the App Engine Python Standard Runtime, to really make this programming model possible and pleasing to use. I’m hoping to write on topics like the following:

  • Serialising Python Functions, including closures. Because you should be able to do this:
# SomethingPart2 runs in a separate Task,
# but has access to the variables in lexical scope
def SomethingPart1(ArgA, ArgB):
    @task
    def SomethingPart2():
        if ArgA > 3 and ArgB == "X":
            ...  # do some things
    ...  # do some things
    SomethingPart2()

(note: this series of articles is done, start here: https://medium.com/the-infinite-machine/task-a-replacement-for-defer-5a65b766d2f#.tietbzt84)

  • Garbage Collection in the Squinty Local Machine Model: if we’re going to use the Datastore as an analogue of memory, then it’d be nice to not have to worry about deallocation problems, and just let a garbage collector do it.
  • Distributed Futures: You’ve seen futures in many programming languages; for example here is how they are used in ndb (in the context of a single process on a single machine):
class MyRequestHandler(webapp2.RequestHandler):
    def get(self):
        acct = Account.get_by_id(users.get_current_user().user_id())
        acct.view_counter += 1
        future = acct.put_async()

        # ...read something else from Datastore...

        self.response.out.write('Content of the page')
        future.get_result()

But what if we could do this more generally and across distributed machines? We should be able to do something like this (note that the @distributedfuture functions are run in Tasks, and we get an object that we can ask for a result at a later date):

def Something(ArgA, ArgB):
    @distributedfuture
    def SomethingPart1():
        if ArgB == "X":
            ...  # do something
    @distributedfuture
    def SomethingPart2():
        if ArgA > 3:
            ...  # do something
    futures = [SomethingPart1(), SomethingPart2()]
    return sum([future.get_result() for future in futures])

(note: start here https://medium.com/the-infinite-machine/welcome-to-the-future-3ca4fb5a4656)

  • Sharded Mapping: Say I’ve got an unknown number of objects in my datastore (n, could be really large), and I want to visit them all. I could do it concurrently using a recursively sharded algorithm, invoking it like this:
class Account(ndb.model.Model):
    balance = ndb.model.IntegerProperty()

myquery = Account.query(... some criteria ...)

def incrementbalance(amount):
    def doincrementbalance(object):
        object.balance += amount
        object.put()
    return doincrementbalance

ShardedMap(myquery, incrementbalance(5))

(note: this series of articles is done, start here: https://medium.com/the-infinite-machine/dude-i-sharded-part-1-a5f9bb035e9e)

  • Sharded Mapping Futures: Say I want to visit all those objects, but I want to get some kind of result:
def getbalance(object):
    return object.balance

def sumbalances(balances):
    return sum(balances)

future = ShardedMapFuture(myquery, getbalance, sumbalances)
future.put()

# ... later, possibly in another task
future = futurekey.get()
try:
    balance = future.get_result()
    ...
except:
    ...  # something went wrong

(note: code exists in the Infinite Machine packages, start here: https://medium.com/the-infinite-machine/new-infinite-machine-libraries-for-python-app-engine-2e6d180a14ee)

And more.

I hope you can see that the Squinty Local Machine Model can potentially lead to an elegant programming experience in a distributed environment. Thanks for reading, and I’ll publish some meaty, more detailed stuff soon.
