Python App Engine 2017: Prioritisation of tasks using Services

Greyboi
The Infinite Machine
12 min read · Feb 3, 2017

In my last post I went through getting the quickstart helloworld app going with the latest tooling and whatnot for Google Cloud Platform.

On reflection, it was surprisingly difficult, and that’s coming from someone who’s been using App Engine for 7+ years. Google Cloud Platform is big, the gcloud sdk is a whole universe. I guess that’s what you’ve got to expect of a mature platform.

Anyway, look, that beginner level is interesting and I’ll come back to it, but I need to jump ahead because I’m doing this for a reason, and the reason is Services.

App Engine is a misunderstood thing. I think people mostly see it as a weirdly proprietary and restrictive cloud based web server, and Google themselves seem to be pushing it as a useful coordination point for GCP based systems.

I think of it as a high level distributed computing based virtual machine that I can program to, a vast sea of computation that I can use for pennies. A wonder of the world.

One of the things I do is to use task queues extensively, and I’ll be writing a whole lot about that. I send fully serialised python functions, lexical closures included, from task to task, mapping over objects and performing all kinds of weird and wonderful background processing.

One side effect: I have some new algorithms that spit out thousands of tasks in seconds. This is the kind of thing App Engine excels at dealing with, but you can’t be doing that in your main application. Why? Because if the UI handlers for your front end are served by the same instances that handle these mountains of tasks, the UI requests will get lost in the noise and starve, i.e. the UI will become unresponsive.

What I’m trying to solve in this post is a prioritisation problem. How can I have bulk background processing happening in the same App Engine application, without starving my UI?

The approach I’ll try is to have different sets of instances for the UI and for background tasks. And I think the right abstraction for this is Services.

A modern App Engine app’s architecture looks like this:

This is from the microservices guide.

Think of a version as a codebase + configuration.

You can just have one default service and one default version for an App Engine project, and that’s often enough. But one of the great features of services (well, versions ultimately) is that they maintain different sets of instances:

What’s an instance? Think of it as a VM which includes your codebase, some processor cores, some RAM and so on. It’s the machine that actually runs your code.

One thing that Services and Versions let you do is to run different code bases in the same application, which is an excellent thing when you need it. But I’m trying to solve a different problem today, a prioritisation problem.

When you run a task, be it a handler that is run in response to an outside call (like someone hitting your web page / api), or a worker task you’ve kicked off in the background, that task runs on a task queue.

App Engine has Push Queues and Pull Queues. Push Queues are automatically handled by App Engine; they are just assigned to an instance to run, and come into your app as a web request. These are what I’m looking at today. When I talk about tasks and queues, I’m always talking about push queues.

Pull queues are a different beast; tasks are enqueued and you need to provide separate, explicit logic to pull tasks off the queue and handle them. I’m not addressing those in this post.

Also, there are different kinds of scaling for the instances that handle these requests; I’m using Automatic Scaling (where App Engine figures out how to scale instances up and down magically).

The prioritisation problem is this:

Say I have 1000 background tasks enqueued, and some are running. Then a user hits my api or web handler. That becomes task 1001 on a queue. If the same set of instances is handling both types of task, and is overloaded, App Engine is madly trying to provision more instances. But this can take a while (minutes, if you’ve really started a lot of tasks). Meanwhile the user’s task is lost in the pile of tasks that are delayed.

If I use two services, a UI service and a Background service, it would look like this…

I have 1000 background tasks enqueued, some are running. All this is happening in the Background service, on instances for that service. App Engine is madly trying to provision more instances for Background. Tasks are delayed. Then, a user hits my api or web handler. That goes to the UI service, which has its own instances (and is relatively quiet at the moment). The UI task is handled by one of those instances, quickly, and unaffected by the chaos happening in the Background service.

That’s what I want.
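To put rough numbers on the difference, here is a toy model. It is a sketch under loud assumptions: every instance finishes exactly one task per tick, the queue is strictly first-in first-out, and no new instances get provisioned. That is not how App Engine's scheduler really behaves, and the function name is made up for illustration.

```python
from collections import deque


def ticks_until_ui_served(background_tasks, instances, shared_pool):
    """Return how many ticks pass before a newly arrived UI request starts running.

    Toy model: each instance completes one task per tick, FIFO order,
    and the number of instances never changes.
    """
    if instances < 1:
        raise ValueError("need at least one instance")

    if not shared_pool:
        # Separate services: the UI request only competes with other UI work,
        # and in this scenario the UI service is idle.
        return 0

    # Shared pool: the UI request joins the back of the same queue.
    queue = deque(["background"] * background_tasks)
    queue.append("ui")
    ticks = 0
    while True:
        # Each instance picks up one task from the front of the queue this tick.
        for _ in range(min(instances, len(queue))):
            if queue.popleft() == "ui":
                return ticks
        ticks += 1


print(ticks_until_ui_served(1000, 10, shared_pool=True))   # 100
print(ticks_until_ui_served(1000, 10, shared_pool=False))  # 0
```

With 1000 queued tasks and 10 instances, the shared pool makes the UI request wait 100 ticks, while the dedicated UI pool serves it straight away.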

So, let’s try to get this happening.

I’m going to try to modify the quickstart app, to do the following:

1 —I’ll give it two services; the default service for the UI, and a “background” service for background tasks. I’ll use the same codebase for both services.

2 — I’ll modify the Hello World handler to have two buttons, Foreground and Background. Foreground will kick off, say, 10000 background tasks in the default, UI service, and Background will kick off the same number of tasks in the background service.

3 — I should be able to see the UI be affected by Foreground, by just refreshing the page; it’ll be really unresponsive. Background meanwhile should have no such effect.

Ok.

1 — Make the background service

First, let’s turn the quickstart app into something with two services.

What we’ve got so far looks like this:

One service, with 3 versions interestingly. Apparently the uploads I did in the last article all created new versions. Do I want that? Probably not.

Here are my running instances:

One instance, listed by version. Hmm, I think I want to get this version thing under control.

Ok, so what do I do to make a new service?

So the first thing is, I’ve pecked around in the documentation and I can’t figure that out. I’ve read the opinionated docs about microservices, which are conceptual, but how do I actually do this thing?

I’m guessing I need to do this:

  • Have a separate app.yaml file for my new “background” service
  • Use some kind of switch with gcloud to deploy that service separately from the default service.

So first, let’s get the project set up. I’ll copy the hello_world project from the last post into its own folder called servicesdemo, then add it to Eclipse as a Pydev project. (yell out if you want more detail on how I do this)

Ok. So first, I’ll duplicate app.yaml, and just call it background.yaml .

Ok, now how do we deploy? I’m going to need some sort of switch for uploading a service.

In the last post I deployed like this:

gcloud app deploy --project emlyn-experiments

Maybe there’s some kind of service switch? Looking in gcloud app deploy --help:

oh dear god

Ok, ok, just try to understand this thing.

Firstly, this looks promising:

SYNOPSIS
    gcloud app deploy [DEPLOYABLES ...] [--bucket=BUCKET]
        [--image-url=IMAGE_URL] [--no-promote] [--no-stop-previous-version]
        [--version=VERSION, -v VERSION] [GLOBAL-FLAG ...]

DESCRIPTION
    This command is used to deploy both code and configuration to the App
    Engine server. As an input it takes one or more DEPLOYABLES that should be
    uploaded. A DEPLOYABLE can be a service's .yaml file or a configuration's
    .yaml file.

There’s no service switch, but there’s a version switch. I’ll use that and just set my version to “default”, maybe I can calm down the version thing that way.

I faintly recall seeing something about setting a service name in the app.yaml file. Bit of googling, yup, it’s here:

Navigating this documentation is like some kind of treasure hunt :-(

Ok, so let’s add this servicename to background.yaml:

runtime: python27
api_version: 1
threadsafe: true
service: background

handlers:
- url: /.*
  script: main.app

Righto. Now the deploy command should be this:

gcloud app deploy background.yaml --project emlyn-experiments --version default

uhhuhuhrrr

ok, let’s try another version name:

gcloud app deploy background.yaml --project emlyn-experiments --version defaultversion

Ok, that might have worked! What can I see in the cloud console?

Oh, that’s totally a thing! Success!

2 — Add the “foreground” and “background” buttons

I need to be able to click a button and run 10000 background tasks. Ok.

The way to do this is to create 2 task queues, which will run their tasks in the “default” service and “background” service, respectively. For this, we need a queue.yaml file. It’ll look like this:

queue:
- name: default
  rate: 100/s
- name: background
  rate: 100/s
  target: ???

We don’t need to define the “default” queue, because App Engine does this for us, but I want the processing rate to be high, so I’ve added it here.
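For a feel of what the rate setting means, here is a bit of back-of-envelope arithmetic. It is only a rough estimate: it assumes the queue hands out tasks at exactly the configured rate and ignores burst allowance, concurrency limits, and whether there are enough instances to keep up. The helper name is made up.

```python
def approx_dispatch_seconds(num_tasks, rate_per_second):
    """Rough time for a push queue to hand out num_tasks at a fixed rate."""
    return num_tasks / float(rate_per_second)


print(approx_dispatch_seconds(10000, 100))  # 100.0
print(approx_dispatch_seconds(10000, 1))    # 10000.0
```

So at 100/s the 10000 tasks take at least a minute or two to drain, while at 1/s they would take most of three hours.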

What’s target?

uurhh surely I can figure this out?

Ok, ok. Let’s try some stuff. First, this:

In the logs, I can see the request:

You may have to zooooooooom iiiiiiiin…..

At A, you can see the request url, at B you can see the version is defaultversion, at C you can see the service name is background (called module id because services used to be modules).

So maybe we can just say background, and it’ll refer to the background service? Like this:

queue:
- name: background
  target: background
  rate: 1/s
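One way to think about target, as far as the queue.yaml reference describes it: the string gets prepended to the app's hostname when the task's HTTP request is built. A minimal sketch of that naming, assuming the usual appspot.com "-dot-" convention for addressing a service or version; the helper name is hypothetical, not part of any SDK.

```python
def service_host(project, service=None, version=None):
    """Build the appspot.com hostname for a project, service and version.

    Parts that are None are simply left out, which addresses the default
    service and/or the currently serving version.
    """
    parts = [part for part in (version, service, project) if part]
    return "-dot-".join(parts) + ".appspot.com"


print(service_host("emlyn-experiments"))
# emlyn-experiments.appspot.com
print(service_host("emlyn-experiments", service="background"))
# background-dot-emlyn-experiments.appspot.com
print(service_host("emlyn-experiments", service="background", version="defaultversion"))
# defaultversion-dot-background-dot-emlyn-experiments.appspot.com
```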

Now how do I deploy? I think deploying both the default service and the background service will upload the queue.yaml file. Let’s try deploying the default service:

gcloud app deploy --project emlyn-experiments --version defaultversion 

Ok, let’s go look:

Totally did nothing.

<elided lots of hunting through logs and other horror>

Look, it’s working in the local environment:

Trying to get Queues working at all

I’m experienced in App Engine development, but I’m new to gcloud. Something is wrong here.

Ok, let’s strip it back to fundamentals. First, let’s simplify the queues:

queue:
- name: background
  rate: 100/s

There should also always be a queue called default. Let’s actually use that.

Now, I’m going to add code for enqueuing a task, to the helloworld request handler. It should either work, or blow up if the queues aren’t configured.

Here’s the new code for main.py:

import webapp2
from google.appengine.ext import deferred
import logging


def HelloWorld():
    logging.info("Hello World")


class MainPage(webapp2.RequestHandler):
    def get(self):
        # run HelloWorld in a background task
        deferred.defer(HelloWorld)

        self.response.headers['Content-Type'] = 'text/plain'
        self.response.write('Hello, World!')


app = webapp2.WSGIApplication([
    ('/', MainPage),
], debug=True)

I’ve used the deferred library to run a task. Don’t use the longhand way they show you in the docs; this is the best way to run a task in Python.

To make deferred work, you also need to modify app.yaml (and background.yaml, don’t forget to add it there too):

runtime: python27
api_version: 1
threadsafe: true

handlers:
- url: /.*
  script: main.app

builtins:
- deferred: on

That’s adding a built-in appengine handler at the route /_ah/queue/deferred to handle deferred requests.
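Roughly, the idea behind deferred is: pickle the function call, put the payload on a queue, and let a handler on the other side unpickle and run it. Here is a minimal sketch of just that round trip, assuming an in-memory list as a stand-in for the push queue. The names defer_sketch and handle_deferred_request are made up, and this leaves out everything else the real library does.

```python
import operator
import pickle

# Hypothetical in-memory stand-in for a push queue; the real one lives in App Engine.
fake_queue = []


def defer_sketch(func, *args, **kwargs):
    """Serialise the call and park it on the queue (stands in for deferred.defer)."""
    fake_queue.append(pickle.dumps((func, args, kwargs)))


def handle_deferred_request(payload):
    """Stands in for the built-in handler: unpickle the call and run it."""
    func, args, kwargs = pickle.loads(payload)
    return func(*args, **kwargs)


# Enqueue one call, then play the part of the handler for every queued payload.
defer_sketch(operator.mul, 6, 7)
results = [handle_deferred_request(payload) for payload in fake_queue]
print(results)  # [42]
```

Plain pickle only stores a reference to a module-level function, so this sketch cannot ship closures; that needs extra machinery.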

Ok. In main.py, you can see I’ve added a HelloWorld function that just logs Hello World, and I run it in the background (it’ll go to the default task queue, because I haven’t specified anything).

So if queueing is working, I should be able to visit that page, and see the Hello World log entry.

So running the local server, then visiting localhost:8080, I see the hello world webpage, and I see this in the terminal:

See “Hello World” logged? That’s success.

Now I’ll deploy this to the default service, and try it there. Let’s visit the web page:

Ok! What’s in the logs?

Look, it’s run the background task using /_ah/queue/deferred, and you can see the line near the bottom saying “Hello World”. Good. What’s the task page look like now?

Ooh, we have a task list, instead of that annoying documentation screen. Ok! And you can see the default queue, and the task we ran.

But where is the background queue? Something is wrong here.

Googling, googling, googling, … then this:

Let’s try that!!! My command should be

gcloud app deploy app.yaml queue.yaml --project emlyn-experiments --version defaultversion

And…

Ok, let’s change queue.yaml to make the background queue truly run tasks against the background service:

queue:
- name: background
  target: background
  rate: 100/s

And I’ll change main.py to use the background task queue:

deferred.defer(HelloWorld, _queue = "background")
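The leading underscore matters: keyword arguments like _queue are options for the task itself, while everything else is passed on to the function being deferred. A minimal sketch of that split, assuming any underscore-prefixed name counts as a task option. The real library recognises a fixed set of names, so treat this as an illustration only; the helper name is made up.

```python
def split_defer_kwargs(kwargs):
    """Separate task options (leading underscore) from the function's own kwargs."""
    task_options = {key[1:]: value for key, value in kwargs.items()
                    if key.startswith("_")}
    func_kwargs = {key: value for key, value in kwargs.items()
                   if not key.startswith("_")}
    return task_options, func_kwargs


options, func_kwargs = split_defer_kwargs({"_queue": "background", "greeting": "hi"})
print(options)      # {'queue': 'background'}
print(func_kwargs)  # {'greeting': 'hi'}
```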

Now deploy both the default and background services.

gcloud app deploy background.yaml app.yaml queue.yaml --project emlyn-experiments --version defaultversion

Yes, it turns out I can deploy both services and the queue definition in one line. Nice.
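If you end up typing that command a lot, it can be wrapped in a small script. A minimal sketch, using only the flags that appear in this post (--project and --version); the function names are hypothetical, and by default it only prints the command rather than deploying.

```python
import subprocess


def build_deploy_command(deployables, project, version):
    """Build the argv list for one gcloud deploy covering several .yaml files."""
    return ["gcloud", "app", "deploy"] + list(deployables) + [
        "--project", project,
        "--version", version,
    ]


def deploy(command, dry_run=True):
    """Print the command, and only run it when dry_run is False."""
    print(" ".join(command))
    if not dry_run:
        # Needs the gcloud SDK installed and authenticated.
        subprocess.check_call(command)


command = build_deploy_command(
    ["background.yaml", "app.yaml", "queue.yaml"],
    project="emlyn-experiments",
    version="defaultversion",
)
deploy(command)  # dry run: prints the command without deploying
```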

And visit http://emlyn-experiments.appspot.com/, I see “Hello, World!”, so let’s look at the console:

There’s an instance running for the background service, good!

How about the log?

You can see module_id is background, and it’s logged Hello World. That’s success!

Now let’s get back to those two buttons.

Here’s a new main.py:

import webapp2
from google.appengine.ext import deferred
import logging


def HelloWorld():
    # burn some CPU so each task actually uses resources
    x = 0
    for _ in range(100000):
        x += 1
    logging.info("Hello World")


class MainPage(webapp2.RequestHandler):
    def get(self):
        # run HelloWorld in a background task on the background queue
        deferred.defer(HelloWorld, _queue="background")
        self.response.headers['Content-Type'] = 'text/html'
        self.response.write('''
<html>
<body>
<form method="POST">
<button type="submit" name="foreground" value="x">foreground</button>
<button type="submit" name="background" value="x">background</button>
</form>
</body>
</html>
''')

    def post(self):
        def EnqueueTasks(aNumTasks, **kwargs):
            # kwargs go straight through to deferred.defer (e.g. _queue)
            for _ in range(aNumTasks):
                deferred.defer(HelloWorld, **kwargs)

        if self.request.get("foreground"):
            EnqueueTasks(10000)
            self.redirect("/")
        elif self.request.get("background"):
            EnqueueTasks(10000, _queue="background")
            self.redirect("/")
        else:
            self.response.write("wat")


app = webapp2.WSGIApplication([
    ('/', MainPage),
], debug=True)

What’s happening here is that the get handler is presenting some html with two buttons, foreground and background. Then the post handler is checking which button was pressed, and enqueueing 10,000 tasks to either the default or background queues, which are serviced by either the default or background services, respectively.

Also, the HelloWorld function now loops uselessly, to make it actually use some resources.

I’ve uploaded this final version of the code to github: https://github.com/emlynoregan/servicesdemo

So now I deploy the services, then go to emlyn-experiments.appspot.com :

This is it, this is the experiment.

Ok. I predict that pressing “foreground” will fill the default queue with tasks, kick off a bunch of default service instances, and refreshing the browser will be tough because the default service will be unresponsive.

Here’s a video of pressing the “foreground” button.

Perhaps the audio track is inappropriate? Whoops.

You’ll see I press the button, lots of tasks build up, then I try to refresh the page. Sometimes it’s ok, but sometimes it’s laggy, slooow.

Try the background button:

Jaunty!

I do the same thing here, and you’ll see that refreshing the page is fast, never any lag. That’s because the default service is doing that work, while the horrible slow tasks are all in the background service. Success!
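Eyeballing page refreshes works, but the lag could also be put into numbers. A minimal sketch, assuming you pass in your own fetch function that loads the page once; time_requests and summarise are made-up names, and nothing here was part of the original experiment.

```python
import time


def time_requests(fetch, count):
    """Call fetch() count times and return each call's duration in seconds."""
    durations = []
    for _ in range(count):
        start = time.time()
        fetch()
        durations.append(time.time() - start)
    return durations


def summarise(durations):
    """Return (fastest, slowest) so the occasional laggy refresh stands out."""
    return min(durations), max(durations)


# Example against the deployed app (Python 3, needs network access):
# from urllib.request import urlopen
# fetch = lambda: urlopen("http://emlyn-experiments.appspot.com/").read()
# print(summarise(time_requests(fetch, 20)))
```

Running it once after pressing each button would give a fastest and slowest refresh time to compare.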

Thanks for hanging in there for this. I feel like I learned more about gcloud, and now I know how to break my app into multiple services, and use those services to do different kinds of jobs.

Addendum: Running these tests (including screwing it up a few times) cost a bit over $7.

