An Old Person’s Guide to Writing Async Jobs with Google Cloud Platform (+Slack!)
It’s pretty clear to me that what the world needs right now, this very minute, is a middle-aged moron telling folks how to build microservices in 2017, but like I tell my kids, if you’re too stupid to learn from your own mistakes, maybe you can learn from someone else’s.
I’m doing a new thing, and part of the doing a new thing is figuring out how the hell you write a web thing these days. It’s been forever since I made a real webapp from scratch — shit, I haven’t really written code in any meaningful or serious way in a lump of years.
This post is about 3 steps into the process of setting up a shiny brand new Google Cloud app. Here, I’ll cover how to use the platform tools in GCP to easily run jobs that are related to a user action but don’t need to happen until after the user is done with their web request. You can run any number of jobs and keep your services clean, decoupled, and performant. In this implementation pattern, we’ll fire an event from our codebase and use other components of GCP to fan those events out to small, freestanding functions.
Read on, and you’ll know how to implement event-based jobs on Google Cloud Platform with 1 line of code and 1 configuration step to start sending events, plus 1 configuration step and 1 Javascript function for every task you want to run asynchronously. You’ll also get a taste of the pleasant and functional GCP UI, and I’ll explain how the Google App Engine Standard Environment is breaking Go. There’s also some code.
The Goal: Dopamine Kicks
A handful of people sign up for Taxat every day. This is positive feedback from the outside world, so it’s nice to amplify it. Compulsively running SELECT *s on the user table gets a little tired after you do it a hundred times a day or so, so we want a better way to know who’s signing up and when.
We’re in a Slack together (of course), so the easiest, most passive way for us to monitor activity is to post to a Slack channel. Talking to Slack is a pretty straightforward thing to do, and we get a surprising amount of insight by just looking at the timing of people’s actions and their location, which is also passed through to the Slack message.
The Architecture: Microservices on Google Cloud Platform
Microservice-based, shared-nothing architecture aligns with our overarching technical goals, namely:
- Low operational overhead
- The ability to roll short-term collaborators on and off specific projects easily
- Performance
- Resilience
- API-First/Mobile App Ready
Up until a few days ago, we only had 2 services: one to serve up web assets, and another to provide a REST API and persist data to MySQL. These are both Google App Engine apps, running in the Standard Environment, written in Go. There isn’t really much in the API yet, so there hasn’t been any reason to decompose this further.
Posting to Slack was implemented with an intermediary Publish-and-Subscribe layer and an event listener written as a Google Cloud Function. Each actor in the system has one function, there’s no hard coupling between them, and the pattern can scale to support many events and job runners without negatively impacting response times for end users.
Pub/Sub
Pub/Sub is Google Cloud Platform’s message dispatcher component. When someone signs up for Taxat, or updates their signup info, the api service sends a message to Pub/Sub, and then continues going about its business as fast as it can. Interested parties sign up for messages from Pub/Sub, and it broadcasts the messages to all of the subscribed listeners.
Setting Up An Event
Topic is the key metaphor in Google Pub/Sub. A topic is a channel over which data objects get sent. The data objects themselves contain a string defining the event’s message and an optional set of key-value string attribute pairs. In this instance, there’s a channel called subscriber that’s used to carry messages about people signing up. The message contains a verb describing what actually happened, and the attributes carry information about the user. When a publisher sends a message, it sends it to a Topic, and that’s where listeners listen for it.
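Concretely, a signup data object might look like this, assuming the cloud.google.com/go/pubsub client library that the Go sections below use (the values here are invented):

```go
// A Pub/Sub message: Data carries the verb, Attributes carry the user info.
msg := &pubsub.Message{
	Data: []byte("signup"),
	Attributes: map[string]string{
		"email": "fred@example.com",
		"city":  "Brooklyn",
	},
}
```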
Topics can be managed programmatically, but this is expensive and not typically needed inline. Unsurprisingly, setting up topics comprises roughly half of the Pub/Sub control panel in the GCP dashboard.
In the background, you can see the names of the project’s existing Topics, and the use of an [env]. naming convention to set up separate topics for dev and prod environments. The GAE service (api from the diagram above) that originates the events has configuration to direct events to a specific Topic based on the deploy environment.
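The environment wiring amounts to string concatenation; something like this, with an invented TAXAT_ENV variable standing in for the real configuration mechanism:

```go
// "dev.subscriber" or "prod.subscriber", depending on the deploy.
topic := os.Getenv("TAXAT_ENV") + ".subscriber"
```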
The screenshots should provide a good sense of the GCP UI if you haven’t used it. In general, the views are clear and lightweight, and they use Material design, as you’d expect. The biggest single factor in my choosing GCP over AWS was the relative clarity of the UI and documentation. AWS was a real mess, especially for a new user.
Unfortunately, there isn’t any message log on the publish side of Pub/Sub, so you can’t verify that what you think is getting sent is really what’s getting sent. There’s a Pubsub Emulator that supposedly runs on your local machine, but I lost interest in setting it up after a couple of frustrating hours. Without a working emulator you need at least a trivial Subscriber working to see what’s going on; there’s a live log viewer in Google Cloud Functions (or in GAE or Compute Engine if you’re using one of those to handle events). If I were starting again, I’d have written the Subscriber first, which would have saved me a little bit of grief in testing.
Publishing Events from Go
Firing events from a GAE app in Go is pretty straightforward, and that’s likely the case in other languages as well. There’s some boilerplate plumbing to encapsulate, and there’s one major shortcoming in the GAE Go Standard Environment that stands in the way of fully optimizing performance and writing idiomatic, concurrent Go.
I put the plumbing into a package called publish, which exports one method that doesn’t take any Pubsub types as arguments, and doesn’t return anything. At the point where the thing-to-be-broadcast happens, it’s one line of code.
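At the call site it’s something like this (the exact signature is mine, not gospel):

```go
// One line at the point where the signup happens: topic, verb, and
// user attributes. The field names are invented, for illustration.
publish.Event(ctx, "subscriber", "signup", map[string]string{
	"email": u.Email,
	"city":  u.City,
})
```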
The event() implementation in publish does 5 things (sketched right after this list):
- Retrieves a Client for the Pubsub instance
- Gets a handle on the Topic you want to broadcast to
- Sends a message
- Logs on failure (maybe, see below)
- Cleans up
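Here’s a minimal sketch of event(), assuming the cloud.google.com/go/pubsub client and App Engine’s log package; the project ID is a placeholder, and the real package also caches the Client in the Context (snippets further down):

```go
package publish

import (
	"context"

	"cloud.google.com/go/pubsub"
	"google.golang.org/appengine/log"
)

// event broadcasts verb+attrs on the named topic.
func event(ctx context.Context, topic, verb string, attrs map[string]string) {
	client, err := pubsub.NewClient(ctx, "my-project-id") // 1. get a Client
	if err != nil {
		log.Errorf(ctx, "pubsub client: %v", err)
		return
	}
	t := client.Topic(topic)             // 2. get a handle on the Topic
	r := t.Publish(ctx, &pubsub.Message{ // 3. send the message
		Data:       []byte(verb),
		Attributes: attrs,
	})
	go func() {
		defer t.Stop() // 5. clean up: flush and release the Topic's goroutines
		if _, err := r.Get(ctx); err != nil { // blocks until the server acks
			log.Errorf(ctx, "publish: %v", err) // 4. log on failure (maybe)
		}
	}()
}
```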
r.Get() gets the status of t.Publish(). It blocks, so its invocation is wrapped in a goroutine so we can try to log if the call fails, without blocking the sender. The normal Go pattern would be to wrap this entire method in a goroutine and completely factor out the slow parts for the caller, and that’s what the public-facing interface, Event(), wanted to do, something like this:
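```go
// What Event() wanted to be: fire the whole thing into the background
// and return immediately. A sketch; the next section explains why this
// panics in the Standard Environment.
func Event(ctx context.Context, topic, verb string, attrs map[string]string) {
	go event(ctx, topic, verb, attrs)
}
```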
Broken Context, Broken Dreams
A goroutine here fails and induces a panic because of the way Google implements and cripples context.Context in the Standard Environment. It makes it impossible to run real background tasks, and generally makes the world a slightly shittier place than it already is.
All Google API functions take a Context as their first argument. The Context interface is part of the Go standard library, and it marries 2 intents:
- To provide a mechanism to carry data and state across boundaries that don’t map well to variable scopes. In particular, it’s used to store data relating to a request, so that any actor in the calling chain can have access to request-pertinent info. Requests are handled concurrently, so there’s no scope in which to isolate a particular request.
- To provide a mechanism to terminate forked functions when their parent terminates or when a specific amount of time elapses. In the case of the Request Context, the Context is cancelled as soon as the mainline (ServeHTTP()) finishes, so anything that uses the Request Context is going to have a very hard stop at that point in time.
Asynchronous jobs like these are used to improve the end-user performance of web services by deferring execution of non-critical-path bits until the http server is done talking to its client. The Slack notification case fits perfectly into this category. Fire and forget.
The Go authors considered this scenario, and the standard library provides a context.Background() function that returns a Context that lets you execute longer-running code in the background, without having to worry about cancellation.
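In plain Go, fire-and-forget looks like this (doDeferredWork is a stand-in I made up):

```go
// Detach the work from the request: a Background Context has no
// deadline and is never cancelled.
go func() {
	ctx := context.Background()
	doDeferredWork(ctx)
}()
```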
The Background Context would be logically appropriate to pass to the Pubsub API, since we want the publish’s priority to be lower than the request’s. Unfortunately, Google has constrained the GCP APIs so that they panic if you pass them a Background Context. They don’t work with anything other than the currently active Request Context, and they stop every goroutine hanging off the Context once the mainline finishes its http response. In other words, any code that uses GCP APIs can happen concurrently, but it needs to be hardwired to complete before the Request is done.
When you kick off the publish using a goroutine — the idiomatic and performant way to do it — it almost always won’t happen, because the Request Context will get Cancel()ed before Go gets around to firing the event. You won’t see your error log messages either, because they depend on an active Request Context as well.
One of Go’s key strengths is in the elegance with which you can idiomatically manage concurrency and optimize performance. In limiting your execution window to the request lifecycle, Google is throwing a big chunk of what’s great about Go out the window.
Google frames the constraint as a limitation of the Standard Environment, and recommends using the Flexible Environment for this case. Thing is, the other constraints of the Standard Environment — no binaries, no write access to the file system — all make sense for a straightforward web service. The sandbox, configurability, APIs, build commands, and defined entry point for your server code all significantly reduce operational overhead and are generally a pleasure to work with. Moving to the Flexible Environment would be a huge expenditure of immediate and ongoing effort for this one case. In every other way, the Standard Environment’s tradeoffs between constraints and management overhead make sense, but not this one.
It’s not unreasonable to place some restrictions on background threads in the Standard Environment. You shouldn’t be doing Bitcoin mining here. But there needs to be some small window for simple computation after the response, so the user-facing part stays fast. A slightly deferred Cancel(), or a Context that clones the Request Context with an extra second or two before cancellation, is all that’s required here.
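In plain Go, that “cloned Context” is only a few lines to build yourself. A sketch, and emphatically not a GAE API (the Standard Environment’s sandbox is exactly what stops you from using it):

```go
import (
	"context"
	"time"
)

// detached wraps a request Context: Values still pass through, but the
// request's deadline and cancellation signal do not.
type detached struct{ context.Context }

func (detached) Deadline() (time.Time, bool) { return time.Time{}, false }
func (detached) Done() <-chan struct{}       { return nil }
func (detached) Err() error                  { return nil }

// withGrace hands background work a little extra time past the request.
func withGrace(req context.Context, d time.Duration) (context.Context, context.CancelFunc) {
	return context.WithTimeout(detached{req}, d)
}
```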
Go Code Samples
Here are a couple of Go snippets that might be helpful. The publish package contains helper methods to populate a Context with a pubsub Client and to retrieve it for re-use across functions. The mapify package can convert most any struct into a map[string]string, for use as a message attribute set, using reflection.
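The Context helpers are just the usual context.WithValue dance; a sketch, with names I’ve guessed at:

```go
// Stash a pubsub Client in a Context so one Client can be re-used
// across functions. The key type is unexported so nothing else can
// collide with it.
type ctxKey int

const clientKey ctxKey = 0

func withClient(ctx context.Context, c *pubsub.Client) context.Context {
	return context.WithValue(ctx, clientKey, c)
}

func client(ctx context.Context) *pubsub.Client {
	c, _ := ctx.Value(clientKey).(*pubsub.Client)
	return c // nil if nothing has been stashed yet
}
```

And mapify, give or take edge cases like nested structs:

```go
import (
	"fmt"
	"reflect"
)

// Mapify flattens the exported fields of a struct (or pointer to one)
// into a map[string]string, ready to use as message attributes.
func Mapify(v interface{}) map[string]string {
	out := map[string]string{}
	rv := reflect.Indirect(reflect.ValueOf(v))
	rt := rv.Type()
	for i := 0; i < rt.NumField(); i++ {
		f := rt.Field(i)
		if f.PkgPath != "" { // skip unexported fields
			continue
		}
		out[f.Name] = fmt.Sprint(rv.Field(i).Interface())
	}
	return out
}
```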
Fielding Events With Google Cloud Functions
If Context handling in Go is my least favorite design choice in GCP, GCF is probably my most favorite.
A Google Cloud Function is a stateless, zero-infrastructure chunk of code, written in Node/JS, that responds to Pubsub events or HTTP requests. It’s that thing you can go to when you just want to make something on the internet without a lot of bullshit.
You can test and monitor the function from the dashboard, and you can reconfigure it and update the source there too, but I generally prefer to do these things from my dev machine using a shell script.
There’s nominally an emulator to run GCF functions locally. Getting it running requires more understanding of Node package management than I plan to attain in this lifetime.
Node/JS is the only programming language supported by GCF. Once upon a time, at a company whose name begins with Y, I was actually a professional front-end developer, and I authored, if you can call it that, Javascript, on a regular basis. Nowadays, I try to avoid the matter as best I can, but every few years I need to do something with Javascript. Diving in always feels like Sheriff Hopper in Stranger Things digging down under the pumpkin patch and finding evil extradimensional lifeforms infesting the ground beneath his otherwise unremarkable town and you know that all he can possibly be thinking is what the fuck is going on in here?
The post-signups-to-slack cloud function basically needs to do a couple of sprintfs with the event data and POST json to a Slack url. I searched the internets to garner the right way to do an http request from Node, and I started coming across code that declared everything with const and then cheerfully mutated it a few lines later.
I’m generally supportive of transgressive impulses, but the person who got this const into the ECMA standards really needs a hobby. Or maybe they should be President.
Once I got over the shock-and-awe of mutable constants and figured out that my function wasn’t working because the event data is actually in event.data.data and not event.data, passing the event to Slack was quite easy. Code is here, complete with overly clever regexes and constants that aren’t.
Slack
As we lurch towards the finish line, there’s actually little to say about Slack, and that’s OK. It mainly serves here as an example of something you do with an event at the end of a GAE->Pub/Sub->GCF chain. Also, the Slack app-making-place is totally slick and easy and there isn’t really anything to complain about, unfortunately.
Posting messages to a Slack channel is just posting a small JSON object to an endpoint with a very long url. Steps (code sketch after the list):
- Create a Slack App
- Select “Incoming Webhooks”
- Turn the switch to on
- Add New Webhook to Workspace
- Specify the channel to post to
- Copy the URL and paste into your code
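The actual listener is the Node function linked above, but the webhook call itself is the same trivial POST in any language. Here’s a minimal sketch in Go, to stay consistent with the rest of this post; the function name and message format are mine:

```go
import (
	"bytes"
	"encoding/json"
	"fmt"
	"net/http"
)

// notifySlack posts a signup notice to an incoming webhook. webhookURL
// is the long URL you copied in the last step.
func notifySlack(webhookURL, name, city string) error {
	payload, err := json.Marshal(map[string]string{
		"text": fmt.Sprintf("New signup: %s (%s)", name, city),
	})
	if err != nil {
		return err
	}
	resp, err := http.Post(webhookURL, "application/json", bytes.NewReader(payload))
	if err != nil {
		return err
	}
	defer resp.Body.Close()
	return nil
}
```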
Put it all together, and you get a tidy little notice in the Slack channel every time someone signs up.
I know, it’s a little disappointing, but I guess that’s why I had to write all this. As I’m [re-]discovering, there are certain aspects of being a programmer that are quite challenging.
This was part 4, or maybe 3, of the story of how I came to approach the construction of a microservice-based architecture tabula rasa. It’s the first one I wrote because it’s the last thing I did, so it’s fresh. Based on my past experience with multi-part series on the blog, I’m expecting that I’ll follow this up with maybe a couple of paragraphs in a few months, or maybe I’ll check to see how those 280-character tweets really work.
Acknowledgements: Keyur Govande for listening to me bitch about Contexts and helping me make sense of it. Avleen Vig for encouraging me to write this. It’s his fault.