Moving to Google Cloud
Deploying scalable Go APIs on Google Cloud
Since I’ve been doing a number of projects in Go and no longer have a dedicated army of wonderful devops people to support me, it seemed like an obvious choice to launch my applications on Google Cloud using App Engine. I’ve finally gotten to a happy place with my biggest project, but I hit a few snags along the way. This post is intended to share some of the good, bad, and ugly of the experience in hopes it helps someone else facing the same challenges.
To Google’s credit, their interface is nice, clean, and relatively user friendly (much more so, in my opinion, than AWS). Setting up my projects was pretty effortless and uploading my first microservices wasn’t a big challenge. I did it mostly by trial and error, but it was still easy.
The Proprietary Package and Other Stories
While I said that Google seems like the obvious place to host a Go app (they invented the language, after all), I was a bit surprised at how sweeping their changes to core Go libraries are. For example, farewell to log, I loved thee well. Google has instead implemented its own log library that uses the same package name, but which implements an entirely different interface. Goodbye Printf, hello Infof, Errorf, etc. Even if they’d used the same names, the signatures would be different since everything in App Engine requires contexts, but the change seems gratuitous all the same.
In most cases the price is worth paying — once you’ve switched over to the App Engine logs tool the interface is really nice — but it does mean that if you think you are going to successfully abstract out all the proprietary cruft and keep your hosting options open, you are mistaken. Google has wholeheartedly embraced vendor lock-in through proprietary packages.
Another extreme (and seemingly heavy-handed, since I haven’t found an explanation for why they did this) example is APIs. If you want to hit an external API, you need to proxy through a Google API and use their proprietary library urlfetch. A lot of vendors have built tools to support this, but there are notable exceptions. For example, I haven’t found a Go OAuth1 library that allows custom HTTP clients, so my Twitter bot did not survive the migration.
The Development Environment
Fortunately, mine was a greenfield project, so adapting to these package eccentricities wasn’t too painful. One side effect, however, is that you must run Google’s development environment locally in order for the proprietary packages to work. This seems to be an area of pretty active development, since during the four weeks of building my MVP, a major version change happened with some breaking changes. For example, in the old version if you changed a file, the running service would automatically rebuild and update like a CI tool. In the new one, you need to shut down and restart the development environment. In many cases, the new option is more stable, but it sure took some getting used to and some detective work to figure out what was happening when I changed over. The development environment can also make it tricky to run multiple microservices at the same time, which makes functional testing of any flow requiring more than one microservice challenging.
So after this, my app was up. And like all API developers, I am completely sure that it takes only my alpha release to bring on the galloping horde. I need to be ready for load! I need to scale! Fortunately, that’s what App Engine is built for, right? And Go, too?
So I threw together a simple load tester with a configurable number of concurrent connections. (I guess I could have used an off-the-shelf one, but since it took literally less than 100 lines of Go code, I figured I’d write my own.) I was dismayed to find that even at a relatively small number of concurrent connections (eight, in my case) the API started throwing random 500s. This was odd: I’d configured autoscaling and my database wasn’t even breaking a sweat. What could be happening?
This one took a good two weeks to solve and it was the only time I came close to abandoning the project. First off, Go developers (I decline to use the word Gopher, which sounds pejorative in the extreme) who do not read The Ultimate Guide to Building Database-Driven Apps with Go are cheating themselves out of the most useful 40 pages on the subject ever written. I’m still relatively new to the language and it showed me some pitfalls about how database connection pools are managed that were causing me a lot of headaches. I then found a little Google FAQ article that mentioned, as a sort of afterthought, that “App Engine instance[s] … cannot have more than 12 concurrent connections.”
Twelve really isn’t a lot, especially given that Go’s sql package can hold open two connections for a single transaction. So that means just six effective connections per…instance? What’s an instance? I’d gotten this idea that App Engine was a perfect cloud thing, like AWS Lambda, where nevermore would I worry about physical or even virtual boxes. Not so much. It turns out these boxes are still very much alive and well. Some kind folks on the Google Cloud forums pointed me to this article which somehow I missed at signup. The article tells you how instances fit into the grand cloudiness of App Engine and how, with a relatively small amount of upfront work in app.yaml, they can be fairly relegated again to darkness. In my case, it was a matter of limiting each instance to 5 concurrent requests, which keeps each instance under the six effective connections. Once I did that, I slowly tweaked up my load tester and went past 400 concurrent requests (I stopped testing after 400) without an error. That was pretty awesome. It was also melting my laptop and even my gigabit home Internet connection.
I was now pretty committed to using Google Cloud for my production work. The time had come to make it look more pro, and, in particular, to use my own domain instead of appspot.com. I bought myself an SSL cert from name.com, my usual provider of things DNS, and tried to upload it. However, there’s nowhere to upload an intermediate cert, so I still get browser warnings if I try to navigate to any of my endpoints. I’m building an API, so this isn’t super-serious, but it is a little annoying and I’m not sure why it should be.
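As a rough illustration, the per-instance limit can be expressed as a fragment of app.yaml along these lines. This assumes automatic scaling in the standard environment; the rest of the file (runtime, handlers, and so on) is omitted, and the value 5 is simply the one mentioned above.

```yaml
# app.yaml (fragment only; other required settings omitted)
automatic_scaling:
  # Each instance accepts at most this many requests at once;
  # additional load is spread across more instances.
  max_concurrent_requests: 5
```

The arithmetic behind the number: 12 connections per instance, up to 2 per transaction, leaves 6 usable, and 5 leaves one spare.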
I decided to take Google’s own domain service for a test drive during this process to see if it handled things better. Unfortunately, it was still pretty basic so it didn’t seem worth it to switch. I’m back to using appspot.com URLs for now, but I still have hopes for the future.
So now I have an app up, running, and quite scalable. My appetite for More Cosmic Power was undiminished, however. In order to scale much further, I’ll need to be able to ramp up my database. I am currently using Cloud SQL (MySQL in cloudy clothing) but I’m very intrigued by Cloud Spanner since the app I’m planning could have tables with hundreds of millions of records (and even more rows since my tables are immutable) if things go according to plan. Sadly, I’ll need to wait since Go libraries for Cloud Spanner are still in alpha and that’s a little too raw even for me.
Proving, however, that Google is spying on us at all times (albeit mostly benevolently), it was at this time that I got an email from a Google Cloud rep. At first, I assumed this was merely sales, but after she offered a phone call and we had a brief conversation, I realized I now had what AWS never offered me: a genuine resource for questions and help. I don’t need to be belligerent on Twitter (though I’d be an edge case if I weren’t). I don’t need to keep tagging the amazing Miles Ward on everything in hopes he’ll have pity and bail me out.
Net Net of the Net
It was a bit of a process getting here and I’ve run into some more gotchas (like my database running out of storage without throwing an alert), but on balance Google Cloud has become a pretty great product. The UI is relatively intuitive, the capabilities are rich, and it seems quite robust. My biggest criticism is that the on-ramp is very steep and, while the detailed documentation is very good, there’s a “missing manual” in the form of a quality quick start guide. They might take a proverbial page from The Ultimate Guide mentioned above, which is probably about the right length and depth. If there’s such a thing out there, I hope someone will send it to me. I might kick myself for missing it the first time, but it would be great to know if I’m still missing anything now.