App Engine Resident instances and the startup time problem

I was attending a meetup in Austin, TX last week when I was approached by some developers who had created an app called “ThinkThanks.” Their goal was to revolutionize the way people think about thanking other people. However, they were running into a big problem: whenever a large number of users suddenly wanted to “send thanks,” request latency spiked through the roof.

Now, I’ve already debugged a few issues related to startup performance, so this sounded pretty familiar. However, when I sat down with the lead developer, I quickly realized that they had a very different usage pattern than Bait-and-Stitch or Partly Cloudy.

It turned out that they had already done all the heavy lifting to reduce instance startup time. Their problem was that the app’s traffic was extremely bursty: they would go from 0 users for hours to suddenly thousands. In those situations, instances were legitimately being cold-booted, and requests were delayed while each new instance was being created.

Spending some time in Cloud Console confirmed this behavior. Looking at latency next to instance count, we saw that the latency spikes lined up directly with jumps in the number of instances.

ThinkThanks was taking a considerable latency hit any time a new instance booted up. So, besides just adjusting the scheduler to boot up fewer instances, it looked like they needed something stronger to handle sudden spikes in traffic and instances.

Idle Instances are latency playthings

As I talked about before, the App Engine serving algorithms are constantly deciding whether it’s better to queue a request or to spin up a new instance. That decision takes a significant number of factors into account (such as queue depth, current QPS, and average request latency).

If your cold-boot time is significant, then during extreme spikes in traffic, user request latency will suffer because requests sit longer in the work queue before being serviced. A big enough spike can flood your application with startup costs and delay the entire system.

In these situations, Resident Instances can help reduce overall startup times, improve user latency, and reduce costs all at once.

Resident Instances are instances which stick around regardless of the type of load your app is handling; even when you’ve scaled to zero, these instances will still be alive.

When spikes do occur, resident instances are used to service requests that cannot be serviced in the time it would take to spin up a new instance. As such, requests are routed to them while a new instance is being spun up in parallel. Once the new instance is up, traffic is routed to it and the resident instance goes back to being idle.

The point here is that resident instances are the key to scaling rapidly without your users’ perception of latency shooting through the roof. They hide instance startup time from the user (which is a good thing)!

I was able to confirm this with ThinkThanks by setting their min-idle-instances flag to 1 and charting how the latency of their application changed as a result.

Once the new request load came in, it was handled by the existing instance, so that latency didn’t shoot through the roof while the new instances were booting up.
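To make that effect concrete, here is a minimal sketch of a toy queueing model. It is not App Engine’s real scheduler: the function name `simulate_burst`, its parameters, and every number in it are assumptions chosen for illustration. The `warm` argument plays the role of the idle instances you would request with `min_idle_instances` under `automatic_scaling` in app.yaml.

```python
import heapq


def simulate_burst(arrivals, service_time, boot_time, warm, cold):
    """Return the latency (queue wait + service) of each request, in seconds.

    Toy model, not App Engine's real scheduler: every instance serves one
    request at a time, `warm` instances are ready immediately, and `cold`
    instances start booting when the first request arrives.
    """
    if warm + cold == 0:
        raise ValueError("need at least one instance")
    if not arrivals:
        return []
    # Each heap entry is the time at which one instance can next accept work.
    ready_at = [arrivals[0]] * warm + [arrivals[0] + boot_time] * cold
    heapq.heapify(ready_at)
    latencies = []
    for arrival in arrivals:
        # Hand the request to whichever instance frees up first.
        start = max(arrival, heapq.heappop(ready_at))
        finish = start + service_time
        heapq.heappush(ready_at, finish)
        latencies.append(finish - arrival)
    return latencies


# Hypothetical burst: 20 requests, 100 ms apart, 200 ms of work each, 5 s cold boot.
burst = [i * 0.1 for i in range(20)]
cold_only = simulate_burst(burst, service_time=0.2, boot_time=5.0, warm=0, cold=2)
with_resident = simulate_burst(burst, service_time=0.2, boot_time=5.0, warm=1, cold=2)

print(f"worst latency, cold boot only: {max(cold_only):.1f}s")
print(f"worst latency, one resident:   {max(with_resident):.1f}s")
```

In this toy run, every request in the cold-only case pays for the boot, while the single warm instance absorbs the burst at the price of some queueing. Real traffic and the real scheduler are messier, so treat it as intuition only.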

It’s worth noting that the idle instance setting has a min and a max value, which allows the number of idle instances to scale up during spikes as well. The folks at ThinkThanks messed around with those values to find a sweet spot, which I suspect will vary between application types and regions (YMMV!).

Always-Cron instances

After playing around with resident instances a bit, ThinkThanks noticed that there seemed to be a +1 instance tax in situations where traffic was lower than normal. Basically, if no one had been using the application for a while, a sudden burst would cost 2 instances: the idle instance, plus the new instance being spun up.

But for infrequent traffic, that wasn’t really what they needed; ThinkThanks was wasting an instance. A better solution came from a G+ post, where +Bogdan Nourescu described his solution to the same problem (paraphrased):

For my use-case the min-idle-instances flag is more than i need, cause if i get many requests in one day, i have 2 instances running, even tho i only need 1.
……
My workaround: Create a cron job that calls on a DoNothing Url, so it keeps my instance active when i want, so when users access the API, they get low latency and no start-up requests. The DoNothing URL just logs “keep instance alive”

Seems like a pretty good solution. Our startup friends changed their configs around and saw pretty much the same latency results for load spikes, but without the same instance tax.
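The workaround quoted above could look something like the sketch below. This is not the code from the G+ post or from ThinkThanks: the `/do_nothing` path, the `app` name, and the five-minute schedule are all assumptions, and the handler is written as a plain WSGI callable so it does not depend on any particular web framework. The matching cron.yaml entry is shown as a comment.

```python
import logging

# Hypothetical cron.yaml entry that pings the keep-alive URL.
# The path and the interval are assumptions; tune the interval to your app.
#
# cron:
# - description: keep instance alive
#   url: /do_nothing
#   schedule: every 5 minutes

KEEP_ALIVE_PATH = "/do_nothing"  # hypothetical path, must match cron.yaml


def app(environ, start_response):
    """Minimal WSGI app with a do-nothing route that only writes a log line."""
    if environ.get("PATH_INFO") == KEEP_ALIVE_PATH:
        logging.info("keep instance alive")
        start_response("200 OK", [("Content-Type", "text/plain")])
        return [b"ok"]
    # A real application would dispatch to its actual handlers here.
    start_response("404 Not Found", [("Content-Type", "text/plain")])
    return [b"not found"]
```

The handler deliberately does no work. Its only job is to give the instance a regular request so that it does not get shut down for being idle.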

Learnings & takeaways

Happy with their latency improvement, the ThinkThanks folks finished their meetup-provided pizza and went happily on their way.

Afterwards, I talked to a few App Engine experts, who wanted to make it very clear that there are two distinct use cases here.

If you have a high-frequency, lots-of-instance-churn scenario and a slow cold-boot time, then resident instances are the best bet. Yes, you end up with more total instances, but letting GAE scale resident instances up and down will respond more gracefully to shifts in your traffic load over time.

If you handle low-frequency traffic, with occasional bursts, and a slow cold-boot time, then keeping an always-cron instance around is ideal since you can handle the initial traffic load gracefully, with a lower number of total instances.

The downside to both of these approaches is that these persistent instances come at a cost: you’re basically paying for an instance to be alive 24x7, which could add significantly to your operating budget.

So as always, make sure you carefully weigh the pros and cons of these approaches before diving headlong into one.
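To put a rough number on the 24x7 cost mentioned above, a back-of-the-envelope calculation is enough. The sketch below assumes a flat hourly rate; `PLACEHOLDER_RATE` is a made-up figure, not real App Engine pricing, so plug in the current rate for your instance class and region.

```python
HOURS_PER_DAY = 24


def always_on_cost(hourly_rate, instances=1, days=30):
    """Cost of keeping `instances` alive around the clock for `days` days."""
    return hourly_rate * instances * HOURS_PER_DAY * days


# PLACEHOLDER_RATE is a made-up number, not real App Engine pricing.
PLACEHOLDER_RATE = 0.05
one = always_on_cost(PLACEHOLDER_RATE)
two = always_on_cost(PLACEHOLDER_RATE, instances=2)
print(f"1 always-on instance:  {one:.2f} per 30 days")
print(f"2 always-on instances: {two:.2f} per 30 days")
```

The second line is the “+1 instance tax” case: the same rate, doubled, for as long as both instances stay alive.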
