Improve App Engine Startup Times through Warmup Requests

Season of Scale

“Season of Scale” is a blog and video series that helps enterprises and developers build scale and resilience into their design patterns. In this series, we walk you through patterns and practices for creating apps that are resilient and scalable, two essential goals of many modern architecture exercises.

In Season 2, we’re covering how to optimize your applications to improve instance startup time! If you haven’t seen Season 1, check it out here.

  1. How to improve Compute Engine startup times
  2. How to improve App Engine startup times (this article)
  3. How to improve Cloud Run startup times

When it comes to gaming, whether it’s action, RPG, or simulation, staying in the action means being online. But in the offline world, you rep your fandom with merch. Will Critter Junction’s e-commerce shop hosted on App Engine be able to handle the hype? Read on.


In the last article, we helped Critter Junction investigate their Compute Engine instances to pinpoint whether latency came from request, provisioning, or boot times. We also helped them use custom images to reduce boot times. Now their eyes are on their App Engine instances. Without any in-person conventions this year, Critter Junction players from around the world have begun to flood their site to purchase character cards, apparel, and other swag.

Critter Junction has been testing App Engine Standard to run their new merchandise shop because App Engine autoscales their application across multiple instances to meet the demands of additional traffic.

Pending latency

During load testing, they used Cloud Trace to measure response latency and noticed higher-than-usual latency when they sent a heavier load of concurrent requests to their service’s HTTP endpoint.

Pending latency is how long a request can sit in the queue before App Engine decides to spin up another instance. If all of your app instances are busy when a request arrives, the request waits in a queue until the next available instance can handle it. As the load increases, requests are therefore processed more slowly.

App Engine then starts a new instance to help, based on limits you set such as CPU utilization, throughput, and maximum concurrent requests for the currently running instances.

But what they didn’t know is that App Engine also needs to load their app’s code into a fresh instance when:

  • They deploy a new version of their app
  • Maintenance and repairs of the underlying infrastructure or physical hardware occur

Though cold starts on App Engine are rare, the first request sent to a new instance, called a loading request, can take longer to process because the instance first has to load your app’s code, including any libraries and resources needed to handle the request. This means response times increase gradually as new instances come up to handle new traffic.

Warmup Requests

What you want instead is for initialization to happen earlier, before a new instance serves live traffic.

You can do this by issuing a warmup request, which loads application code into an instance ahead of time, before any live requests reach it.

Warmup request

App Engine attempts to detect when your app needs a new instance and initiates a warmup request to initialize it. New instances accept requests only after they finish loading your app’s code, so new requests can then be handled faster.

How it works

Warmup requests are used by the App Engine scheduler, which controls autoscaling of instances based on your configuration.

App Engine issues GET requests to /_ah/warmup.

You can implement handlers for this request in your code to perform application-specific tasks like pre-caching app data.

  1. For most supported languages, just add the warmup element under the inbound_services directive in your app.yaml file.
  2. Then create a handler that will process requests that are sent to /_ah/warmup. Your handler should perform any warmup logic that is needed by your app.


Let’s walk through a Go example.

  1. Our main function calls setup, which performs the setup steps the application needs in order to function.
  2. It logs when an App Engine warmup request is used to create the new instance. The warmup steps themselves happen in setup, for consistency with cold start instances.
  3. Everything else remains the same. The setup function executes the per-instance, one-time warmup and initialization actions.
  4. Finally, the indexHandler responds to requests with our greeting.

// Sample warmup demonstrates usage of the /_ah/warmup handler.
package main

import (
	"context"
	"fmt"
	"log"
	"net/http"
	"os"
	"time"

	"cloud.google.com/go/storage"
)

var startupTime time.Time
var client *storage.Client

func main() {
	// Perform required setup steps for the application to function.
	// This assumes any returned error requires a new instance to be created.
	if err := setup(context.Background()); err != nil {
		log.Fatalf("setup: %v", err)
	}

	// Log when an appengine warmup request is used to create the new instance.
	// Warmup steps are taken in setup for consistency with "cold start" instances.
	http.HandleFunc("/_ah/warmup", func(w http.ResponseWriter, r *http.Request) {
		log.Println("warmup done")
	})
	http.HandleFunc("/", indexHandler)

	port := os.Getenv("PORT")
	if port == "" {
		port = "8080"
		log.Printf("Defaulting to port %s", port)
	}

	log.Printf("Listening on port %s", port)
	if err := http.ListenAndServe(":"+port, nil); err != nil {
		log.Fatal(err)
	}
}

// setup executes per-instance one-time warmup and initialization actions.
func setup(ctx context.Context) error {
	// Store the startup time of the server.
	startupTime = time.Now()

	// Initialize a Google Cloud Storage client.
	var err error
	if client, err = storage.NewClient(ctx); err != nil {
		return err
	}

	return nil
}

// indexHandler responds to requests with our greeting.
func indexHandler(w http.ResponseWriter, r *http.Request) {
	// The "/" pattern matches every path, so reject anything but the root.
	if r.URL.Path != "/" {
		http.NotFound(w, r)
		return
	}
	uptime := time.Since(startupTime).Seconds()
	fmt.Fprintf(w, "Hello, World! Uptime: %.2fs\n", uptime)
}

One thing to note is that warmup requests don’t work in all cases. Even when they are enabled, App Engine only makes a best-effort attempt to send requests to already warmed-up instances, so you might still face loading requests, for example if the instance is the first one being started or if there’s a steep ramp-up in traffic. In these cases, you should use resident instances, which you can learn about below.

Stickers for everyone!

Once warmup requests were implemented, Critter Junction was able to reduce cold starts during increases in traffic to their online shop once convention season was underway.

Check out the documentation for language-specific steps on warmup requests. Stay tuned for what’s next for Critter Junction.

And remember, always be architecting.

Next steps and references:



Stephanie Wong


Google Cloud Developer Advocate and producer of awesome online content. Creator of the series, GCP Networking End-to-End; host of Google’s Next onAir. @swongful