Google AppEngine Benchmark

One of the core offerings we have at GrowthOps is our Beyond Agile methodology for software delivery, which is heavily supported by different managed platforms in the cloud. Our preferred option is Google AppEngine standard configuration, and the reasons are explained here.

As you may already know, AppEngine (GAE) supports different runtime configurations. This benchmark is an attempt to understand the behaviour of AppEngine with various runtimes/libraries, with the intention of providing another input when deciding your AppEngine stack¹.

For this, I will calculate the number of instances spawned by GAE given a certain number of requests that run a common² task. I will also capture the memory used and the average instance loading time.


I have included some well known frameworks for the different languages and also some open source libraries that we have developed internally. For each framework I’ve used its default stack for GAE. The configurations are³:

Each runtime was deployed as an AppEngine service with F2 instances (256 MB of memory, 1.2 GHz) in the US/Central region. You can see the code and configuration on my GitHub project.


Each runtime was benchmarked with 4, 40 and then 400 requests per second⁴ over a period of 10 to 12 minutes.
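To give a feel for the size of each load stage, here is a minimal sketch that converts the rates and durations above into total request counts. It assumes a perfectly steady rate, which the real test did not have (see footnote 4), and the function names are my own, not part of the benchmark code.

```python
# Rough size of each load stage, assuming a perfectly steady request rate.
STAGE_RATES = (4, 40, 400)     # requests per second, from the article
DURATION_MINUTES = (10, 12)    # lower and upper bound of each run


def total_requests(rate_per_sec: int, minutes: int) -> int:
    """Number of requests sent at a constant rate over the given duration."""
    return rate_per_sec * minutes * 60


def stage_totals() -> dict:
    """Map each rate to its (minimum, maximum) total request count."""
    shortest, longest = DURATION_MINUTES
    return {
        rate: (total_requests(rate, shortest), total_requests(rate, longest))
        for rate in STAGE_RATES
    }


if __name__ == "__main__":
    for rate, (fewest, most) in stage_totals().items():
        print(f"{rate:>3} req/sec -> {fewest:,} to {most:,} requests")
```

Under these assumptions the heaviest stage sends somewhere between 240,000 and 288,000 requests per runtime.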

Each request runs a Datastore query that uses a different filter each time and limits the result to 10 entities. The average size of each entity is less than 200 bytes, meaning that the data loaded by each query is less than 2 KB.
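The payload estimate is simple arithmetic, sketched below. The constants come from the description above; the helper names are hypothetical, and the numbers cover only the raw entity data, not the overhead each runtime and client library adds on top.

```python
ENTITIES_PER_QUERY = 10    # result limit used by every query
MAX_ENTITY_BYTES = 200     # upper bound on the average entity size


def max_payload_bytes(entities: int = ENTITIES_PER_QUERY,
                      entity_bytes: int = MAX_ENTITY_BYTES) -> int:
    """Upper bound on the entity data returned by a single query."""
    return entities * entity_bytes


def max_payload_per_second(rate_per_sec: int) -> int:
    """Upper bound on entity data loaded per second across all instances."""
    return rate_per_sec * max_payload_bytes()


if __name__ == "__main__":
    print(max_payload_bytes())            # bytes per query
    print(max_payload_per_second(400))    # bytes per second at peak load
```

Even at 400 req/sec this stays under 1 MB of entity data per second, spread over all instances, which suggests that the memory differences between runtimes come from the stacks themselves rather than from the data.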

The queries and the data are the same for all of the environments.

The results were read from the AppEngine dashboard approximately halfway through each test.

Created instances⁵

For this test, all environments behave very similarly up to 40 requests per second.

This means that, up to 40 req/sec and given the same conditions⁶, the cost increase (due to instance time) associated with the chosen runtime might not be significant compared to other costs that you could incur in GCP, such as Datastore IO.

During the maximum load, however (400 req/sec), Node-Nest-Graphql spikes up to 35 billed instances, making it the most expensive. Java-Spring follows, and then Node-express and Java-thundr use practically the same number. Go, as expected, consumes the fewest instances: 11 created instances during the test.

In this case, using NodeJs-Nest-Graphql might represent a significant cost increase when operating your project, while Golang could significantly reduce the costs related to instance time. The other stacks are roughly the same.
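As a rough illustration of the gap, the two instance counts quoted above can be turned into a ratio. This is only a sketch: it compares a billed count (35) with a created count (11), which footnote 5 suggests is reasonable, and it assumes that instance time scales linearly with the number of instances. The function name is my own.

```python
NODE_NEST_GRAPHQL_INSTANCES = 35   # billed instances at 400 req/sec
GO_INSTANCES = 11                  # created instances at 400 req/sec


def relative_instances(instances: int, baseline: int) -> float:
    """How many times more instances a stack used compared to a baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return instances / baseline


if __name__ == "__main__":
    ratio = relative_instances(NODE_NEST_GRAPHQL_INSTANCES, GO_INSTANCES)
    print(f"Node-Nest-Graphql used about {ratio:.1f}x the instances of Go")
```

Under those assumptions, Node-Nest-Graphql needed roughly three times as many instances as Go at peak load.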

Memory used (MB)

Here things get a bit more interesting than with the number of instances.

First, Golang is by far the most memory efficient of all. If your application is memory intensive, Golang could reduce the operating costs. Remember that the number of instances spawned will be affected by the memory required as well.

Then, if you look at Node-Nest-Graphql, it uses less memory than both Java configurations at 4 req/sec, but at 400 req/sec its memory usage is the highest of all. It is still worth investigating whether this is due to the GraphQL implementation we are using or to the Nest framework (a topic for another article 😊). Nevertheless, it’s worth considering this behaviour when choosing the stack.

Average instance loading time in seconds (95th / 50th percentile)

For this, it’s best to focus on the results of 400 req/sec since there are more samples (instances). For clarity, this metric shows the loading time of the instances created. The first number indicates that 95% of the instances will load under that time, while the second number indicates the same for 50% of the instances.
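To make the two numbers concrete, here is a minimal sketch of how a 95th and a 50th percentile can be derived from a list of instance loading times. It uses the nearest-rank method, which is an assumption on my part: I don't know exactly how the AppEngine dashboard computes its percentiles. The sample values are made up for illustration and are not measurements from this benchmark.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in the range (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


# Hypothetical loading times in seconds, NOT data from the benchmark.
loading_times = [0.8, 1.1, 0.9, 3.2, 1.0, 1.4, 0.7, 1.2, 2.5, 1.3]

p95 = percentile(loading_times, 95)
p50 = percentile(loading_times, 50)

if __name__ == "__main__":
    print(f"{p95} / {p50}")
```

With only a handful of instances, a single slow start dominates the 95th percentile, which is why the 400 req/sec results are the most reliable ones to compare.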

Java-Spring is the configuration that takes the longest to load. If your application load comes mostly from user interaction, this might be a factor to consider.

Again, Golang shows impressive results here.


Instance time is one of the metrics that Google uses to calculate the billing of your AppEngine application. Optimising it involves several mechanisms, but the tech stack you choose definitely has a big impact on this value, and the impact is more visible at high load.

At low load, despite the number of created instances being the same, the memory consumption varies significantly. It is worth noting that memory intensive applications could behave differently from what is shown here, especially when handling few transactions per second.

Choosing a tech stack has a lot more implications than just performance and instance time. However, I hope this benchmark can be used as input during your evaluation.

Thank you very much for reading and please 👏 if you think this might be interesting to someone else.

[1]: Choosing a tech stack and a development language has a lot more implications than just the instance time and performance. This is just one variable to consider.

[2]: Querying and filtering is one of the features of Datastore. It is also recommended to keep entities small. The benchmark code can be extended to include other operations in the future.

[3]: I left PHP and Python out of the benchmark, but they could easily be included. You are welcome to issue a pull request to the repository with the changes. 😊

[4]: The number of requests per second is an average. It has a 10% error variation due to measurement and network latencies.

[5]: The GAE dashboard shows created and billed instances. I decided to use created, since billed instances tend to equal created instances over time.

[6]: Conditions are: F2 instances and requests that query a small number of entities (10) which have a very small size.