Rapid Iterative Performance Optimization of a Service Hosted on AKS

Published in Microsoft Azure · Jul 22, 2021

Authors: Lakshay Kaushik, Munish Malhotra

Motivation:

The motivation behind Rapid Iterative Performance Optimization of services deployed on AKS was to create a setup that would enable every developer on the team to quickly gauge the performance impact of their code changes and quickly surface performance issues in the application and infrastructure. We wanted to shift performance testing as far left as possible in the application development workflow, enabling us to do rapid iterative optimization of the code under development. We didn’t want a dedicated performance-testing phase to surface performance issues at a much later stage of development.

Strategy:

As the goal of this exercise was scaled-down performance testing and benchmarking of the application, the high-level strategy included the following scenarios:

  1. Test the application on a single pod with limited resources so that the database doesn’t become the bottleneck during this test. Running the load iteratively helps you iron out unnecessary loops and routines and optimize connections to downstream systems, serialization and deserialization, compression and decompression, and so on. This iterative execution continues until we are happy with the individual latency of each component of the code.
  2. Iteratively test the application on a single pod while gradually increasing the pod’s resources. This proves that adding resources to the pod increases throughput until the application can no longer utilize the added resources. There may be several reasons why an application can’t use all the resources given to it, such as concurrency or multi-threading issues. Iteratively executing load tests while gradually increasing the pod size helps surface these kinds of problems in our code. It also gives us the maximum pod size worth using in production, since any resource added to the pod beyond this point won’t be utilized by the application.
  3. Iteratively execute load tests to determine the best indexes and the optimal document structure in MongoDB.
  4. Reduce pod resources and add more pods to horizontally scale the service. This proves that scaling the application up and down is smooth and doesn’t introduce latency spikes while new pods warm up.
  5. Increase the number of pods to the point where MongoDB becomes the bottleneck, then increase MongoDB’s resources to see whether it scales linearly without introducing further latency.

These scenarios not only helped us quickly iron out issues with our application and MongoDB but also helped us gain confidence in the scaling capabilities of a service deployed on AKS and in the vertical scaling of our database.

Rapid Testing Environment:

For rapid iterative performance testing, our requirements were the following:

  1. We wanted quick feedback on the load test without logging into multiple machines.
  2. We wanted to execute load tests on demand and as frequently as we needed.
  3. Load-test variables like load duration, thread count, ramp-up time, ramp-down time, etc. should be configurable so we could run a variety of load tests.

We finalized the following setup:

1. Selection of load test tool: JMeter was selected because it is a popular, battle-tested, open-source tool that is easy to use and can easily be configured to run from Azure Pipelines.

2. Azure Pipelines to run the load tests: Pipelines made the execution of load tests repeatable and quick, and every member of the development team could run them to get instant feedback on the deployed code.

By keeping load tests in a pipeline, it was also possible to keep a log of historical runs which could be referenced if needed.

Sample pipeline for running a JMeter load:
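A minimal sketch of such a pipeline, assuming JMeter is already installed on the private agent; the pool name, the test-plan path, and the parameter names are illustrative rather than our exact setup:

```yaml
trigger: none                        # load tests run on demand, not on every commit

pool:
  name: private-loadtest-agents      # hypothetical private agent pool

steps:
  # Run JMeter in non-GUI mode and generate the HTML dashboard report.
  # The -J flags pass the run parameters into the test plan as JMeter properties.
  - script: |
      jmeter -n -t tests/loadtest.jmx \
        -Jthreads=${{ parameters.threads }} \
        -JrampUpSeconds=${{ parameters.rampUpSeconds }} \
        -JrampDownSeconds=${{ parameters.rampDownSeconds }} \
        -JdurationSeconds=${{ parameters.durationSeconds }} \
        -l $(Build.ArtifactStagingDirectory)/results.jtl \
        -e -o $(Build.ArtifactStagingDirectory)/report
    displayName: Run JMeter load test

  # Publishing the report keeps a historical log of every run.
  - task: PublishBuildArtifacts@1
    inputs:
      PathtoPublish: $(Build.ArtifactStagingDirectory)/report
      ArtifactName: jmeter-report
```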

The variables below were parameterized for every load run in the Azure pipeline.
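Declared as runtime parameters at the top of the same pipeline YAML (the names and defaults here are illustrative), they can be overridden from the Run dialog for each execution:

```yaml
parameters:
  - name: threads            # concurrent JMeter threads (virtual users)
    type: number
    default: 1
  - name: rampUpSeconds      # time taken to ramp the threads up
    type: number
    default: 60
  - name: rampDownSeconds    # time taken to ramp the threads down
    type: number
    default: 60
  - name: durationSeconds    # total duration of the load
    type: number
    default: 300
```

The test plan reads these through JMeter’s __P() property function, for example ${__P(threads,1)}, so the same .jmx file is reused for every run.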

3. Extension to view the load-test report: an Azure DevOps extension that renders HTML reports was used to quickly view load-run results inside the results panel of the Azure DevOps pipeline run.

You can also download this extension from this Link.

Observability:

Azure Monitor was used to track resource consumption of the AKS resources, and MongoDB observability was done using MongoDB Atlas dashboards.

Later in the project, the metrics from AKS and MongoDB were published on a single dashboard using Grafana; in this blog, we won’t go into the details of how to set up and use a Grafana dashboard for observability.

Infrastructure:

We were running our application on Azure AKS, but the same strategy works for an application running on any Kubernetes cluster, as long as you have the right observability in place.

We captured trends in MongoDB CPU utilization, pod CPU and memory utilization, the number of running pods, API response-time percentiles, etc. to gain insight into any performance bottleneck in the service. These metrics should also reveal potential issues with the load-test infrastructure or with application scaling.

The load-testing infrastructure comprised JMeter running on a private Azure Pipelines agent with 2 CPUs and 16 GB of memory.

Execution of Strategy:

In this section, we explain how we executed our runs and how this helped surface some of our issues, along with general guidance for your own runs.

As per our strategy, we started with the bare-minimum application infrastructure; we wanted to push the application on minimal infrastructure to the maximum load it could sustain. The application was running on a single pod against an M10 MongoDB Atlas cluster on Azure.

For each run, we share our setup configuration and the resource utilization both before running the load and while the application was under stress, to keep the runs consistent and help you see the overall picture.

RUN 1:

Resources and Configurations used:
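As a rough illustration of a deliberately constrained single pod (the names, the container image, and the numbers below are hypothetical, not our exact configuration), the Deployment looked something like this:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: sample-service                 # hypothetical service name
spec:
  replicas: 1                          # a single pod for the first run
  selector:
    matchLabels:
      app: sample-service
  template:
    metadata:
      labels:
        app: sample-service
    spec:
      containers:
        - name: sample-service
          image: myregistry.azurecr.io/sample-service:latest   # hypothetical image
          resources:
            requests:
              cpu: 250m                # tight requests and limits on purpose
              memory: 256Mi
            limits:
              cpu: 500m
              memory: 512Mi
```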

We deliberately didn’t configure autoscaling for AKS so that we could see how the application fares with a nimble configuration; this also acts as a benchmark for the next executions and can surface application performance issues (if any).

We took a snapshot of resource utilization before starting the load.

The pod is not serving any traffic; CPU utilization is at about 3%, with no major memory consumption.

The above screenshot is from the MongoDB Atlas monitor; MongoDB CPU utilization is only around 1%. Now we’ll start running the load.

Running the load with 1 thread, for 5 minutes on a single pod gives the following results.

MongoDB CPU shoots up to 10% and then gradually decreases. There is no major impact on the database side, so let’s now peek at the AKS metrics.

Average CPU consumption was less than 50%, but the 95th percentile was above 90%.

JMeter also provides a ton of information which we’ll analyze below:

The above table is taken from the JMeter output, which is now also available inside the Azure pipeline via the Azure pipeline task.

At this point, you can see how your service performs under a given load; if the response time is higher than you expect, you already know there are opportunities to optimize the code. You can run a code profiler, which should give you sufficient detail on what improvements can be made. Total samples, transactions per second, and failure rate are other metrics that can help uncover potential issues.

Just the first execution can provide you with a lot of opportunities, and once you have surfaced and fixed them, you are ready to move to the next step: adding some more load.

RUN 2:

Resources and Configurations used:

For this run, we vertically scaled the pod so the application would have more resources at its disposal.
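Continuing the illustrative Deployment from RUN 1, only the resources stanza changes; the figures are hypothetical, and the point is the relative increase rather than the exact values:

```yaml
resources:
  requests:
    cpu: 500m          # roughly double the requests of the RUN 1 sketch
    memory: 512Mi
  limits:
    cpu: "1"           # and double the limits, giving the app more headroom
    memory: 1Gi
```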

We kept the load duration the same but introduced another thread, to check whether overall performance and throughput improved and whether we were getting any failures from the application.

The application pod was in relatively better shape than in the prior execution. The throughput we got from this setup was better than in the initial run due to the vertical scaling of the pod. For internal reasons we didn’t scale up further for our use case, but to determine the maximum pod size for production usage you could keep scaling up as long as you get a corresponding increase in throughput.

This is a crucial step that should uncover issues related to CPU throttling, memory footprint, multi-threading, etc. Adding resources that the pods can’t use does them no justice, so this step also helps determine a sensible maximum configuration for running your application.

As the thread count increased too, monitoring the database at this step is also important. We started seeing an increase in the overall metrics of the database, but everything was within range. Time to put on some serious load.

RUN 3:

We increased the threads and also added another pod into the mix so we could start putting load on the database as well. This helps uncover database-related issues, e.g. changes in indexes, memory utilization, and disk I/O, to name a few.

Resources and Configurations used:

We increased the load to run for 1 hour, and the number of threads was increased as well, with threads added/terminated at 5-minute intervals.

AKS pods while under the load:

Database CPU performance:

Even though the database CPU stayed within bounds and showed no overall spikes, the other metrics got us very interested. Suddenly we were seeing a lot of DB connections on the Atlas dashboard, and overall response time increased by leaps and bounds. The average DB write time stayed roughly the same, but the average DB read time increased. This led to performance optimizations in our application’s DB layer as well as a lot of improvements to the DB indexes. With the given load we surfaced this database-related issue, but in your case you may have to increase the load further before you hit any such issue.

Surfacing database-related optimizations early in the cycle matters on a different order of magnitude. In our case, we could solve the problem just by adding and modifying existing indexes, but imagine if we had had to change the table/document structure; the impact on the project could have been huge. Shifting left and doing rapid iterative performance checks and optimizations lets you modify the system at a lower cost.

DB metrics will tell you a lot about performance; while under load, you can also run the database profiler for more query-level metrics.

Now, with some issues ironed out, it’s time to put the system under more sustained pressure.

RUN 4:

With the previous runs, we determined the maximum pod configuration for running the application. Time to set up the HPA (Horizontal Pod Autoscaler) on AKS so the application can scale automatically with the load and give a more realistic outcome.
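A minimal sketch of such an HPA manifest, targeting the illustrative Deployment from the earlier runs; the CPU threshold is a hypothetical value, while the replica bounds mirror the 2-to-4 pod range described below:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: sample-service-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: sample-service           # the Deployment sketched earlier
  minReplicas: 2                   # the two service pods we start with
  maxReplicas: 4                   # capped at the maximum pod count we wanted to test
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 70   # illustrative scale-out threshold
```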

Resources and Configurations used:

Before starting the load, the AKS cluster had one ingress pod and 2 pods related to our service.

The database was also under limited use before we started the load.

We ran the load for 6 hours and kept the threads the same as before. As soon as we started adding the load, the HPA kicked in and we had all 4 pods catering to the incoming traffic.

With all the optimizations in place, the pods & database performed relatively well.

We were happy with the outcome, though this step can help you discover many pitfalls. As the system is under constant load, your load-test infrastructure might itself run into issues due to memory pressure, etc., so keep a watch on it. Running the load for hours might surface issues at the application-infrastructure level too. Variable load helps keep the test more realistic; look out for database I/O, timeout issues, overall API performance, fatal errors, abrupt pod terminations, throughput, etc.

The JMeter dashboard output provides many more tables and charts that help uncover relevant information about the runs. In this blog we only shared the application performance index, but we used other tables and charts from the JMeter dashboard output as well. We highly recommend using whichever other charts you find relevant.

SUMMARY:

Rapid performance optimization is an important step that is often overlooked. Many times, we get into performance testing and optimization only when we release our code to a higher environment, e.g. staging or pre-production. This often leads to higher costs of change and delays to production, causing business impact. Shifting left quickly provides timely feedback, which reduces big code changes later in the development cycle and helps us properly size the environments the application runs in.
