Performance Tips for Container Applications
By Felipe Lino, Senior Software Engineer — .NET
At FARFETCH, we live in a very nice and pleasant world of microservices, running everything on containers, using our Docker images. But working with containers has its pitfalls.
In this article, I’m going to cover a little about infrastructure and environment: what to look for and how to look for it. Finally, I’ll talk about optimization and tradeoffs.
Questions about your environment & infrastructure
Your application, even though it is a container, will run in an environment. To make the correct decisions about optimization, you need to ask some deeper questions about its environment.
Here are some common “lies” that we tell about containerized applications:
- They are isolated.
- One container won’t affect the other.
- You don’t need to care about where the container will run.
- You can reproduce the same behaviour locally.
These assumptions seem true if you download the production image and locally execute a docker run command to run experiments.
But let’s take a closer look at these assumptions with a few good questions.
Will your container share resources with other containers?
Well, the same machine could run multiple containers of the same application or containers of other applications.
So, if the answer is YES, you are going to have the following:
- Better usage of your physical resources (CPU, Memory, etc)
For example, an IO-bound application won’t use the CPU all the time, and even CPU-bound applications will eventually block while waiting for IO operations.
- If the intention is auto-scaling for example, it makes sense to have more than one container on the same host
- Saves money, as a single host running multiple containers will be less costly than spreading them across multiple hosts.
- The physical resources available change over time. This happens a lot in an elastic environment, where the same resources may be shared among few or many containers.
- Containers impact each other because they are competing for system resources.
If the answer is NO, it will probably cost you more money, because you may be wasting resources while your container waits for an IO operation, for example.
Are all the containers in the host your own?
If so, we can conclude the following:
- The team responsible for the applications controls any sharing of system resources.
- The development team needs to know about the system infrastructure.
- The development team may be able to tune applications and hardware to suit their objectives.
- Resources could be wasted. This may include:
- IO-bound applications where the CPU isn’t used most of the time.
- Batch operations, where your application runs in a scheduled way and sits idle between batches.
If not, the picture changes:
- The infrastructure team controls resource sharing.
- The development team has fewer options to tune the application.
- More efficient control of resource management.
Up until this point, we’ve only thought about containers and “business” applications. But we have more questions.
Will hosts share containers with other applications?
A given host can run containers for other applications to cooperate with ours. Some examples include the following:
- HTTP Proxies: e.g., haproxy
- HTTP Server: such as Nginx or Apache
- Application to capture logs: e.g., filebeat or logstash
- An application to work as a caching layer, such as Nginx
Any of these applications will affect ours too.
Will the environment enforce limits?
To establish a more even share of resources, environments are often constrained with limits. Below we will take a detailed look at configuring for environment limits by using an example.
Concerns regarding your container/application?
Until now, we’ve asked questions about your environment and infrastructure. Now let’s ask questions closer to your development team.
Will your application run inside a Virtual Machine/Server?
Currently, at FARFETCH, we have several languages and technology stacks available. For example, if you use Java, Scala, Groovy, or Kotlin, your application will be running on a JVM (Java Virtual Machine). If you develop a .NET application, you can run it on the Kestrel web server.
You can see that we’re no longer talking about running one process inside the container. Instead, several processes, or at least several concurrent threads, will be competing for CPU resources at once.
Will your container run more than one application?
We’ve come to our last question about the container. It is now common practice to deliver an image with an NGINX Plus application (or similar) to be compatible with the designed environment. In this case, you could have combinations of several applications running in one container and several containers and applications running on the same machine. This sounds like a nightmare, but it is quite common.
What metrics should you look at?
Once we know more details about our production environment, we need to understand which resource metrics to measure.
We should inspect the easy-access metrics associated with our aforementioned environment limits, i.e.:
- Memory usage
- CPU usage
- File Descriptors
- Network bandwidth
- Disk I/O operations
How to look up these metrics?
We first need to simulate the application with the same restrictions that we will have in the desired environment. Ideally, we could take them exactly from our “docker run” command.
The objective here is to see how our application behaves under the environment restrictions, take some measurements, and decide how we are going to optimize it.
The docker run Command
Let’s examine a simple docker run command:
The docker run command allows us to set some limits on hardware/infrastructure, such as CPU, memory, and the number of open file descriptors.
Illustrating with an example application
To better demonstrate these concepts, I will use an application as an example. It is a very simple one: a Java REST application with two endpoints:
- GET /status: does nothing; it only returns HTTP 200
- POST /sort: sorts N lists of integers internally, handling each list asynchronously, with a thread sleep for a random number of milliseconds.
- The “thread sleep” here emulates an external call
- The async approach is for running several concurrent threads
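To make the shape of the /sort work concrete, here is a minimal sketch of what such a handler might do. It is not the demo application's actual code: the class name SortSketch, the method sortAll, and the 5 to 50 ms sleep range are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical stand-in for the /sort handler; not the demo application's real code.
final class SortSketch {

    // Sorts each list on its own task and sleeps briefly to emulate an external call.
    static List<List<Integer>> sortAll(List<List<Integer>> lists, ExecutorService pool) {
        List<CompletableFuture<List<Integer>>> futures = new ArrayList<>();
        for (List<Integer> list : lists) {
            futures.add(CompletableFuture.supplyAsync(() -> {
                pause(ThreadLocalRandom.current().nextInt(5, 50)); // assumed range, in ms
                List<Integer> copy = new ArrayList<>(list);
                Collections.sort(copy);
                return copy;
            }, pool));
        }
        // Wait for every task and keep the results in request order.
        List<List<Integer>> sorted = new ArrayList<>();
        for (CompletableFuture<List<Integer>> future : futures) {
            sorted.add(future.join());
        }
        return sorted;
    }

    private static void pause(long millis) {
        try {
            Thread.sleep(millis);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        try {
            System.out.println(sortAll(List.of(List.of(3, 1, 2), List.of(9, 7, 8)), pool));
        } finally {
            pool.shutdown();
        }
    }
}
```

Because each list occupies a pool thread for the whole sleep, the size of the pool passed in directly caps how many lists can be processed at once.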
My approach is to find what I call “the minimum and maximum requirements” of my application. To do that, I will run the same scenario twice: first without limits, and then with some limits.
This scenario consists of two phases:
- Performance testing on the endpoint /status (do nothing)
- Performance testing on the endpoint /sort (does the real work)
The source code of the demo application is available here.
Here are the shell commands we will use in each scenario.
This runs the application on our local machine with or without limits (as mentioned above):
This checks the status of our container with information provided by the docker server.
Open File Descriptors
This checks how many file descriptors are opened by our container.
You can use any tool to run the performance test. Using the simple command-line Apache Benchmark tool, we can run a test that fits our purpose.
-c : concurrent users (number of requests performed at a time)
-n : total number of requests to perform
-s : timeout (seconds)
Performance test on the endpoint /status
For this performance test, without limits, we run the command:
We can then count the number of open file descriptors during the test, also checking CPU, Memory, Network, and IO operations.
The results above show that when my application does nothing:
- It requires at least 287MB of RAM;
- It will use 0.05% of CPU; and
- It will open 40 file descriptors.
This information provides us with a hint about the minimum resource requirements just to run the application.
Performance test on the endpoint /sort
Next, we are going to run the performance test over the endpoint that actually does something.
One observation before we continue: the application has some settings that limit the number of threads it will use:
- corePoolSize (default: 2);
- maxPoolSize (default: 2);
- queueCapacity (default: 3)
Why do we have this? There are two reasons. First, it is good to limit your application so that it does not crash in your environment. Second, it illustrates tuning on both application and environment.
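The effect of these three settings can be sketched with a plain java.util.concurrent.ThreadPoolExecutor. This assumes the application's pool follows the same rules (fill the core threads, then the queue, then grow to the maximum, then reject); the class PoolLimitSketch and its method are hypothetical and not taken from the demo application.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical illustration of corePoolSize / maxPoolSize / queueCapacity.
final class PoolLimitSketch {

    // Submits `tasks` jobs that all block until released; returns how many were rejected.
    static int countRejected(int corePoolSize, int maxPoolSize, int queueCapacity, int tasks) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                corePoolSize, maxPoolSize, 30, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity)); // default policy rejects when full
        CountDownLatch release = new CountDownLatch(1);
        int rejected = 0;
        for (int i = 0; i < tasks; i++) {
            try {
                pool.execute(() -> awaitQuietly(release));
            } catch (RejectedExecutionException e) {
                rejected++; // no free thread and no room left in the queue
            }
        }
        release.countDown(); // let the accepted tasks finish
        pool.shutdown();
        return rejected;
    }

    private static void awaitQuietly(CountDownLatch latch) {
        try {
            latch.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    public static void main(String[] args) {
        System.out.println("defaults (2/2/3), 10 tasks rejected: " + countRejected(2, 2, 3, 10));
        System.out.println("tuned (100/120/300), 420 tasks rejected: " + countRejected(100, 120, 300, 420));
    }
}
```

With the defaults only five blocked tasks fit at once (two running, three queued), so five of the ten are rejected. A pool of 100/120/300 accepts all 420, at the cost of up to 120 live threads.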
Let’s run our performance test over the /sort endpoint:
We can see that with the default configuration for the thread pool, the application can’t handle too many requests.
The log shows:
CPU and memory usage are a little higher than before. The same number of file descriptors was opened. The average response time is 93ms.
All of this tells me that the minimum resource requirements for my application are not enough to handle this load. A good place to start is the thread pool configuration.
Introducing resource limits and re-testing
To make our application handle the total requests smoothly, we need to adjust a few things.
So now we are going to run the same application applying resource limits (given by the environment for example) and customizing our JVM and thread pool.
- Max file descriptors open: 128
- Max memory RAM: 1000MB = 1GB
- CPU: 0.5
Customizing our JVM and application thread pool:
- Limiting our memory usage in our JVM: -Xms312m -Xmx750m
- Thread Pool:
- Core pool size: 100
- Max pool size: 120
- Queue capacity: 300
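One quick way to confirm that the JVM picked up the heap setting is to ask it from inside the process. The sketch below uses only the standard Runtime API; the class name HeapCheck is an assumption, and the reported value is usually close to, but not necessarily exactly, the -Xmx figure.

```java
// Hypothetical helper to check the heap ceiling the running JVM actually sees.
final class HeapCheck {

    // Maximum heap the JVM will attempt to use, in megabytes.
    static long maxHeapMegabytes() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    public static void main(String[] args) {
        System.out.println("Max heap (MB): " + maxHeapMegabytes());
    }
}
```

Running this inside the container with -Xmx750m should report a number near 750. A much larger number suggests the flag did not reach the JVM.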
Our command to run the application is now:
Let’s run the same performance test over the /sort endpoint again:
Checking the results, memory usage didn’t increase and the number of file descriptors is the same. CPU usage increased from 0.07% to 0.36% because the application is now using more threads.
But notice that our average response time is now 192ms, higher than the previous one (93ms). This is because our application is now limited to using half (0.5) instead of the full (1.0) CPU on the host machine.
I encourage you to re-run the same performance test, changing the docker run parameters and the application configuration to see how it affects its performance footprint.
To achieve our objective to meet the maximum operational requirements, we need to understand if the response time is acceptable or not. If this configuration does not satisfy our use cases, we will need to change the parameters to meet acceptable limits that comply with our SLA.
Tuning: Making Optimization Tradeoffs
Every application dreams of infinite memory. Since this is not possible, you will need to work out how much memory your application really requires to operate. For example, how many threads can you safely create without exceeding the memory limit? In our tests the lists were very small, but how would a much bigger data set affect your application? How much data will be stored in the cache?
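As a rough worked example of the thread question, the sketch below computes a deliberately pessimistic thread budget from the container memory limit. Every number in it is an assumption for illustration: 1 MB of stack per thread (a common default on 64-bit Linux, adjustable with -Xss) and 100 MB set aside for other non-heap memory. Real usage depends on your JVM and workload, so treat the result as a starting point for measurement, not a guarantee.

```java
// Hypothetical back-of-the-envelope calculation; all inputs are assumptions.
final class ThreadBudget {

    // Pessimistic upper bound: assumes every thread eventually uses its whole stack.
    static long maxThreads(long containerLimitMb, long maxHeapMb,
                           long otherNonHeapMb, long stackPerThreadMb) {
        long leftoverMb = containerLimitMb - maxHeapMb - otherNonHeapMb;
        return leftoverMb <= 0 ? 0 : leftoverMb / stackPerThreadMb;
    }

    public static void main(String[] args) {
        // 1000 MB limit, 750 MB heap, 100 MB assumed for metaspace, buffers, etc.
        System.out.println(maxThreads(1000, 750, 100, 1)); // prints 150
    }
}
```

Under these assumptions, the 120-thread maximum used in the test fits within the 1000 MB limit, but a much larger pool would not.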
You might easily notice the performance hit when we changed the --cpus value from 1.0 to 0.5, so it is an important parameter. Is it better to increase the number of threads? Or should you increase the shared CPU value?
Some infrastructures give you the option to change it. But sometimes it’s a fixed value shared among all applications.
File Descriptors x Sockets
Although increasing the thread pool in my example application didn’t change the number of open file descriptors, it is an area worth checking.
Past experience taught me that increasing the number of sockets also increases the number of file descriptors. The number of sockets can increase due to opened HTTP or database connections, for example.
Most servers enforce a socket limit for applications at the OS level. The infrastructure/environment therefore sets a per-container limit, dividing the maximum number of sockets allowed by the OS among its containers. It is not difficult to reach this limit, and when you do, your application will start to crash or misbehave.
How can you solve this? You need to balance the number of threads, the connections used by the application and the number of instances. If you have more instances spread over different hosts using fewer resources, this could be better than having fewer instances using a lot of threads, connections etc.
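To see how close a JVM process is to its file descriptor limit, one option is to ask the JVM itself. This is a different technique from inspecting the container from the outside, and it relies on com.sun.management.UnixOperatingSystemMXBean, which is available on common Unix-like JDK builds but not guaranteed everywhere. The class name FdUsage is hypothetical.

```java
import com.sun.management.UnixOperatingSystemMXBean;
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

// Hypothetical helper that reports this JVM's file descriptor usage.
final class FdUsage {

    // Returns {open, max} for this process, or null where the platform does not expose them.
    static long[] openAndMax() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof UnixOperatingSystemMXBean) {
            UnixOperatingSystemMXBean unix = (UnixOperatingSystemMXBean) os;
            return new long[] {
                unix.getOpenFileDescriptorCount(),
                unix.getMaxFileDescriptorCount()
            };
        }
        return null;
    }

    public static void main(String[] args) {
        long[] usage = openAndMax();
        if (usage == null) {
            System.out.println("File descriptor counts are not available on this platform");
        } else {
            System.out.println("Open file descriptors: " + usage[0] + " of " + usage[1]);
        }
    }
}
```

Logging these two numbers while a load test runs shows whether extra threads and connections are pushing the process toward the limit.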
Number of containers x Threads (workers) inside the application
Very similar to the previous point, you can have a “greedy” container that uses a lot of threads. This will consume more memory and CPU, and open more connections (file descriptors). Alternatively, a “moderated” container will use fewer threads and consequently fewer resources, but handling the same load will require more instances of these “moderated” containers.
For example, imagine if your application faced a seasonal load and your infrastructure supported auto-scaling to horizontally scale it with demand. In this case, a moderated container could be enough, because you can easily increase the number of containers when needed. But if your load is relatively constant, you can opt to have fewer containers but more “greedy” ones.
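The tradeoff can be put into rough numbers with Little's law (requests in flight = arrival rate × time per request). The sketch below assumes each request holds one worker thread for its whole latency and that CPU is not the bottleneck, which is a simplification. The class, the method, and the sample figures are hypothetical.

```java
// Hypothetical sizing estimate; assumes one thread per in-flight request.
final class ScalingSketch {

    // Estimates how many containers are needed to sustain the target request rate.
    static int containersNeeded(double targetRequestsPerSecond,
                                int threadsPerContainer, double avgLatencyMs) {
        // Little's law: requests in flight = arrival rate * time spent per request.
        double concurrentRequests = targetRequestsPerSecond * avgLatencyMs / 1000.0;
        return (int) Math.ceil(concurrentRequests / threadsPerContainer);
    }

    public static void main(String[] args) {
        // 1000 requests/s at 100 ms each means about 100 requests in flight.
        System.out.println("moderated (20 threads): " + containersNeeded(1000, 20, 100));
        System.out.println("greedy (120 threads): " + containersNeeded(1000, 120, 100));
    }
}
```

In this example the same load needs five moderated containers or a single greedy one. Which is better depends on how quickly your infrastructure can add containers when demand changes.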
Specific configurations for your Server/JVM
Another thing to consider is the specific configuration of the JVM/server running inside your container.
For example, Undertow is a Java web server that assumes default values based on the host machine. Its number of I/O threads is controlled by a property called server.undertow.io-threads.
By default, this value is equivalent to the number of CPUs. So imagine that you have a server with 64 cores to run multiple containers. Even though your docker run command says that you are going to use 0.5 CPU, your JVM will try to use all the 64 cores. You need to explicitly set this parameter on the startup of the JVM to limit the server.
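A minimal sketch of the idea: derive the thread count from the container's CPU limit instead of the host's core count. The helper ioThreadsFor and its rounding rule are assumptions for illustration, not something Undertow provides. Recent JVM versions may already take the container CPU limit into account when reporting available processors, so it is worth printing what your JVM actually sees.

```java
// Hypothetical helper for choosing an explicit I/O thread count.
final class IoThreadSizing {

    // Rounds the CPU limit up to a whole thread, never below 1 or above the host's cores.
    static int ioThreadsFor(double cpuLimit, int hostCpus) {
        int fromLimit = (int) Math.ceil(cpuLimit);
        return Math.max(1, Math.min(fromLimit, hostCpus));
    }

    public static void main(String[] args) {
        int visible = Runtime.getRuntime().availableProcessors();
        System.out.println("Processors visible to this JVM: " + visible);
        System.out.println("I/O threads for 0.5 CPU on a 64-core host: " + ioThreadsFor(0.5, 64));
    }
}
```

The computed value could then be passed to the server at startup through the setting described above, rather than letting it default to the host's 64 cores.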
The Kestrel server has a similar setting called KestrelServerOptions.ThreadCount. These are just examples; you should check the specifics of your environment.
Most new versions of common servers and frameworks (e.g., Spring Boot) are changing their approach to take into account the limits set by Docker. Or at least they give you the option to tell them whether your application is running on a host or in a container.
While not discussed here, clouds and infrastructure can limit network bandwidth usage and IOPS (Input/Output Operations per Second, or typically disk operations). Other system dependencies such as API dependencies, databases, and file systems can also limit your application with throttling. Since all could impact your application performance, you may need to design for them.
Containerized applications are in vogue nowadays. We can see many advantages: they are portable, and they are highly tailored for Kubernetes and elastic environments. But once we deal with different environments it is important to notice that each one requires performance tuning. You will need to run performance tests to better understand your resource constraints and utilization.
In this article I introduced a few key concepts:
- Asking questions about your operating environment/infrastructure including CPU, memory, file descriptors, and network bandwidth.
- How you can take measurements of your container by using Docker and Linux commands, and how you can interpret the results.
- How you can configure your operational parameters through Docker options.
Running your application in a container won’t solve all your problems, but I hope these tips help you use containers more effectively.
Originally published at https://www.farfetchtechblog.com on March 12, 2021.