Spring Service: Improving Processing Time Could Harm Service Scalability

Eliran David
Oct 24, 2019 · 7 min read

Improving processing time by running tasks in parallel has become very easy in recent versions of Java and Spring. But do we need to understand the thread usage behind the syntactic sugar? Read on to find out.

Let’s Start With the Code

Below is a simple controller of a service (I will call it the users service) that takes a userId as input and returns the user object: id, name, and phoneNumber:

[The full project can be found here. The service is a Spring Boot application with embedded Tomcat. Getting this kind of service up and running takes less than 5 minutes.]
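Here is a minimal sketch of what such a controller might look like; the RestTemplate wiring, the metadata base URL, and the User class are assumptions of mine, so refer to the linked project for the actual code:

    import org.springframework.web.bind.annotation.GetMapping;
    import org.springframework.web.bind.annotation.PathVariable;
    import org.springframework.web.bind.annotation.RestController;
    import org.springframework.web.client.RestTemplate;

    @RestController
    public class UsersController {

        // Assumed base URL; the real host/port may differ
        private static final String METADATA_URL = "http://localhost:8888/metadata";

        private final RestTemplate restTemplate = new RestTemplate();

        @GetMapping("/users/{userId}")
        public User getUser(@PathVariable String userId) {
            // Two sequential calls to the metadata service, ~300 ms each
            String name = restTemplate.getForObject(
                    METADATA_URL + "/names/" + userId, String.class);
            String phoneNumber = restTemplate.getForObject(
                    METADATA_URL + "/phoneNumbers/" + userId, String.class);
            return new User(userId, name, phoneNumber);
        }
    }

    // Assumed minimal response object
    class User {
        public final String id;
        public final String name;
        public final String phoneNumber;

        User(String id, String name, String phoneNumber) {
            this.id = id;
            this.name = name;
            this.phoneNumber = phoneNumber;
        }
    }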

As you can see, to get the user name and phone number, I need to call a different service (called the metadata service), which is a common scenario. The response time of GET /metadata/names/{userId} and GET /metadata/phoneNumbers/{userId} is around 300 milliseconds each, and that is a given. Our focus is the response time (and how to improve it) of GET /users/{userId}. Names and phone numbers are randomly generated.

Running the GET request http://localhost:8888/users/2505 via Postman, I get a response in 661 milliseconds. This makes sense because we know getting the user name takes around 300 ms, getting the user phone number also takes around 300 ms, and we make the two calls sequentially:

We can see that all parts are executed by a Tomcat worker thread.

Improve Processing Time

Both requests, getting the name and the phoneNumber, require only the userId as input, so neither depends on the other. Why not run them in parallel and see if it improves the response time?

There are a few ways to do that. CompletableFuture is very tempting because of its simplicity. It was introduced in Java 8 as an improvement to the concurrency API. CompletableFuture.supplyAsync together with lambda expressions looks almost like magic (see the new code below). The changes are minor, just a few lines:
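A sketch of the parallel version, under the same assumptions as the controller above (the method and variable names match those referenced later in this post):

    import java.util.concurrent.CompletableFuture;

    @GetMapping("/users_parallel/{userId}")
    public User getUsersParallel(@PathVariable String userId) throws Exception {
        // With no executor argument, supplyAsync runs on ForkJoinPool.commonPool()
        CompletableFuture<String> name = CompletableFuture.supplyAsync(() ->
                restTemplate.getForObject(METADATA_URL + "/names/" + userId, String.class));
        CompletableFuture<String> phoneNumber = CompletableFuture.supplyAsync(() ->
                restTemplate.getForObject(METADATA_URL + "/phoneNumbers/" + userId, String.class));
        // get() blocks the Tomcat worker thread until both calls complete
        return new User(userId, name.get(), phoneNumber.get());
    }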

Now, running the GET request http://localhost:8888/users_parallel/2505 via Postman, I get a response in 347 milliseconds. This is an improvement of almost 50 percent in response time. Voilà. Should we open the champagne? Well, wait a minute...

We need to test the performance of our service when it serves more than one client in parallel, which is closer to production. This is usually what we aim for: more requests mean more activity. I will start with a basic test of 20 requests per second. You can assume that the metadata service is not a bottleneck.

Remember that our focus is on APIs /users and /users_parallel.

For simulating client requests, I am using Gatling, an open-source framework written in Scala. An execution definition in Gatling is called a “simulation”, and writing one is pretty easy. See here for my “20 requests per second during 60 seconds” simulation.
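As an illustration only, here is roughly what such a simulation looks like in Gatling's Java DSL (my actual simulation is the linked Scala one; the class and scenario names below are made up):

    import java.time.Duration;
    import io.gatling.javaapi.core.ScenarioBuilder;
    import io.gatling.javaapi.core.Simulation;
    import io.gatling.javaapi.http.HttpProtocolBuilder;
    import static io.gatling.javaapi.core.CoreDsl.*;
    import static io.gatling.javaapi.http.HttpDsl.*;

    public class UsersSimulation extends Simulation {

        HttpProtocolBuilder httpProtocol = http.baseUrl("http://localhost:8888");

        ScenarioBuilder scn = scenario("users parallel")
                .exec(http("get user").get("/users_parallel/2505"));

        {
            // 20 new requests per second, sustained for 60 seconds
            setUp(scn.injectOpen(constantUsersPerSec(20).during(Duration.ofSeconds(60))))
                    .protocols(httpProtocol);
        }
    }

Below are the results. The orange line is the number of active users/requests, which drops abruptly after 60 seconds.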

The graph shows the response time over the course of the test, and we can clearly see it increasing as the test progresses. The average is 27,289 ms, and the 95th-percentile response time is 51,370 ms. Monitoring the service with VisualVM, I didn't see any issues related to CPU or memory. For comparison, let's run the same test (20 requests per second) against the sequential implementation, /users. These are the results:

Put the champagne back in the cellar. We see that before our “improvement,” the 95th percentile was dramatically lower: 709 ms. That actually makes sense given the processing time we observed at the beginning, since each call to the metadata service takes around 300 ms. So what went wrong?

Identify Bottlenecks

Let’s go back to the parallel execution and try to understand the bottleneck. These are the execution logs when running a single request:

We can see that calling the metadata service is not done by the Tomcat worker threads (named nio-X-exec-Y) but by ForkJoinPool.commonPool() threads; the common pool is the default thread pool of CompletableFuture.

When it comes to threads, I usually chase bottlenecks by logging some kind of request id at the start and end of each thread's part of the processing. So I will add that to the logs and run again; for simplicity, I will concatenate a UUID to the user id.
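A sketch of what this looks like inside getUsersParallel (the log wording here is illustrative, not the project's exact format):

    import java.util.UUID;
    import org.slf4j.Logger;
    import org.slf4j.LoggerFactory;

    private static final Logger log = LoggerFactory.getLogger(UsersController.class);

    @GetMapping("/users_parallel/{userId}")
    public User getUsersParallel(@PathVariable String userId) throws Exception {
        // Tag every log line with the user id plus a UUID
        String requestId = userId + "_" + UUID.randomUUID(); // e.g. 2505_e6a1f189-...
        log.info("{} - request received on {}", requestId, Thread.currentThread().getName());

        CompletableFuture<String> name = CompletableFuture.supplyAsync(() -> {
            log.info("{} - fetching name on {}", requestId, Thread.currentThread().getName());
            return restTemplate.getForObject(METADATA_URL + "/names/" + userId, String.class);
        });
        CompletableFuture<String> phoneNumber = CompletableFuture.supplyAsync(() -> {
            log.info("{} - fetching phone number on {}", requestId, Thread.currentThread().getName());
            return restTemplate.getForObject(METADATA_URL + "/phoneNumbers/" + userId, String.class);
        });

        User user = new User(userId, name.get(), phoneNumber.get());
        log.info("{} - done on {}", requestId, Thread.currentThread().getName());
        return user;
    }

Running the simulation again and filtering the logs by just one of the request ids (2505_e6a1f189-ee48-4742-8b2d-f506dcbeb599), this is what I see: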

Although the Tomcat thread started executing the request at 21:11:24, a ForkJoinPool thread calls the metadata service only at 21:11:39, meaning the request waited 15 seconds for a thread to become available to call the metadata service. It seems the bottleneck is ForkJoinPool.commonPool(). Our Tomcat threads (200 by default) are blocked waiting on ForkJoinPool.commonPool(), which has only a handful of threads (its size is derived from the number of my processors; my machine has 8). From the ForkJoinPool documentation:

“For applications that require separate or custom pools, a ForkJoinPool may be constructed with a given target parallelism level; by default, equal to the number of available processors."
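A quick way to check this on your own machine (on mine, with 8 cores, the first line prints 7; the common pool's parallelism actually defaults to one less than the number of available processors):

    import java.util.concurrent.ForkJoinPool;

    public class CommonPoolSize {
        public static void main(String[] args) {
            // Default common-pool parallelism: availableProcessors() - 1
            System.out.println(ForkJoinPool.commonPool().getParallelism());
            System.out.println(Runtime.getRuntime().availableProcessors());
        }
    }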

Also, monitoring the threads using VisualVM, I always see 7 “Running” worker threads of that ForkJoinPool; one fewer than my 8 cores, which matches that default parallelism of availableProcessors() - 1:

The Tomcat threads are usually in the “Park” state. This state is triggered by calling LockSupport.park, which is part of the implementation of CompletableFuture.get. In our case, it happens when a Tomcat thread waits for a ForkJoinPool thread to complete (see name.get() and phoneNumber.get() in getUsersParallel). If the 200 Tomcat threads are blocked by those 7 threads most of the time, we can actually set the number of Tomcat threads to 8 and get the same results. Adding the line below to application.properties, we get:
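Assuming the Spring Boot version here predates 2.3, that line is presumably:

    # Cap Tomcat's worker pool at 8 threads
    # (Spring Boot 2.3+ renamed this property to server.tomcat.threads.max)
    server.tomcat.max-threads=8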

Running the same simulation against /users_parallel, I get a slightly better result compared to the previous parallel simulation: an average response time of 26,524 ms and a 95th percentile of 49,066 ms. This is probably due to less context switching.

So, we actually get better results when using fewer threads.

A Possible Solution

We know that each Tomcat thread (there are 200) needs 2 threads to be available (because we want to get the name and the phone number in parallel) in order not to be blocked. So, having a thread pool of 400 threads for CompletableFuture should release the bottleneck (see the full code here):
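A sketch of the idea (the endpoint name and field names below are mine; see the linked code for the real implementation):

    import java.util.concurrent.CompletableFuture;
    import java.util.concurrent.ExecutorService;
    import java.util.concurrent.Executors;

    // 200 Tomcat workers * 2 parallel metadata calls each = 400 threads
    private final ExecutorService metadataPool = Executors.newFixedThreadPool(400);

    @GetMapping("/users_parallel_custom/{userId}")
    public User getUsersParallelCustom(@PathVariable String userId) throws Exception {
        CompletableFuture<String> name = CompletableFuture.supplyAsync(() ->
                restTemplate.getForObject(METADATA_URL + "/names/" + userId, String.class),
                metadataPool); // run on our own pool instead of the common pool
        CompletableFuture<String> phoneNumber = CompletableFuture.supplyAsync(() ->
                restTemplate.getForObject(METADATA_URL + "/phoneNumbers/" + userId, String.class),
                metadataPool);
        return new User(userId, name.get(), phoneNumber.get());
    }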

Running the simulation again, I get an average response time of 339 ms and a 95th percentile of 345 ms, the best of all the options we have seen:

Conclusions

We must understand the thread usage behind the magic supplied by java.util.concurrent and Spring. What we do next depends on how our service is used and on our resources. If /users serves a background task, the sequential approach is enough. If /users is a user-facing application, then response time is very important, so we must take the parallel approach, manage our own thread pool, and pick the number of threads based on a few parameters.

The suggested solution should be examined carefully, because we actually added resources, and we should verify that it suits our machines in production. Memory should not be an issue (it usually isn't when it comes to adding 400 threads), but you should verify that the CPU is not too busy switching between threads.

In any case, I recommend tuning the number of threads while testing performance on the actual machine, in order to find the best number. More logical cores and a faster CPU (more instructions per second) allow for more threads. Having more threads than cores is relevant for services like /users that mostly do I/O, but for CPU-intensive tasks (like transcoding), we should stick to the number of cores.

@elirandav

