Virtual threads: performance gains for microservices
One of the most awaited JDK projects has finally been released as a preview feature in JDK 19. For details about the preview feature, browse JEP 425; it is a fine, detailed piece of documentation, and I strongly recommend going through the official JEP.
The blog content is organized as:
- Why virtual threads are special.
- Integration with the spring-boot embedded tomcat-based application.
- Performance gains in microservices.
- Debugging with virtual threads.
- Possible pitfalls of virtual threads.
Why virtual threads are special.
- Little’s law: the concurrency of an application equals its throughput multiplied by the average response time. Until now, concurrency was effectively a constant; take the example of a Tomcat-based application, where the worker pool defaults to 200 threads. This is a tunable parameter, but it is limited by OS memory and CPU, so throughput is capped by the number of worker threads in the web server. With virtual threads, which are lightweight, concurrency can scale to a very large number.
- No longer a 1:1 mapping with kernel threads: in the early days of the JDK, when most computers had only a single core, the JVM had green threads, a many-to-one mapping between JVM threads and a kernel thread. Eventually native multi-threading outperformed green threads, and until now the JVM has had a one-to-one mapping between JVM threads and kernel threads; a JVM thread is a wrapper over a kernel thread that adds functionality such as the execution stacks we easily get in a thread dump. With virtual threads, the JDK is again going with an M:N relationship, but here M can be very large. So, basically, virtual threads are user-space threads that get scheduled over a fork-join pool of carrier threads, and this carrier pool defaults to the number of processors.
- Blocking network IO no longer blocks on a virtual thread: Java has modified java.net.Socket in such a way that if blocking IO is executed on a virtual thread, it no longer blocks the underlying carrier. With this modification, all blocking calls, whether to Aerospike, MySQL, or an HTTP client, automatically become non-blocking, as under the hood they all use java.net.Socket.
- For non-blocking application development this is a major gain, as until now the reactive paradigm was the only option for non-blocking applications. The general challenges with the reactive paradigm are finding reactive equivalents of the libraries for all remote calls (databases, caches, HTTP, etc.), the steep learning curve, and, lastly, debugging, which is tremendously hard.
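The points above can be seen directly in code. Below is a minimal sketch, assuming JDK 21 (or JDK 19 with --enable-preview), that starts a virtual thread and confirms it is not a platform thread:

```java
public class VirtualThreadDemo {
    public static void main(String[] args) throws InterruptedException {
        // Start a user-space (virtual) thread; it is scheduled onto a
        // shared fork-join pool of carrier threads, not a dedicated kernel thread.
        Thread vt = Thread.ofVirtual().name("demo-vt").start(() ->
                System.out.println("running on: " + Thread.currentThread()));
        vt.join();
        System.out.println("isVirtual = " + vt.isVirtual());
    }
}
```

Printing `Thread.currentThread()` inside the task shows the virtual thread's name together with the carrier thread it happens to be mounted on.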
Integration with the spring-boot embedded tomcat-based application.
To try out the JEP 425 preview feature in a microservice, the question was: do we need to wait for embedded Tomcat to adopt it?
The answer is no; a Spring Boot application provides a way to pass a custom thread pool as the worker pool. The change below lets us test a Spring Boot Tomcat-based application with virtual threads:
@SpringBootApplication
public class VirtualThreadApplication {

    public static void main(String[] args) {
        SpringApplication.run(VirtualThreadApplication.class, args);
    }

    // Run application tasks (@Async etc.) on virtual threads
    @Bean(TaskExecutionAutoConfiguration.APPLICATION_TASK_EXECUTOR_BEAN_NAME)
    public AsyncTaskExecutor asyncTaskExecutor() {
        return new TaskExecutorAdapter(Executors.newVirtualThreadPerTaskExecutor());
    }

    // Replace Tomcat's worker pool with a virtual-thread-per-task executor
    @Bean
    public TomcatProtocolHandlerCustomizer<?> protocolHandlerVirtualThreadExecutorCustomizer() {
        return protocolHandler ->
                protocolHandler.setExecutor(Executors.newVirtualThreadPerTaskExecutor());
    }
}
Performance gains in microservices.
We tested three scenarios for performance.
NOTE: all observations in this blog were made with the JDK 19 virtual thread preview feature.
1. API exposed with a configurable thread sleep time.
@RequestMapping(value = "/slowAPI", method = RequestMethod.GET)
public ResponseEntity<String> slowResponseTime(@RequestParam("timeslowness") int slowness) throws InterruptedException {
    Thread.sleep(slowness); // sleep is static; no need to call it on currentThread()
    return ResponseEntity.ok("success");
}
This performance test makes it clear that with the traditional thread pool we can't get more than 200 TPS; beyond that the requests queue up, and past 200 concurrency the response time increases with concurrency. With a virtual thread pool as the Tomcat worker pool, TPS increases linearly with concurrency.
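The same effect can be reproduced outside Tomcat with a small experiment: a thousand one-second sleeps submitted to a virtual-thread-per-task executor finish in roughly one second of wall time, because each sleep parks only the virtual thread, not its carrier. A sketch, assuming JDK 21 (or JDK 19 with --enable-preview):

```java
import java.time.Duration;
import java.time.Instant;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ConcurrentSleepDemo {
    public static void main(String[] args) {
        Instant start = Instant.now();
        // One virtual thread per task; a blocking sleep parks the virtual
        // thread and frees its carrier to run other tasks.
        try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
            for (int i = 0; i < 1_000; i++) {
                executor.submit(() -> {
                    Thread.sleep(1_000);
                    return null;
                });
            }
        } // close() waits for all submitted tasks to finish
        long elapsedMs = Duration.between(start, Instant.now()).toMillis();
        System.out.println("1000 x 1s sleeps took ~" + elapsedMs + " ms");
    }
}
```

With a fixed pool of 200 platform threads, the same workload would take at least five seconds; with virtual threads it completes in little more than one.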
2. API integrating with a slow third-party API.
@GetMapping
public ResponseEntity<String> test() throws URISyntaxException {
    HttpHeaders headers = new HttpHeaders();
    headers.set("content-type", "application/json");
    HttpEntity<Void> requestEntity = new HttpEntity<>(null, headers);
    URI uri = new URI("http://localhost:9098/sample/slowAPI?timeslowness=1000");
    ResponseEntity<String> response = restTemplate.exchange(uri, HttpMethod.GET, requestEntity, String.class);
    return ResponseEntity.ok("success");
}
For this use case, we wanted to test the Socket behavior: does it block the thread on a blocking remote call? With the traditional thread pool, TPS throttled at 200, since Tomcat's default worker pool size is 200, and past 200 concurrency the requests start to pile up. With a virtual thread pool as the Tomcat worker pool, we were able to achieve almost 2000 TPS at 2000 concurrency. This is a major gain. We could reach similar throughput the traditional way by increasing the worker threads, but past a certain point we can't, because platform threads are resource intensive and we would run out of memory.
3. API exposed over JPA with a select sleep(1) query (sleep at the DB level).
@RequestMapping(value = "/slowDBcall", method = RequestMethod.GET)
public ResponseEntity<String> dbCall() {
    userRepository.executeSleep();
    // Inside UserRepository:
    // @Query(value = "select sleep(1)", nativeQuery = true)
    // public int executeSleep();
    return ResponseEntity.ok("success");
}
This particular use case is interesting, and the observation directly contradicts the expectation: with the virtual thread pool as the worker pool, throughput throttled at 16 TPS. Since there were 16 platform (carrier) threads, and TPS throttled at exactly that number, it is clear that blocking of the carrier threads created this throttling. A thread dump validated the same observation.
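A likely explanation is pinning: when a virtual thread blocks inside a synchronized block, as some JDBC drivers do internally, it cannot unmount from its carrier, so the carrier stays blocked too. A minimal sketch of the effect (the class and lock names are illustrative); running it with -Djdk.tracePinnedThreads=full prints the pinned stack trace:

```java
public class PinningDemo {
    private static final Object LOCK = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.ofVirtual().start(() -> {
            synchronized (LOCK) {
                try {
                    // Blocking while holding a monitor pins the virtual
                    // thread to its carrier; the carrier cannot be reused
                    // until the sleep completes.
                    Thread.sleep(100);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        vt.join();
        System.out.println("done");
    }
}
```

If enough requests hit code like this at once, throughput collapses to the number of carrier threads, which matches the 16 observed above.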
Debugging a virtual-thread-based application:
Virtual threads are a much-awaited feature because they deliver the reactive paradigm seamlessly, with a traditional programming flavor: no new paradigm to learn, just use virtual threads and the code automatically behaves like reactive code.
On top of that, debugging reactive code is very cumbersome, but with virtual threads you can still see exactly what is happening on a request via a thread dump. However, since virtual threads can grow to very large numbers, the previous thread dump utilities don't include virtual thread stacks. For this you need the new jcmd command, which writes the thread stacks to a file:
jcmd PID Thread.dump_to_file -format=json /path/to/the/file
Possible pitfalls of virtual threads:
- Blocking calls: if any remote call, or any piece of code, unknowingly blocks at the carrier thread level, it will cap throughput at roughly the number of carrier threads divided by the response time (Little's law). We saw exactly this behavior in the third performance test.
- The default bulkhead is lost: we can think of the previous 200 worker threads as a safeguard against a flood of requests from a client application. It acted as a bulkhead: a calling service could not issue more than 200 concurrent requests from a single instance. With virtual threads this safety measure is lost, which is both positive and negative. The application now needs to either build a bulkhead at the service level or add a rate limit so it cannot be flooded with a very high level of concurrency.
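One way to restore that safeguard is an explicit bulkhead. A sketch using a plain java.util.concurrent.Semaphore; the limit of 200 and the method name are illustrative, standing in for the old worker pool cap:

```java
import java.util.concurrent.Semaphore;

public class BulkheadDemo {
    // Illustrative cap standing in for the old 200-thread worker pool
    private static final Semaphore BULKHEAD = new Semaphore(200);

    static String callDownstream() throws InterruptedException {
        if (!BULKHEAD.tryAcquire()) {
            return "rejected"; // fail fast instead of flooding the downstream
        }
        try {
            Thread.sleep(10);  // stands in for the real remote call
            return "success";
        } finally {
            BULKHEAD.release();
        }
    }

    public static void main(String[] args) throws Exception {
        String result = callDownstream();
        System.out.println(result);
    }
}
```

Libraries such as Resilience4j provide the same pattern as a ready-made bulkhead, but the core idea is just a bounded permit count in front of the remote call.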
Conclusion:
Virtual threads are a very handy route to the benefits of reactive programming. As the hype suggests, they can solve the reactive problem in a unique way, with minimal code changes to applications. But whether every network socket call is really handled as non-blocking under the hood remains a question. It is still a preview feature, so here's hoping it ends up handling all blocking network calls.