Java Virtual Threads — pitfalls to look out for!

Early research into Java Virtual Threads unearths some immediate and concerning but wholly fixable problems

Phil Boutros
12 min read · Nov 16, 2023

Java virtual threads are a great addition to Java and well executed as released in Java 21. Their simple thread-per-task paradigm is a breath of fresh air over the complexities of reactive frameworks, async-await, etc. However, nothing in an ecosystem as large as Java’s is going to work perfectly, in all cases, on its first release.

Quick aside: this is not an attempt to explain Java virtual threads. Please see https://blog.rockthejvm.com/ultimate-guide-to-java-virtual-threads for an excellent introduction.

TL;DR

Although virtual threads represent a valuable enhancement to Java, it’s crucial to approach their implementation with caution and subject them to rigorous testing at production scale. Many libraries, including Java’s own, do not yet perfectly support virtual threads, leading to challenges related to scalability, performance, resource management, and potential deadlocks.

Quick background

A few weeks ago I was happily playing with Java’s new virtual threads by testing pools of platform threads vs. virtual threads on a workload consisting of reading 100,000 or so image files. The fun began when I moved these image files to Google Cloud Storage and started accessing them using Google’s Cloud Storage Java client library. This produced a cascade of problems, which I’ll take literary license to describe as if I found and attacked them in a coherent order; the reality was much messier.

The test application enumerates files from some source (local disk, Google Cloud Storage, etc.), then queues those files to be read using tasks provided to an ExecutorService created with either:

  • Executors.newFixedThreadPool() for pooled platform threads or
  • Executors.newVirtualThreadPerTaskExecutor() for virtual threads

The gory details aren’t critical to understanding the issues encountered so I’ll spare you those. Ultimately every file is getting read either from a platform thread or from a virtual thread using the trivial code below.

try (ReadableByteChannel rbc = input.open()) {
    var buf = ByteBuffer.allocate(1024 * 10);
    while (rbc.read(buf) > 0) {
        buf.rewind();
    }
}
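
For context, here is a minimal, self-contained sketch of how such a workload might be wired up to either executor. The class and method names (ReadBenchmark, readFile) and the hard-coded file list are illustrative stand-ins for the real test application, not its actual code.

import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ReadBenchmark {

    public static void main(String[] args) {
        boolean useVirtual = true; // flip to compare a platform pool vs. virtual threads

        // ExecutorService is AutoCloseable since Java 19; close() waits for submitted tasks.
        try (ExecutorService executor = useVirtual
                ? Executors.newVirtualThreadPerTaskExecutor()
                : Executors.newFixedThreadPool(64)) {
            for (String file : List.of("a.jpg", "b.jpg", "c.jpg")) { // stand-in for the real file enumeration
                executor.submit(() -> readFile(file));
            }
        }
    }

    private static void readFile(String file) {
        // placeholder for the ReadableByteChannel read loop shown above
        System.out.println("reading " + file);
    }
}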

Problem #1 — Where are all my threads?

TL;DR

Any libraries used by virtual threads might use synchronized methods or synchronized blocks around long IO operations. If so, this will pin the virtual threads to their carrier threads during the IO operations thereby limiting any performance advantage. You might be able to work around this but ultimately synchronized methods and blocks should be replaced with ReentrantLock. This has already been done in the Java libraries but there are still thousands of third party libraries which may have this problem.

Full story

One of the metrics I was tracking was how many tasks were actually running. That is, I was counting (with an AtomicLong called runningCount) entries and exits from the tasks’ run method. When executing against local storage I got the expected result. For platform thread pools the runningCount would go to the thread pool size (1, 16, 64, or whatever) and stay there until the end. For virtual threads the runningCount would go into the 1000s as they were attached to real carrier threads then detached when blocked by the local disk read operations.
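
To make that instrumentation concrete, here is a minimal sketch of the counting; the wrapper class below is my paraphrase (RunningCounter is a hypothetical name), not the actual test code.

import java.util.concurrent.atomic.AtomicLong;

class RunningCounter {
    static final AtomicLong runningCount = new AtomicLong();

    // Wraps a task so that entries to and exits from run() are counted.
    static Runnable counted(Runnable task) {
        return () -> {
            runningCount.incrementAndGet();     // the task has actually started running
            try {
                task.run();
            } finally {
                runningCount.decrementAndGet(); // the task finished (or threw)
            }
        };
    }
}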

This all changed when I ran the virtual thread case against Google Cloud Storage. In that case the runningCount went to 16 and stayed there even though 1000s of virtual threads were being created and were ready to run. Sixteen (16) was a very suspicious number because it equals the core count on my desktop and is hence the default number of platform carrier threads allocated for virtual threads. This immediately implied to me that the virtual threads were either being pinned to their carrier threads or not being released during the HTTP IO operation of reading from Google Cloud Storage, but why?

The answer was almost right in front of my face but I didn’t see it immediately. Instead I replaced the code that called Google’s Java client library to get the ReadableByteChannel with code that used Java’s HttpClient directly. So this…

public ReadableByteChannel open() {
    return storage.reader(blobInfo.getBlobId());
}

… became this …

public ReadableByteChannel open() {
    HttpRequest request = HttpRequest.newBuilder(new URI(blobInfo.getSelfLink() + "?alt=media"))
            .GET()
            .header("Authorization", "Bearer " + TOKEN)
            .build();
    HttpResponse<byte[]> response = httpClient.send(request, HttpResponse.BodyHandlers.ofByteArray());
    return Channels.newChannel(new ByteArrayInputStream(response.body()));
}

… obviously leaving out error and exception handling for clarity. This worked! As in the local storage case, runningCount went into the 1000s as expected. There were other problems (see Problems 2, 3, and 4), but one thing at a time.

So I then knew that it wasn’t my code or some intrinsic problem with virtual threads and HTTP; rather, something inside Google’s Java client library was either pinning the virtual thread to the carrier thread or doing HTTP in a way that didn’t block and release the virtual thread. As Murphy would have it, I chose the wrong one to pursue and reimplemented the open method (above) using the same older HTTP classes (HttpsURLConnection, etc.) that Google’s library did. Again, runningCount went into the 1000s, indicating the virtual threads were being correctly released from their carrier threads on HTTP IO operations.

That left only the possibility of pinning, which only occurs when…

  • The thread executes inside a synchronized method or block (illustrated in the sketch after this list), or
  • The thread calls a native method or a foreign function
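
To illustrate the first case, here is a minimal, hypothetical sketch of the pattern that causes pinning; Thread.sleep stands in for a long blocking read. Running it with -Djdk.tracePinnedThreads=full should print a stack trace when the virtual thread parks while pinned.

public class PinningDemo {
    private static final Object lock = new Object();

    public static void main(String[] args) throws InterruptedException {
        Thread vt = Thread.ofVirtual().start(() -> {
            synchronized (lock) {            // the virtual thread cannot unmount while blocked in this block
                try {
                    Thread.sleep(1_000);     // stands in for a long blocking IO operation
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        });
        vt.join();
    }
}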

Problem found

I then got ready for a long debugging session looking for some hidden synchronized deep in Google’s library. I stepped into the library’s top-level read method and literally facepalmed. Sitting there all the time was a synchronized directly on Google’s read method.

This prevented the virtual threads from being released from their platform carrier threads during the long read operations, leaving all other virtual threads waiting and thereby removing any performance advantage gained by using virtual threads in the first place.

This is in no way a “bug” in Google’s library. The library is very reasonably preventing two threads from concurrently calling the read method on a given BaseStorageReadChannel object. This is good, defensive coding; however, with the advent of virtual threads this formerly somewhat innocuous synchronized keyword becomes a significant impediment to realizing this new technology’s performance gains.

A solution

I had already worked around this issue by using Java’s HttpClient to read each Blob directly. That happened to be relatively easy in this case, but what about all the other libraries out there one might want to use with virtual threads that are using synchronized around IO operations? For that matter, what, if anything, should I be doing or saying to Google about this issue?

Oracle’s documentation has this to say on the matter…

Pinning does not make an application incorrect, but it might hinder its scalability. Try avoiding frequent and long-lived pinning by revising synchronized blocks or methods that run frequently and guarding potentially long I/O operations with java.util.concurrent.locks.ReentrantLock.

So the “best” solution to this problem would be to replace the synchronized keyword with code using a ReentrantLock at several locations in Google’s code so this…

@Override
public final synchronized int read(ByteBuffer dst) throws IOException {
    // EXISTING CODE
}

becomes this…

private final Lock readLock = new ReentrantLock();

@Override
public final int read(ByteBuffer dst) throws IOException {
    readLock.lock();
    try {
        // EXISTING CODE
    } finally {
        readLock.unlock();
    }
}

I cloned Google’s library from GitHub, made these changes locally, and the problem was fixed. I’ll submit a PR for this work.

But why?

You might ask, why would virtual threads work with a ReentrantLock but not the synchronized keyword since they do the same thing? It’s a great question and one I don’t have a perfect answer for, except to say that the synchronized keyword is implemented as part of the JVM itself whereas ReentrantLock is part of the java.util.concurrent library. My guess is that the synchronized keyword is so deeply entwined with how the language is executed that removing the pinning behavior for virtual threads is difficult. The JEP does say this, however…

In a future release we may be able to remove the first limitation above, namely pinning inside synchronized.

Problem #2 — Too much of a good thing

TL;DR

Java’s HttpClient library in its default HTTP/2 mode has scalability limitations when making multiple concurrent requests to the same HTTP/2-enabled scheme:host:port. This will limit the utility of virtual threads in certain cases.

Full story

While working around Problem #1 using Java’s HttpClient I immediately ran into another problem. Once I switched to HttpClient, my virtual threads were free of the synchronized problem in Google’s library and quickly ran into the 1000s. I then started to receive 1000s of IO exceptions like this…

java.io.IOException: too many concurrent streams

After some Googling it became clear that HttpClient was using HTTP/2 by default. HTTP/2 uses a single TCP/IP connection to stream multiple HTTP requests in parallel, up to some limit specified by the server (Google’s cloud storage endpoints in this case). Using nghttp, I decoded the HTTP/2 interaction with said endpoint, revealing the following.

[  0.068] send SETTINGS frame <length=12, flags=0x00, stream_id=0>
(niv=2)
[SETTINGS_MAX_CONCURRENT_STREAMS(0x03):100]
[SETTINGS_INITIAL_WINDOW_SIZE(0x04):65535]

So Google only allows 100 concurrent streams per HTTP/2 connection. The obvious fix would be to establish multiple HTTP/2 connections, each allowing 100 concurrent streams. I naively assumed that each HttpClient instance would create its own TCP/IP connection to Google, so I switched from using a single static HttpClient to a small pool of them. This changed nothing. Multiple HttpClient instances appear to reuse the same connection for a given scheme:host:port and will never establish a second, third, etc. connection. This means I was completely limited to 100 concurrent requests despite the fact that I had 1000s of virtual threads available to make requests.
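
For reference, here is a sketch of that failed experiment, with a hypothetical round-robin pool class (HttpClientPool is my name, not part of the JDK). As described above, it still ran into the 100-concurrent-stream limit.

import java.net.http.HttpClient;
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.IntStream;

class HttpClientPool {
    private final List<HttpClient> clients;
    private final AtomicInteger next = new AtomicInteger();

    HttpClientPool(int size) {
        this.clients = IntStream.range(0, size)
                .mapToObj(i -> HttpClient.newHttpClient())
                .toList();
    }

    // Hands out clients round-robin; Math.floorMod keeps the index valid even if the counter wraps.
    HttpClient next() {
        return clients.get(Math.floorMod(next.getAndIncrement(), clients.size()));
    }
}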

IMO this is a bug in Java’s HttpClient class. It should either…

  • Automatically start a new TCP/IP connection once the server’s specified number of concurrent streams is reached, OR
  • Use a new TCP/IP connection for each HttpClient instance, allowing the developer to avoid or react to too many concurrent streams exceptions.

A workaround?

There doesn’t seem to be a workaround for this using Java’s HttpClient and HTTP/2. Some third-party Java HTTP/2 clients might allow multiple connections, but I didn’t look into it. What I did do was switch to HTTP/1.1, which I’ll cover next.

Problem #3 — Asynchronous assumptions

TL;DR

Java’s HttpClient library creates a large number of additional, costly, and useless platform threads when used with virtual threads. Luckily the workaround is easy.

Full story

At this point I’m working around both Problem #1 and Problem #2 (this was before I solved Problem #1 properly). Switching Java’s HttpClient to use HTTP/1.1 instead of HTTP/2 was as simple as changing this…

private static final HttpClient httpClient = HttpClient.newBuilder()
        .build();

to this…

private static final HttpClient httpClient = HttpClient.newBuilder()
        .version(HttpClient.Version.HTTP_1_1)
        .build();

This immediately solved the too many concurrent streams issue in Problem #2 and I could then easily make 1000s of concurrent requests to Google storage up to whatever TCP/IP connection limit my machine, Java, Google, etc. would allow. I was finally able to saturate my internet IO bandwidth, woo-hoo!

My joy was quickly diminished when, during some routine debugging, I happened to look at the running threads in my debug console and saw a long list of HttpClient-1-Worker-* platform threads staring back at me.

I played with the number of concurrent virtual threads and also with platform thread pools and quickly found that Java’s HttpClient creates one additional platform thread for every concurrent HTTP request coming from a unique thread, be it platform or virtual. That means 1000s of platform threads were being created! What’s going on?

This behavior makes perfect sense if we consider that HttpClient was written to enable asynchronous HTTP through its sendAsync method, which returns a CompletableFuture, AND it was written well before virtual threads. In that world it made sense to allocate a whole additional platform thread (HttpClient-1-Worker-* above) to execute the HTTP request while the calling thread (always a platform thread) does other stuff. The worker count would never get too crazy because the number of platform threads one could throw at a problem was in the 100s at most, not in the 1000s or 10000s as is now the case with virtual threads.

In today’s world, when using virtual threads we simply don’t need those additional, costly, slow-to-create platform HTTP worker threads. The virtual threads can use the HttpClient class’s simpler, synchronous send method, relying on virtual threads to provide high concurrency rather than having to code it using complex constructs like CompletableFuture. That is, in fact, the whole point of virtual threads: to make the code simple and straightforward while still being able to fully utilize the available IO bandwidth.
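
As a sketch of that style (the fetchAll method and its parameters are hypothetical, and the java.net.http imports are assumed), each virtual thread just calls the blocking send and lets the runtime handle the concurrency:

static void fetchAll(HttpClient httpClient, List<URI> uris) {
    try (ExecutorService executor = Executors.newVirtualThreadPerTaskExecutor()) {
        for (URI uri : uris) {
            executor.submit(() -> {
                HttpRequest request = HttpRequest.newBuilder(uri).GET().build();
                HttpResponse<byte[]> response =
                        httpClient.send(request, HttpResponse.BodyHandlers.ofByteArray());
                return response.statusCode(); // submitted as a Callable, so send()'s checked exceptions are allowed
            });
        }
    } // close() waits for all submitted requests to complete
}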

Workaround

Luckily there’s a simple workaround. The HttpClient.Builder interface lets one supply the Executor that HttpClient uses for its internal worker tasks. By supplying a fixed pool of one thread, we can force HttpClient to create only a single additional thread…

private static final ThreadFactory httpThreadFactory =
        new ThreadFactoryBuilder()                      // Guava's ThreadFactoryBuilder
                .setNameFormat("my-single-http-thread-%d")
                .build();

private static final ExecutorService httpExecutor =
        Executors.newFixedThreadPool(1, httpThreadFactory);

private static final HttpClient httpClient = HttpClient.newBuilder()
        .version(HttpClient.Version.HTTP_1_1)
        .executor(httpExecutor)
        .build();

Having only a single thread saves a ton of startup time and memory but doesn’t change the execution path or performance at all since (both by my testing and by my reading of the Java code) the send method executes in the calling thread as one might expect.
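
As an aside, if you’d rather not pull in Guava’s ThreadFactoryBuilder, the JDK’s own Thread.Builder API (finalized in Java 21) can produce an equivalent named factory. A sketch, reusing the same hypothetical field name:

// Same idea without Guava: a named platform ThreadFactory from the JDK's Thread.Builder API.
private static final ExecutorService httpExecutor =
        Executors.newFixedThreadPool(1,
                Thread.ofPlatform().name("my-single-http-thread-", 0).factory());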

Is this a bug?

Yes, I’d say this is an unintended consequence of virtual threads that no one caught. After some more research I’ll report it as such.

Problem #4 — Deep deadlock

TL;DR

Don’t assume every Java library has been fully tested with virtual threads in all cases. There will be bugs!

Full story

As I researched problem #1, my original workaround was…

public ReadableByteChannel open() {
    HttpRequest request = HttpRequest.newBuilder(new URI(blobInfo.getSelfLink() + "?alt=media"))
            .GET()
            .header("Authorization", "Bearer " + TOKEN)
            .build();
    HttpResponse<byte[]> response = httpClient.send(request, HttpResponse.BodyHandlers.ofByteArray());
    return Channels.newChannel(new ByteArrayInputStream(response.body()));
}

This code reads the entire file into memory and then creates a new Channel out of it. Unless one knows the files are all small, this might be an inefficient way to do things. So I tried another variation just for fun. The code above became this…

public ReadableByteChannel open() {
    HttpRequest request = HttpRequest.newBuilder(new URI(blobInfo.getSelfLink() + "?alt=media"))
            .GET()
            .header("Authorization", "Bearer " + TOKEN)
            .build();
    HttpResponse<InputStream> response = httpClient.send(request, HttpResponse.BodyHandlers.ofInputStream());
    return Channels.newChannel(response.body());
}

This new code (error and exception handling again omitted) provides an InputStream (specifically an HttpResponseInputStream) to underlie the ReadableByteChannel, thereby avoiding the memory cost of reading the whole file and delaying the consumption of the HTTP response until the ReadableByteChannel is actually read.

I was very surprised when this produced a deadlock condition! After successfully processing some “random” number of files my application would lock up as all the virtual threads and their platform carrier threads sat in wait states, while other platform threads still ran just fine. Unfortunately diagnosing such deadlocks can be problematic but I gave it a go along several axes.

First, I tried my debugger and profiler tools but neither of them were virtual thread-aware enough to provide the information I might have needed to noodle the problem directly.

Second, I switched to a large (256) pool of platform threads instead of virtual threads. Not surprisingly this worked fine as such a deadlock with platform threads would have certainly been found long ago.

Third, I inspected the Java library code that handles implementing ReadableByteChannel on top of an InputStream.

@Override
public int read(ByteBuffer dst) throws IOException {
    if (!isOpen()) {
        throw new ClosedChannelException();
    }
    if (dst.isReadOnly()) {
        throw new IllegalArgumentException();
    }

    int len = dst.remaining();
    int totalRead = 0;
    int bytesRead = 0;
    synchronized (readLock) {
        while (totalRead < len) {
            int bytesToRead = Math.min((len - totalRead), TRANSFER_SIZE);
            if (buf.length < bytesToRead)
                buf = new byte[bytesToRead];
            if ((totalRead > 0) && !(in.available() > 0))
                break; // block at most once
            try {
                begin();
                bytesRead = in.read(buf, 0, bytesToRead);
            } finally {
                end(bytesRead > 0);
            }
            if (bytesRead < 0)
                break;
            else
                totalRead += bytesRead;
            dst.put(buf, 0, bytesRead);
        }
        if ((bytesRead < 0) && (totalRead == 0))
            return -1;

        return totalRead;
    }
}

Not very surprisingly, there was a synchronized block right in the read method. As with Problem #1, this will pin the virtual threads to their carrier threads, limiting scalability, but it shouldn’t by itself create the deadlock I was seeing. Luckily the class that does this, ReadableByteChannelImpl, is pretty small, so I made my own copy and again replaced synchronized with a ReentrantLock. This resolved the issue.
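
For illustration, here is a much-simplified, hypothetical stand-in for that copy: an InputStream-backed ReadableByteChannel guarded by a ReentrantLock instead of synchronized, so a blocked virtual thread can unmount from its carrier. It drops ReadableByteChannelImpl’s “block at most once” logic and is only a sketch, not the actual class I patched.

import java.io.IOException;
import java.io.InputStream;
import java.nio.ByteBuffer;
import java.nio.channels.ClosedChannelException;
import java.nio.channels.ReadableByteChannel;
import java.util.concurrent.locks.ReentrantLock;

final class LockingInputStreamChannel implements ReadableByteChannel {
    private final InputStream in;
    private final ReentrantLock readLock = new ReentrantLock();
    private final byte[] buf = new byte[8192];
    private volatile boolean open = true;

    LockingInputStreamChannel(InputStream in) {
        this.in = in;
    }

    @Override
    public int read(ByteBuffer dst) throws IOException {
        if (!open) {
            throw new ClosedChannelException();
        }
        readLock.lock();                     // explicit lock: no pinning while blocked inside the read
        try {
            int n = in.read(buf, 0, Math.min(dst.remaining(), buf.length));
            if (n > 0) {
                dst.put(buf, 0, n);
            }
            return n;                        // -1 signals end of stream, as the channel contract requires
        } finally {
            readLock.unlock();
        }
    }

    @Override
    public boolean isOpen() {
        return open;
    }

    @Override
    public void close() throws IOException {
        open = false;
        in.close();
    }
}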

But what caused the deadlock, you might ask? The answer is, I don’t know, but it’s very likely that the HTTP reader mechanics under HttpResponseInputStream have their own locking behavior that causes the deadlock when the virtual threads are pinned. If you feel that’s a “weak” answer, you’re correct, and before I formally report this to the Java community I’ll try to narrow the problem down and write straightforward code that reproduces it.

Conclusion

The takeaway here is that virtual threads, while an excellent addition to Java, must be approached with caution and tested vigorously at production scale. Many libraries, even Java’s own, are not quite up to the task of fully supporting virtual threads without issues.
