<?xml version="1.0" encoding="UTF-8"?><rss xmlns:dc="http://purl.org/dc/elements/1.1/" xmlns:content="http://purl.org/rss/1.0/modules/content/" xmlns:atom="http://www.w3.org/2005/Atom" version="2.0" xmlns:cc="http://cyber.law.harvard.edu/rss/creativeCommonsRssModule.html">
    <channel>
        <title><![CDATA[Stories by Sakshem on Medium]]></title>
        <description><![CDATA[Stories by Sakshem on Medium]]></description>
        <link>https://medium.com/@jain.sakshem?source=rss-7801ad69c5ac------2</link>
        <image>
            <url>https://cdn-images-1.medium.com/fit/c/150/150/1*IRB_p3vPPxCVC0Wp6k3n-Q.jpeg</url>
            <title>Stories by Sakshem on Medium</title>
            <link>https://medium.com/@jain.sakshem?source=rss-7801ad69c5ac------2</link>
        </image>
        <generator>Medium</generator>
        <lastBuildDate>Thu, 28 May 2026 00:57:59 GMT</lastBuildDate>
        <atom:link href="https://medium.com/@jain.sakshem/feed" rel="self" type="application/rss+xml"/>
        <webMaster><![CDATA[yourfriends@medium.com]]></webMaster>
        <atom:link href="http://medium.superfeedr.com" rel="hub"/>
        <item>
            <title><![CDATA[The Hidden Cost of Virtual Threads: When Your Performance Gains Break Your Downstream Services]]></title>
            <link>https://medium.com/@jain.sakshem/the-hidden-cost-of-virtual-threads-when-your-performance-gains-break-your-downstream-services-81608e36cf15?source=rss-7801ad69c5ac------2</link>
            <guid isPermaLink="false">https://medium.com/p/81608e36cf15</guid>
            <category><![CDATA[microservices]]></category>
            <category><![CDATA[java]]></category>
            <category><![CDATA[spring-boot]]></category>
            <category><![CDATA[virtual-threads]]></category>
            <category><![CDATA[java21]]></category>
            <dc:creator><![CDATA[Sakshem]]></dc:creator>
            <pubDate>Wed, 24 Dec 2025 07:02:25 GMT</pubDate>
            <atom:updated>2025-12-24T07:02:25.261Z</atom:updated>
            <content:encoded><![CDATA[<p>Java 21 brought us virtual threads, and they delivered exactly what they promised. Our services could suddenly handle thousands of concurrent requests with minimal memory overhead. The incoming request queue that used to pile up during peak hours? Practically empty. Response times? Consistently low. It felt like a free lunch.</p><p>Until it was not.</p><h3>What Are Virtual Threads Anyway?</h3><p>Traditional Java threads (platform threads) are mapped 1:1 with operating system threads. Creating thousands of them is expensive. Each one reserves around 1MB of stack memory by default, and context switching between them is costly. This is why servlet containers like Tomcat default to a maximum of 200 worker threads.</p><p>Virtual threads flip this model. They are scheduled by the JVM, not the OS. They are incredibly lightweight (a few KB each), and the JVM can create millions of them. When a virtual thread blocks on I/O, the JVM parks it and reuses the underlying carrier thread (a platform thread) for other work.</p><p>In Spring Boot 3.2+, enabling virtual threads is embarrassingly simple:</p><pre>spring:<br>  threads:<br>    virtual:<br>      enabled: true</pre><p>That single property transforms your blocking servlet container into something that behaves almost like a reactive framework while keeping your familiar imperative code.</p><h3>The Success That Became a Problem</h3><p>After upgrading our entry point service to Java 21 with virtual threads, everything looked great. During peak load, our observability metrics showed the request queue size at zero. Every incoming request was immediately assigned to a virtual thread and processed. No more bottleneck at the thread pool level.</p><p>The peak request queue size dropped from <strong>40–80</strong> requests down to <strong>zero</strong> consistently. 
We could handle thousands of concurrent requests without breaking a sweat.</p><p>Then users started seeing “Something went wrong” errors. Our gateway was tripping circuit breakers. Something was very wrong.</p><h3>The Investigation</h3><p>Digging into the logs revealed a troubling pattern. The service under investigation was making HTTP calls to a downstream service for data processing. The response times for these calls were catastrophic:</p><p>Downstream API response times:</p><ul><li>Before the fix: Some requests taking 60–80 seconds to complete during peak traffic</li><li>After the fix: ~2ms average response time</li><li>Total calls in 12 hours: 2.94 million requests</li><li>Request rate: 68–106 requests per second sustained</li></ul><p>What made this particularly interesting is that the downstream service does not perform any blocking I/O operations. It is pure CPU work: data transformation, computation, and processing. Yet it was taking over a minute to respond.</p><p>The service under investigation was configured with these HTTP client settings:</p><pre>webclient.connection.timeout=20000<br>webclient.read.timeout=30000<br>webclient.maxConnection=200<br>webclient.pendingAcquireTimeout=30000<br>webclient.pendingAcquireMaxCount=500</pre><p>Many requests were failing with ReadTimeoutException after waiting 30 seconds. The downstream service, still running on traditional platform threads, simply could not keep up.</p><p>Here is what was happening:</p><figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*5BI2BEVO-w3fVRptZFiuGw.png" /></figure><p>The current service could now accept 3000+ concurrent requests because virtual threads are cheap. But two bottlenecks emerged:</p><ol><li>HTTP Connection Pool: Only 200 outbound connections available. 
Requests compete for these connections, with many waiting up to 30 seconds (<strong>pendingAcquireTimeout</strong>).</li><li>Downstream Thread Pool: The downstream service running on platform threads could only actively process around 200 requests simultaneously. Everything else queued up in the servlet container’s accept queue.</li></ol><p>The combination meant requests would wait for a connection, then wait again for the downstream service to have an available thread, resulting in 60–80 second response times.</p><h3>The Surprising Part: No Blocking I/O Required</h3><p>The downstream service was doing pure computational work. No database calls, no external API calls, just CPU-bound operations. Yet it still suffered from thread starvation.</p><p>This reveals an important insight: virtual threads help most with I/O-bound workloads where threads spend time waiting. But when your downstream service is CPU-bound or simply cannot process requests fast enough, virtual threads in the current service just amplify the problem by removing the natural rate limiting that platform thread pools provided.</p><h3>The Solution</h3><p>The fix is straightforward once you understand the problem: upgrade your entire call chain.</p><p>We enabled virtual threads on the downstream service:</p><pre>spring:<br>  application:<br>    name: processing-service<br>  threads:<br>    virtual:<br>      enabled: true<br><br>server:<br>  tomcat:<br>    threads:<br>      max: 200<br>    max-connections: 10000<br>    accept-count: 1000<br>    connection-timeout: 20000</pre><p>The critical changes here:</p><ul><li>max-connections: 10000 allows the server to accept many more concurrent connections</li><li>accept-count: 1000 sets the queue size for connections waiting to be accepted</li><li>Virtual threads enabled means the threads.max: 200 limit no longer applies. 
Each accepted connection gets a lightweight virtual thread immediately, and the JVM manages execution across available CPU cores.</li></ul><p>After the upgrade, the results were dramatic:</p><p><strong>With both services on virtual threads:</strong></p><ul><li>Average response time: <strong>2.035ms</strong></li><li>p99 response time: <strong>~5.3ms</strong></li><li>Peak request queue size: <strong>0 requests</strong> (both services)</li><li>Total requests handled in 12 hours: <strong>2.94 million</strong> without breaking a sweat</li><li>Sustained rate: <strong>68–106 requests</strong> per second with room to spare</li></ul><p>The downstream service went from taking 60–80 seconds at peak to responding in about 2 milliseconds on average. That is roughly a 30,000x improvement over the worst-case latencies.</p><h3>How to Detect This Before Production Breaks</h3><p>If you have proper observability, look for these metrics:</p><ol><li>Request Queue Size: Track how many requests are waiting for a thread. With virtual threads, this should be near zero. If your downstream service shows high queue sizes while the calling service shows zero, you have an imbalance.</li><li>Response Time Distribution: Watch for bimodal distributions where some requests are fast but others time out. This indicates intermittent resource exhaustion.</li><li>HTTP Client Pending Acquires: Monitor how many requests are waiting for an HTTP connection from the pool. This can reveal connection pool exhaustion even before downstream services show problems.</li></ol><p>These metrics were invaluable. We could see the request queue at zero in the service under investigation but did not have visibility into the downstream service initially. Once we added the same observability, the problem became obvious.</p><h3>Bonus: Async Logging Makes a Real Difference</h3><p>While investigating performance, we noticed another bottleneck: synchronous logging. Every log statement blocks the thread until the log is written. 
With thousands of concurrent virtual threads all trying to log, this becomes a serialization point.</p><p>The fix is to use asynchronous logging with LMAX Disruptor, a high-performance inter-thread messaging library.</p><p>LMAX Disruptor uses a ring buffer and lock-free algorithms to achieve extremely high throughput. Log messages are written to the buffer immediately (non-blocking), and a background thread handles the actual I/O. In our testing, this contributed to keeping response times consistently low during high load scenarios.</p><h3>Key Takeaways</h3><ol><li>Virtual threads work exactly as advertised. They remove thread limitations at your service boundary. That is both the benefit and the danger.</li><li>Even CPU-bound downstream services suffer. You do not need blocking I/O for virtual threads to create problems downstream. Any service that cannot process requests as fast as they arrive will buckle under the load.</li><li>Your service is only as fast as its slowest dependency. If you upgrade one service to handle massive concurrency, ensure the entire call chain can keep up.</li><li>Monitor the right metrics. Request queue sizes and response time distributions tell you where the bottleneck actually is. Our production dashboard showed peak queue size consistently at zero once both services had virtual threads.</li><li>Roll out strategically. Start with services that have fewer downstream dependencies, then work your way up the call chain. If you begin with an entry-point service instead, upgrade all of its dependencies first.</li><li>Do not forget logging. With high concurrency, synchronous logging becomes a surprising bottleneck. Async logging with LMAX Disruptor is a worthwhile optimization.</li></ol><p>Virtual threads are a powerful addition to Java. Our production metrics show they work: zero request queues, sustained high throughput, 2.94 million requests in 12 hours with 2ms response times. 
Just remember that with great concurrency comes great responsibility for your downstream services.</p>]]></content:encoded>
        </item>
        <item>
            <title><![CDATA[Building a Reusable WebClient in Spring Boot: The Smarter Way to Handle HTTP Calls]]></title>
            <link>https://medium.com/@jain.sakshem/building-a-reusable-webclient-in-spring-boot-the-smarter-way-to-handle-http-calls-da92450f8691?source=rss-7801ad69c5ac------2</link>
            <guid isPermaLink="false">https://medium.com/p/da92450f8691</guid>
            <category><![CDATA[connection-pooling]]></category>
            <category><![CDATA[spring-webclient]]></category>
            <category><![CDATA[reusable-component]]></category>
            <category><![CDATA[spring-boot]]></category>
            <dc:creator><![CDATA[Sakshem]]></dc:creator>
            <pubDate>Sat, 20 Dec 2025 14:59:36 GMT</pubDate>
            <atom:updated>2025-12-20T14:59:36.277Z</atom:updated>
            <content:encoded><![CDATA[<figure><img alt="" src="https://cdn-images-1.medium.com/max/1024/1*cFIdttSup87Fdmq0p5ECGw.png" /></figure><h3>Why WebClient?</h3><p>If you’re building microservices with Spring Boot, you’ll eventually need to call other services. WebClient is Spring’s modern, non-blocking HTTP client and the reactive alternative to the older RestTemplate, which is now in maintenance mode. It plays nicely with reactive programming and handles concurrent requests efficiently without blocking threads. Simply put, it’s built for the way we write applications today.</p><h3>The Problem with the Traditional Approach</h3><p>When developers first start using WebClient, they typically create separate client classes for each downstream service they need to call. Your codebase ends up looking something like this:</p><pre>@Component<br>public class UserServiceClient {<br>    private final WebClient webClient;<br>    <br>    public UserServiceClient() {<br>        this.webClient = WebClient.builder()<br>            .baseUrl(&quot;http://user-service&quot;)<br>            .build();<br>    }<br>    <br>    public User getUser(String id) {<br>        // URI template lets WebClient encode the path variable<br>        return webClient.get().uri(&quot;/users/{id}&quot;, id).retrieve().bodyToMono(User.class).block();<br>    }<br>}</pre><pre>@Component  <br>public class OrderServiceClient {<br>    private final WebClient webClient;<br>    <br>    public OrderServiceClient() {<br>        this.webClient = WebClient.builder()<br>            .baseUrl(&quot;http://order-service&quot;)<br>            .build();<br>    }<br>    <br>    public Order getOrder(String id) {<br>        // URI template lets WebClient encode the path variable<br>        return webClient.get().uri(&quot;/orders/{id}&quot;, id).retrieve().bodyToMono(Order.class).block();<br>    }<br>}</pre><p>See the pattern? Every client creates its own WebClient instance with its own configuration. Now imagine you have ten downstream services. 
That’s ten places where you’ve duplicated timeout settings, SSL configuration, logging filters, and connection pool settings.</p><p>When Spring releases a breaking change or you need to add request logging across all clients, you’re updating ten files. This is a maintenance nightmare waiting to happen.</p><h3>A Unified Approach</h3><p>The solution is straightforward: create one WebClient bean with all your configurations, and one common client class that handles the actual HTTP operations.</p><p>First, you configure WebClient once:</p><pre>@Configuration<br>public class WebClientConfig {<br>    <br>    // Assumes SLF4J (org.slf4j.Logger / LoggerFactory) is on the classpath<br>    private static final Logger log = LoggerFactory.getLogger(WebClientConfig.class);<br>    <br>    @Bean<br>    public ConnectionProvider connectionProvider() {<br>        return ConnectionProvider.builder(&quot;http-pool&quot;)<br>                .maxConnections(100)<br>                .pendingAcquireMaxCount(120)<br>                .pendingAcquireTimeout(Duration.ofMillis(16000))<br>                .maxIdleTime(Duration.ofMillis(150000))<br>                .evictInBackground(Duration.ofMillis(30000))<br>                .build();<br>    }<br>    <br>    @Bean<br>    public WebClient webClient(WebClient.Builder builder, <br>          ConnectionProvider connectionProvider) {<br><br>        HttpClient httpClient = HttpClient.create(connectionProvider)<br>                .responseTimeout(Duration.ofMillis(15000))<br>                .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, 5000);<br><br>        return builder<br>                .clientConnector(new ReactorClientHttpConnector(httpClient))<br>                .filter(logRequest())<br>                .filter(logResponse())<br>                .build();<br>    }<br><br>    // Minimal logging filters (a sketch; adapt the level and fields to your needs)<br>    private static ExchangeFilterFunction logRequest() {<br>        return ExchangeFilterFunction.ofRequestProcessor(request -&gt; {<br>            log.debug(&quot;Request: {} {}&quot;, request.method(), request.url());<br>            return Mono.just(request);<br>        });<br>    }<br><br>    private static ExchangeFilterFunction logResponse() {<br>        return ExchangeFilterFunction.ofResponseProcessor(response -&gt; {<br>            log.debug(&quot;Response status: {}&quot;, response.statusCode());<br>            return Mono.just(response);<br>        });<br>    }<br>}</pre><p>Then, one common client handles all HTTP operations:</p><pre>@Component<br>public class CommonWebClient {<br>    <br>    private final WebClient webClient;<br><br>    public CommonWebClient(WebClient webClient) {<br>        this.webClient = webClient;<br>    }<br><br>    public &lt;T&gt; ResponseEntity&lt;T&gt; postRequest(String baseUrl, String 
endPoint, <br>            HttpHeaders headers, Object requestPayload, Class&lt;T&gt; responseType) {<br>        T response = webClient.post()<br>                .uri(baseUrl.concat(endPoint))<br>                .headers(h -&gt; h.addAll(headers))<br>                .bodyValue(requestPayload)<br>                .retrieve()<br>                .bodyToMono(responseType)<br>                .block();<br>        return ResponseEntity.ok(response);<br>    }<br><br>    public &lt;T&gt; ResponseEntity&lt;T&gt; getRequest(String baseUrl, String endPoint, <br>            HttpHeaders headers, Class&lt;T&gt; responseType) {<br>        T response = webClient.get()<br>                .uri(baseUrl.concat(endPoint))<br>                .headers(h -&gt; h.addAll(headers))<br>                .retrieve()<br>                .bodyToMono(responseType)<br>                .block();<br>        return ResponseEntity.ok(response);<br>    }<br><br>    public &lt;T&gt; Mono&lt;T&gt; postRequestMono(String baseUrl, String endPoint, <br>          HttpHeaders headers, Object requestPayload, Class&lt;T&gt; responseType) {<br>        return webClient.post()<br>                .uri(baseUrl.concat(endPoint))<br>                .headers(h -&gt; h.addAll(headers))<br>                .bodyValue(requestPayload)<br>                .retrieve()<br>                .bodyToMono(responseType);<br>    }<br>}</pre><p>Now any service can simply inject CommonWebClient and make HTTP calls without worrying about configuration details.</p><h3>What Happens Behind the Scenes</h3><h4>Connection Pooling</h4><p>This is where the real magic happens. When you create separate WebClient instances without shared configuration, each one typically maintains its own connection pool. 
That means if you have ten clients calling ten services, you could end up with ten separate pools, each opening and managing their own TCP connections.</p><p>With the unified approach using a shared ConnectionProvider, all your HTTP calls draw from pools managed in one place (Reactor Netty keeps a pool per remote host under the provider). The connection provider manages:</p><ul><li>maxConnections: The maximum number of connections each pool can hold</li><li>pendingAcquireMaxCount: How many requests can wait in line for a connection</li><li>maxIdleTime: How long unused connections stay alive before being closed</li><li>evictInBackground: Periodic cleanup of stale connections</li></ul><p>Under the hood, Reactor Netty (which powers WebClient) uses an event loop model. Instead of one thread per connection, a small number of threads handle many connections through non-blocking I/O. When a request completes, its connection returns to the pool for reuse rather than being closed and reopened.</p><h4>Request Flow</h4><p>When you call postRequest():</p><ol><li>WebClient asks the connection provider for an available connection</li><li>If one exists in the pool, it’s reused immediately</li><li>If not, and the pool isn’t full, a new connection is created</li><li>If the pool is full, the request waits (up to <strong>pendingAcquireTimeout</strong>)</li><li>After the response is received, the connection returns to the pool</li></ol><p>This reuse avoids repeating TCP and TLS handshakes for every request.</p><h3>Benefits of This Approach</h3><p><strong>Single source of truth</strong>: Timeouts, SSL settings, logging, and error handling live in one place.</p><p><strong>Efficient resource usage</strong>: Shared connection pool means fewer open connections and better memory utilization.</p><p><strong>Easier debugging</strong>: Centralized logging filters capture all outgoing requests and incoming responses.</p><p><strong>Simpler upgrades</strong>: Library updates or configuration changes happen once.</p><p><strong>Consistent behavior</strong>: 
Every HTTP call follows the same patterns for error handling and logging.</p><h3>The Tradeoffs</h3><p>Nothing comes free. Here are things to think about:</p><p>One configuration for all services: If service A needs a 30-second timeout but service B needs 5 seconds, you need to handle this. You can pass custom timeout values per request, or create multiple WebClient beans for different timeout profiles.</p><p>Shared pool exhaustion: If one slow downstream service holds connections, it affects all other calls. Monitor your pool metrics and set sensible limits.</p><p>Blocking calls in reactive context: The example uses .block() which waits for the response. This is fine in traditional servlet applications but problematic in fully reactive stacks. Consider returning Mono&lt;T&gt; directly when working with WebFlux.</p><h3>Mitigating the Drawbacks</h3><p>You can still use this approach while handling edge cases:</p><ul><li>Create a second WebClient bean with different timeouts for specific high-latency services</li><li>Add per-request timeout overrides using .timeout() on the Mono</li><li>Expose both blocking and non-blocking methods (the example includes postRequestMono for reactive use)</li><li>Use circuit breakers (like Resilience4j) to prevent one failing service from exhausting your pool</li></ul><h3>Things to Keep in Mind</h3><p>When implementing this pattern:</p><ol><li><strong>Set realistic pool sizes</strong>: Too small and requests queue up. Too large and you waste resources. Start with defaults and tune based on actual load.</li><li><strong>Configure idle timeout wisely</strong>: Keep it shorter than any proxy or load balancer timeout in your network path. 
Connections closed by intermediaries cause unexpected failures.</li><li><strong>Add proper error handling</strong>: The common client should handle WebClientResponseException and other failures gracefully.</li><li><strong>Include correlation IDs</strong>: Pass trace IDs through context for distributed tracing. This helps debug issues across service boundaries.</li><li><strong>Monitor your pool</strong>: Reactor Netty exposes metrics. Watch for pending acquisitions and pool exhaustion.</li></ol><p>The unified WebClient approach isn’t revolutionary, but it’s a pattern that pays dividends as your microservices architecture grows. You write less code, maintain less configuration, and spend more time on actual features.</p>]]></content:encoded>
        </item>
    </channel>
</rss>