Systems Resiliency: Simple Tomcat Backpressure
When designing a resilient system, you’ll want to incorporate mechanisms that gracefully handle and recover from failures. You may already be familiar with the bulkhead and circuit breaker patterns offered by Hystrix, which help protect your service against misbehaving dependencies. But how do you prevent your service from taking on more work from its clients than it can handle?
You may have heard the term backpressure in relation to a car’s exhaust. Wikipedia states:
Backpressure refers to pressure opposed to the desired flow of gases in confined places such as a pipe.
In the context of systems resiliency, backpressure is conceptual: it opposes normal system flow in response to workload demands that cannot be met. Simple backpressure mechanisms can be implemented at various levels in your stack, including within load-balancing reverse proxies, within service containers and as an integral part of your service itself. There are also more elaborate ways of implementing backpressure, for example as part of a service mesh with explicit signalling between components, but we won’t cover those here.
For this post, we’ll be looking at a way of applying simple backpressure to stop Tomcat-based services that use the HTTP APR, NIO or NIO2 connectors from becoming overloaded and potentially destabilising part, or all, of the system.
Let’s take a look at a simplified scenario involving a web server and an appliance providing layer 7 load balancing for two Tomcat-based services that use HTTP APR connectors. In this scenario, connection limits and timeouts within our stack’s components have either been set to extremely large values, or not at all.
Let’s introduce a bad bot that’s attacking our system by subjecting a service A endpoint to sustained high request load. The load is far greater than service A’s cluster is capable of handling and manifests as resource saturation, slowdown and queuing by the service. Queuing then starts on the upstream load balancer and web server; our legitimate clients attempting to hit an endpoint on service A are now hanging, caught somewhere in a deep queue. Even though the attack is on service A, clients can barely reach any service B endpoints as there’s a continually-increasing deep queue, dominated by service A endpoint requests. This stack has allowed a seemingly isolated problem to propagate upstream, the effects of which eventually cause site-wide denial of service. Once the bad bot ceases its attack (voluntarily or forcibly), the MTTR is high because the queue needs to be worked through, exacerbated by the slowdown caused by service A’s excessive workload. Overall, there is nothing graceful about this situation!
Let’s make one change and introduce backpressure on service A’s Tomcat containers. Applying the same scenario as above, service A will now actively reject requests beyond the request concurrency limit that the service owner has determined as the maximum before meltdown. By virtue of this, pending requests will not propagate upstream and our clients are not left hanging awaiting responses from service A’s endpoints. Service B is largely unaffected since there’s no deep queuing upstream and our legitimate clients can still potentially use at least part of the site. In contrast to the scenario above, post-bot attack MTTR is low because there’s no queue to be worked through and service A has been operating without the effects of excessive load. This relatively simple change has resulted in a much more graceful response to, and recovery from, this particular scenario.
The bot scenario is just one example of when backpressure is desirable. Systems will regularly be subjected to conditions that cause excessive workload demands — transient traffic spikes, deployments, service degradation, service dependency issues and so forth — and ensuring backpressure enables your stack to shed excessive load, stop failure from cascading and improve MTTR.
The Tomcat HTTP connector (a component that connects your service(s) to the outside world via HTTP) comes in several different flavours: BIO, APR, NIO and NIO2. Selection is dependent on Tomcat version and whether APR native libraries are present:
- <8: BIO default, overridden by APR if native libraries are present.
- 8: NIO default, overridden by APR if native libraries are present.
- 8.5: NIO default, overridden by APR if native libraries are present. BIO removed.
The BIO connector uses a one thread per connection model — a Tomcat worker thread will be tied up for the duration of the client connection. The others have more of a one thread per request model — worker threads will only be tied up for the duration of a request rather than the lifetime of the connection¹. Why is this relevant here? BIO by default will not allow more connections than worker threads (maxThreads — 200 by default), so Tomcat will not accept any additional connections (and their requests) once all worker threads are in use². NIO and APR by default allow 10,000 and 8,192 concurrent connections respectively (maxConnections), and once all worker threads are in use, Tomcat will continue to accept additional connections to these limits² and queue them indefinitely until worker threads become available to process their requests.
¹ If clients use HTTP pipelining, connections have affinity to a specific Tomcat worker thread for as long as that connection has more than one in-flight request and for as long as the connection is established.
² There is also the acceptCount attribute (default 100), which controls how many additional connections the operating system will accept and is in addition to maxConnections.
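To make those defaults concrete, here is a sketch of an NIO connector in server.xml with the relevant attributes set explicitly (the port, timeout and limit values are illustrative, not recommendations):

```xml
<!-- NIO connector: worker threads, connection ceiling and OS accept backlog
     made explicit. All values shown are illustrative. -->
<Connector port="8080"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           maxThreads="200"
           maxConnections="10000"
           acceptCount="100"
           connectionTimeout="20000" />
```

With this configuration, Tomcat will keep accepting connections well beyond its 200 worker threads, queuing them until threads free up — exactly the behaviour described above.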
So, how can you use the APR and NIO connectors and ensure your service provides backpressure when it’s under stress? You could set maxConnections to the same value as maxThreads, but that could artificially limit service capacity if you expect Tomcat to maintain, and process requests from, many connections. A more request-centric way is to use a StandardThreadExecutor for your connector in place of the standard internal executor, which gives finer-grained control over several aspects, including queuing behaviour. Tomcat allows executors to be shared across multiple connectors, if desired. In server.xml, executors are configured as follows:
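A sketch of such a configuration, assuming a single NIO connector (the executor name, namePrefix and port are illustrative):

```xml
<!-- StandardThreadExecutor with a bounded work queue, shared by reference
     from the connector via the executor attribute. Values are illustrative. -->
<Executor name="tomcatThreadPool"
          namePrefix="catalina-exec-"
          maxThreads="200"
          minSpareThreads="10"
          maxQueueSize="25" />

<Connector executor="tomcatThreadPool"
           port="8080"
           protocol="org.apache.coyote.http11.Http11NioProtocol"
           connectionTimeout="20000" />
```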
Here, we allow a maximum of 200 concurrent requests and a maximum queue of 25 pending connections within Tomcat (it might be tempting to set maxQueueSize to an extremely low value, but you should consider leaving some headroom to deal with legitimate request spikes). Additional connections beyond the 25 pending are simply closed immediately, with the following log message:
WARNING: Socket processing request was rejected for: <socket handle>
java.util.concurrent.RejectedExecutionException: Work queue full.
To help illustrate behaviour of the above configuration, let’s take an example of a service being subjected to a constant 300 requests per second, with the service response time of exactly 1 second and no client pipelining. At any given second, Tomcat is processing 200 requests, queuing 25 connections for processing and providing backpressure by actively closing 75 surplus connection requests.
The value of maxThreads is of course context dependent; however, it should be set below the level of request concurrency that causes unacceptable service degradation.
Backpressure is an important mechanism for safeguarding your system — it helps to:
- Prevent services from collapsing under extreme load.
- Minimise cascading upstream effects.
- Spring your system back into shape quickly following problems.
Degradation of service whilst preventing catastrophic failure is surely preferable to the effects of unchecked workloads. Hopefully this post has gone some way towards demonstrating those effects and given you practical advice on enabling simple backpressure within Tomcat.