Chaos Engineering: Chaos Testing Your HTTP Microservices
Failing To Succeed And Succeeding At Failing
TL;DR: Your microservices are vulnerable to unexpected failure if the services they depend on fail in some way (and you don’t handle it). Fault-test your HTTP microservices using a “Chaos Proxy”.
Here’s one I made earlier:
ClusterF** Chaos Proxy is an unreliable HTTP proxy you can rely on; a lightweight tool designed for chaos testing of…
Chaos Engineering — What Is It?
Chaos Engineering is a great idea: build an automated tool that randomly attempts to break a system in some way, ultimately to learn how the system behaves in such situations. You can then use that newfound knowledge to make the system more fault tolerant under those failure conditions in the future.
What Is A Chaos Proxy?
A Chaos Proxy is a service that your microservices connect to in place of the services they depend on.
It forwards traffic to the real destination microservices and relays their responses back, but does so in a deliberately unreliable way.
Requests passing through the proxy are randomly delayed and/or fail in unexpected ways, all for the sole purpose of helping you understand how your microservice responds to these failure conditions.
Why Would Anyone Want An Unreliable HTTP Proxy?
Everything fails eventually. Everything.
Accept it and embrace failure. Design for failure. Succeed at failing.
Microservices often communicate with other services via REST and HTTP. How do your microservices cope when the services they depend on inevitably fail in some unpredictable way?
Your microservices are vulnerable to unexpected failure if the services they depend on fail and you haven’t accounted for that failure or defined how your service should behave.
Why Is This Useful?
Recently I was investigating a JDBC connection leak in a microservice.
With modern frameworks abstracting away JDBC operations, connection leaks shouldn’t really happen these days, but alas, there was one.
I wanted to assess how resilient the microservice (A) was to failures and delays in another microservice (B) that it depended upon.
I needed a way to simulate periodic failures and delays in microservice ‘B’ while I sent requests and ran automated regression tests locally against microservice ‘A’.
I could access microservice ‘B’ on a remote environment, but because of various constraints I couldn’t run ‘B’ locally and modify it to emit failures.
I couldn’t find an existing tool that was lightweight, reasonably easy to set up, and did what I needed.
After some fiddling around, the first iteration of ClusterFk Chaos Proxy was born!
Thanks to ClusterFk Chaos Proxy, I was able to identify that with sufficiently delayed responses from microservice ‘B’, the JDBC connections in microservice ‘A’ would stack up and stick around for as long as the HTTP request was active — even if the JDBC transaction had actually long since committed.
With the cause known, a range of possible solutions opened up (along with an easy way to test their effectiveness through the chaos proxy), for example:
- Implement a controlled timeout on requests from ‘A’ to ‘B’.
- Time out JDBC connections and return them to the connection pool.
- Make parts of the processing asynchronous so the request thread exits sooner.
ClusterFk Chaos Proxy
The premise is simple:
- Configure your locally running service-under-test to point to the Chaos Proxy, and configure the Chaos Proxy to point to the real service it depends on.
- Switch on ClusterFk Chaos Proxy and configure a “chaos strategy”.
- Use your microservice (fire requests at it).
- Watch the world burn (through monitoring logs or through application behaviour).
- Optional: learn from the chaos and implement changes to improve the resilience of your microservice.
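The first step usually needs nothing more than making the dependency’s address configurable. A minimal sketch, assuming a hypothetical SERVICE_B_BASE_URL environment variable and a hypothetical default address for ‘B’ (your service will have its own configuration mechanism, such as a properties file):

```python
import os
from typing import Mapping
from urllib.parse import urljoin

# Hypothetical default: where the real service 'B' normally lives.
DEFAULT_SERVICE_B_BASE_URL = "http://service-b.internal:8098/"


def service_b_url(path: str, env: Mapping[str, str] = os.environ) -> str:
    """Build a URL for service 'B', letting configuration redirect it.

    Setting SERVICE_B_BASE_URL (a hypothetical variable name) to the chaos
    proxy's address sends every call to 'B' through the proxy instead.
    """
    base = env.get("SERVICE_B_BASE_URL", DEFAULT_SERVICE_B_BASE_URL)
    if not base.endswith("/"):
        base += "/"  # without it, urljoin would drop the last path segment
    return urljoin(base, path.lstrip("/"))
```

For a chaos run you would then set, for example, SERVICE_B_BASE_URL=http://localhost:8080 (assuming that is where your proxy instance listens) and leave the code itself untouched.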
When I first put the proxy together, I wasn’t really aware of the Chaos Proxy concept, but I decided to finish the first iteration anyway.
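ClusterFk Chaos Proxy is on Docker Hub. To install it, simply run:
```shell
docker pull andymacdonald/clusterf-chaos-proxy
```
Then run the image with the JAVA_OPTS environment variable set to choose a chaos strategy and the destination service that requests are forwarded to, for example in the environment section of a docker-compose file:
```yaml
JAVA_OPTS: "-Dchaos.strategy=RANDOM_HAVOC -Ddestination.hostProtocolAndPort=http://10.0.0.231:8098"
```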
Configure a chaos strategy as per the project’s README.md:
- NO_CHAOS: Request is simply passed through
- DELAY_RESPONSE: Requests are delayed but successful (configurable delay)
- INTERNAL_SERVER_ERROR: Requests return with 500 INTERNAL SERVER ERROR
- BAD_REQUEST: Requests return with 400 BAD REQUEST
- RANDOM_HAVOC: Requests generally succeed, but randomly fail with random HTTP status codes and random delays
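To make the strategies concrete, here is a minimal sketch of how a proxy might turn a strategy name (as passed in -Dchaos.strategy) into a per-request plan. It is written from the descriptions above and is not ClusterFk Chaos Proxy’s actual code: the default delay, the RANDOM_HAVOC failure rate, its maximum delay and its set of status codes are all assumptions.

```python
import enum
import random
from dataclasses import dataclass
from typing import Optional


class ChaosStrategy(enum.Enum):
    NO_CHAOS = "NO_CHAOS"
    DELAY_RESPONSE = "DELAY_RESPONSE"
    INTERNAL_SERVER_ERROR = "INTERNAL_SERVER_ERROR"
    BAD_REQUEST = "BAD_REQUEST"
    RANDOM_HAVOC = "RANDOM_HAVOC"


@dataclass(frozen=True)
class Plan:
    delay_seconds: float            # how long to hold the request first
    status_override: Optional[int]  # None means "forward to the real service"


# Illustrative assumption: the statuses RANDOM_HAVOC may pick from.
HAVOC_STATUSES = (400, 401, 403, 404, 500, 502, 503, 504)


def plan_for(strategy: ChaosStrategy, rng: random.Random,
             configured_delay: float = 2.0,
             havoc_failure_rate: float = 0.2,
             havoc_max_delay: float = 5.0) -> Plan:
    """Decide what the proxy does with one request under a given strategy."""
    if strategy is ChaosStrategy.NO_CHAOS:
        return Plan(0.0, None)                # pass straight through
    if strategy is ChaosStrategy.DELAY_RESPONSE:
        return Plan(configured_delay, None)   # slow, but successful
    if strategy is ChaosStrategy.INTERNAL_SERVER_ERROR:
        return Plan(0.0, 500)
    if strategy is ChaosStrategy.BAD_REQUEST:
        return Plan(0.0, 400)
    # RANDOM_HAVOC: a random delay on every request, plus an occasional
    # failure with a randomly chosen status code.
    delay = rng.uniform(0.0, havoc_max_delay)
    if rng.random() < havoc_failure_rate:
        return Plan(delay, rng.choice(HAVOC_STATUSES))
    return Plan(delay, None)
```

Parsing the configured value is then just ChaosStrategy[name], which raises KeyError for an unknown strategy name. Passing in a seeded random.Random makes a RANDOM_HAVOC run reproducible.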
Once the application is up, you can point the microservice(s) you want to test at your ClusterFk Chaos Proxy instances (instead of the real destination services). Then just fire up the microservice and start testing and using it.
Depending on the strategy you’ve picked, the proxy will apply that behaviour to the requests you send through it.
Probably the most useful strategy is DELAY_RESPONSE, but you may still find the others useful.
More features will be added in the future with more configurable options!
I’d appreciate it if you’d give some feedback on the project and let me know whether you find it useful.
Thanks for reading! 😃
Hopefully, you’ve enjoyed this article and the introduction to the concept of a Chaos Proxy.
Although I’ve used my own personal project here, the concept is incredibly simple to implement. Feel free to take my project and fork it or just make your own implementation!
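To illustrate how little is needed, here is a minimal sketch of a chaos proxy using only Python’s standard library. It is not a port of ClusterFk Chaos Proxy: it handles only GET requests, relays just the status code and body (not headers), and applies a single hypothetical policy (a random delay of up to max_delay seconds, then a random 5xx response with probability failure_rate) rather than configurable strategies.

```python
import http.server
import random
import time
import urllib.error
import urllib.request

# Forward requests directly, ignoring any http_proxy environment settings.
DIRECT_OPENER = urllib.request.build_opener(urllib.request.ProxyHandler({}))


def make_chaos_handler(destination: str, failure_rate: float,
                       max_delay: float, rng: random.Random):
    """Return a handler class that forwards GETs to destination, unreliably."""
    destination = destination.rstrip("/")

    class ChaosProxyHandler(http.server.BaseHTTPRequestHandler):
        def do_GET(self):
            # Chaos, part 1: hold every request for a random amount of time.
            time.sleep(rng.uniform(0.0, max_delay))
            # Chaos, part 2: sometimes fail without contacting the destination.
            if rng.random() < failure_rate:
                self._reply(rng.choice([500, 502, 503, 504]), b"chaos!")
                return
            try:
                with DIRECT_OPENER.open(destination + self.path, timeout=30) as upstream:
                    self._reply(upstream.status, upstream.read())
            except urllib.error.HTTPError as err:
                # Relay the destination's own error status and body unchanged.
                self._reply(err.code, err.read())
            except OSError:
                # Destination unreachable or too slow: report a bad gateway.
                self._reply(502, b"bad gateway")

        def _reply(self, status: int, body: bytes) -> None:
            self.send_response(status)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

        def log_message(self, format, *args):
            pass  # keep the sketch quiet; real use would log each decision

    return ChaosProxyHandler


def serve(port: int, destination: str,
          failure_rate: float = 0.2, max_delay: float = 2.0) -> None:
    """Run the proxy until interrupted, handling each request in a new thread."""
    handler = make_chaos_handler(destination, failure_rate, max_delay, random.Random())
    http.server.ThreadingHTTPServer(("", port), handler).serve_forever()
```

For example, serve(8080, "http://10.0.0.231:8098") would listen on port 8080 and forward to the destination from the earlier configuration example; the service-under-test would then call port 8080 instead of calling ‘B’ directly.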