Just a minor upgrade

Last week we got a strong reminder of some basic software facts. First, performance tests are important. Second, testing rollback mechanisms is just as important. And finally, dependencies are as hard to reason about in the Java ecosystem as in any other (say, JavaScript).

The tale of a failure

We were in the process of upgrading our Java applications from Spring Boot 2.0.6 to Spring Boot 2.1.2. It was supposed to be low-risk, given that it was a move between two “minor” versions, and we got it working pretty quickly. There were some impacts, but those were documented.

We performed various functional and technical tests and everything looked fine. However, our automated performance tests had been broken for some time and, for some reason, we didn’t run even a few of them manually to validate that area. Confident as we were, we finally deployed our upgraded apps to production and… within 30 minutes our platform went down :-(

We observed that the Web server was leaving a high number of sockets in the CLOSE_WAIT state, and that it couldn’t accept new requests anymore. (Performance tests are important, have I said that?) But we failed to identify the actual culprit at first (some signs were pointing to our metrics system), and it took us some time to take the obvious action: performing a rollback.
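
For what it’s worth, here is a minimal sketch of the kind of check that could flag this symptom: it counts the connections stuck in CLOSE_WAIT in the output of `netstat -tan` or `ss -tan`. The class and method names are hypothetical (this is not our actual tooling), and the parsing assumes the state shows up as a whitespace-separated column, written `CLOSE_WAIT` by netstat and `CLOSE-WAIT` by ss.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

// Hypothetical helper, for illustration only.
final class CloseWaitCounter {

    private CloseWaitCounter() {
    }

    /**
     * Counts the lines that have CLOSE_WAIT (netstat spelling) or
     * CLOSE-WAIT (ss spelling) as a whole whitespace-separated column.
     */
    static long countCloseWait(List<String> lines) {
        return lines.stream()
                .filter(line -> Arrays.stream(line.trim().split("\\s+"))
                        .anyMatch(col -> col.equals("CLOSE_WAIT") || col.equals("CLOSE-WAIT")))
                .count();
    }

    /** Reads socket listings from standard input and prints the count. */
    public static void main(String[] args) throws IOException {
        try (BufferedReader in = new BufferedReader(new InputStreamReader(System.in))) {
            List<String> lines = in.lines().collect(Collectors.toList());
            System.out.println(countCloseWait(lines));
        }
    }
}
```

Piping the socket listing into `main` gives a single number to watch during a load test. A count that keeps growing while traffic stays flat suggests the application is not closing connections that the remote side has already closed.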

Once we decided to roll back, we discovered that it wasn’t possible, because the previous versions of some of our apps were no longer available in our deployment system. (Testing rollbacks is important…)
After we had finally rebuilt the missing apps and rolled back, it was time to investigate. That was a good opportunity to see that not all of us are used to diagnosing this kind of problem, and therefore to share knowledge and improve in that area. (While discussing it with a friend of mine this weekend, I could measure how much I still had to learn. Thanks for your input, Alnour ;-)

Spring Boot 2.1 with Jedis is (seriously) broken

It turned out that although Spring Boot 2 replaced Jedis with Lettuce as the default in its Redis-related code, it still supports Jedis, and therefore still pins a Jedis version in its BOM. It also happens that our feature flipping system uses Jedis to cache its configuration. Spring Boot 2.1.2 upgraded Jedis from version 2.9.0 to version 2.9.1, a version that leaks connections (see the GitHub issue).

We can only assume that the Spring Boot team can’t run performance tests for every supported configuration either, especially now that they’re pushing Lettuce rather than Jedis.
If you happen to use Spring Boot 2.1.2+ with Jedis (and also Spring Boot 1.5.x, if I’m not mistaken), we can only advise you to force the Jedis version to 2.10.2.
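
The version override itself belongs in the build (with Spring Boot’s Maven dependency management, that typically means overriding the `jedis.version` property). As a safety net on top of that, here is a minimal sketch of a startup guard that refuses to run with the leaky version. Everything in it is an assumption for illustration: the `JedisVersionGuard` class is hypothetical, and it relies on the Jedis jar exposing an `Implementation-Version` in its manifest, which may not be the case; when no version can be read, the guard simply lets the application start.

```java
import java.util.Optional;

// Hypothetical startup guard, for illustration only.
final class JedisVersionGuard {

    /** The Jedis version reported above as leaking connections. */
    static final String LEAKY_VERSION = "2.9.1";

    private JedisVersionGuard() {
    }

    /**
     * Tries to read the Jedis version from the jar manifest.
     * Returns empty if Jedis is absent or the manifest carries no version.
     */
    static Optional<String> detectJedisVersion() {
        try {
            // Load without initializing the class; we only need its package metadata.
            Class<?> jedis = Class.forName("redis.clients.jedis.Jedis", false,
                    JedisVersionGuard.class.getClassLoader());
            Package pkg = jedis.getPackage();
            return Optional.ofNullable(pkg == null ? null : pkg.getImplementationVersion());
        } catch (ClassNotFoundException e) {
            return Optional.empty();
        }
    }

    /** True only for the exact version known to leak; null means "unknown". */
    static boolean isLeaky(String version) {
        return LEAKY_VERSION.equals(version);
    }

    /** Fails fast if the detected Jedis version is the leaky one. */
    static void check() {
        detectJedisVersion().filter(JedisVersionGuard::isLeaky).ifPresent(v -> {
            throw new IllegalStateException(
                    "Jedis " + v + " leaks connections; force another version in the build");
        });
    }
}
```

Calling `JedisVersionGuard.check()` early at startup would turn a silent transitive upgrade into an immediate, explicit failure rather than a production outage.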

Needless to say, we’ll remember those lessons.

Meanwhile, if you think you’re the kind of person who can help us improve on those subjects, don’t hesitate to contact us: we’re hiring!