Implementing a Canary Release in Kubernetes for Java Applications by Leveraging Spring Boot’s @ConditionalOnProperty Annotation

Before diving into the problem, let’s start with a basic introduction to some key technologies and techniques:

What is a canary release?

“Canary release is a technique to reduce the risk of introducing a new software version in production by slowly rolling out the change to a small subset of users before rolling it out to the entire infrastructure and making it available to everybody.”

Source: https://martinfowler.com/bliki/CanaryRelease.html

What is Kubernetes?

Kubernetes (k8s) is an open-source system for automating deployment, scaling, and management of containerized applications.

What is Spring Boot?

Spring Boot makes it easy to create stand-alone, production-grade Spring-based Java applications that you can just run. It comes with some neat features that work well with Kubernetes, including:

  • An embedded Apache Tomcat server that allows us to package our application into a JAR, containerize it with Docker, and then run it with a simple java -jar app.jar command
  • Health check endpoints for Kubernetes to probe and route requests to healthy pods (a minimal example follows this list).
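
With the spring-boot-starter-actuator dependency on the classpath, the application exposes a /actuator/health endpoint that Kubernetes liveness and readiness probes can call. As a minimal sketch (the class name and detail values below are illustrative, not from our codebase), a custom contribution to that health check could look like this:

import org.springframework.boot.actuate.health.Health;
import org.springframework.boot.actuate.health.HealthIndicator;
import org.springframework.stereotype.Component;

// Contributes to the /actuator/health response that Kubernetes probes can call.
@Component
public class DownstreamHealthIndicator implements HealthIndicator {

    @Override
    public Health health() {
        // Replace with a real check, e.g. pinging a downstream dependency.
        boolean downstreamReachable = true;
        if (downstreamReachable) {
            return Health.up().withDetail("downstream", "reachable").build();
        }
        return Health.down().withDetail("downstream", "unreachable").build();
    }
}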

Introduction to the Problem:

The Castlight mobile app talks to a Spring Boot web service hosted in Kubernetes via a legacy API gateway written in Ruby, called Gatekeeper. We wanted to migrate to a newer and better API gateway we built with Spring Boot and Spring Cloud Gateway, which we called Edge. Edge uses a non-blocking, event-loop-based implementation, which is ideal for an API gateway because it scales much better than a blocking I/O model.
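
This is not Edge's actual configuration, but as a rough illustration of what a Spring Cloud Gateway route definition looks like (the route id, path, and target URI below are made up):

import org.springframework.cloud.gateway.route.RouteLocator;
import org.springframework.cloud.gateway.route.builder.RouteLocatorBuilder;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class EdgeRoutes {

    // Declares a single route: requests matching /service-a/** are forwarded
    // to the downstream service, handled on a non-blocking event loop.
    @Bean
    public RouteLocator routes(RouteLocatorBuilder builder) {
        return builder.routes()
                .route("service-a", r -> r.path("/service-a/**")
                        .uri("http://service-a-edge.env.kube"))
                .build();
    }
}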

Our requirements:

  • No downtime
  • Gradually route requests via the new API gateway on a customer-by-customer basis, as Castlight’s business model is B2B2C
  • Backward compatibility with older versions of mobile clients
  • Scale application traffic independently for requests routed via Gatekeeper versus Edge
  • Forward different metadata via HTTP headers to the downstream service when requests are routed through Gatekeeper versus Edge
  • Support running performance and load tests to understand the impact of introducing a new API gateway

Understanding Spring Boot’s @ConditionalOnProperty annotation:

Before we take a deep dive into the solution, here’s a quick introduction to Spring Boot’s @ConditionalOnProperty annotation:

@ConditionalOnProperty(value = "feature-a", havingValue = "true")

The above annotation checks whether the Java application property “feature-a” has the value “true”. The application property could be supplied via spring-cloud-config or as a JVM argument, for example: java -jar -Dfeature-a=true app.jar

The annotation can be placed on a Bean:
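
For instance (the class, bean, and property names below are illustrative):

import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class FeatureAConfig {

    // Placeholder type for whatever the feature needs.
    public static class FeatureAService {
    }

    // The bean is only registered when the property "feature-a" has the value "true".
    @Bean
    @ConditionalOnProperty(value = "feature-a", havingValue = "true")
    public FeatureAService featureAService() {
        return new FeatureAService();
    }
}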

It can be placed on a Spring Configuration:
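
A sketch with an illustrative configuration class; every bean declared inside it is only registered when the condition holds:

import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// When placed on the class, the condition applies to the whole configuration:
// none of its beans are created unless "feature-a" is "true".
@Configuration
@ConditionalOnProperty(value = "feature-a", havingValue = "true")
public class FeatureAConfiguration {

    // Placeholder type for whatever the feature needs.
    public static class FeatureAClient {
    }

    @Bean
    public FeatureAClient featureAClient() {
        return new FeatureAClient();
    }
}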

You can also choose between two different implementations of an interface:
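
Again with made-up names, the active implementation of an interface can depend on the property; matchIfMissing makes the legacy implementation the default when the property is absent:

import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.stereotype.Service;

public interface GreetingService {
    String greet();
}

// Used when feature-a=true.
@Service
@ConditionalOnProperty(value = "feature-a", havingValue = "true")
class NewGreetingService implements GreetingService {
    @Override
    public String greet() {
        return "hello from the new implementation";
    }
}

// Used when feature-a=false, or when the property is not set at all.
@Service
@ConditionalOnProperty(value = "feature-a", havingValue = "false", matchIfMissing = true)
class LegacyGreetingService implements GreetingService {
    @Override
    public String greet() {
        return "hello from the legacy implementation";
    }
}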

This is a great way to manage features in Java applications. You can continuously integrate your code and enable or disable it appropriately for a canary release. There are more advanced conditional annotations in Spring, but we will stick to the ones we needed.

Implementing Canary Release in Kubernetes:

Our key design decisions:

  • Have two Kubernetes Deployments of the same application container image; the only difference is the run command. Deployment A runs java -jar -Dedge=false app.jar, while Deployment B runs java -jar -Dedge=true app.jar.
  • Have two Kubernetes Ingresses: Gatekeeper forwards requests to https://service-a-gatekeeper.env.kube, while Edge forwards requests to https://service-a-edge.env.kube.
  • The edge flag decides which set of Spring beans is enabled, making our Java application aware of whether it sits behind Edge (edge=true) or Gatekeeper (edge=false), so it can process the gateway-specific HTTP headers appropriately (a sketch follows this list).
  • One of the key requirements of a canary release is the routing layer. In our case, it is implemented in the mobile app. The routing layer talks to a separate microservice (not shown here) which tells it whether the customer associated with the account has Edge enabled. If it is enabled, the layer routes all requests for that account via Edge; if not, it routes them via Gatekeeper. This lets us migrate API requests on a customer-by-customer basis (a simplified sketch also follows this list).
  • Typically, routing layers are implemented at the API gateway level or at the Kubernetes Ingress level, but since we are replacing the gateway itself, we had to go one level above it and implement the routing in the mobile app.
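
To make the bullets above concrete, here is a rough sketch of how the edge flag could gate gateway-specific beans. The class and header names are hypothetical and only illustrate the pattern, not our actual code:

import org.springframework.boot.autoconfigure.condition.ConditionalOnProperty;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

// Hypothetical value object describing which request header carries caller metadata.
class GatewayHeaderNames {
    final String userHeader;

    GatewayHeaderNames(String userHeader) {
        this.userHeader = userHeader;
    }
}

// Enabled in Deployment B (java -jar -Dedge=true app.jar).
@Configuration
@ConditionalOnProperty(value = "edge", havingValue = "true")
class EdgeAwareConfig {

    @Bean
    GatewayHeaderNames gatewayHeaderNames() {
        // Hypothetical header name; the real Edge headers are not shown here.
        return new GatewayHeaderNames("X-Edge-User");
    }
}

// Enabled in Deployment A (java -jar -Dedge=false app.jar).
@Configuration
@ConditionalOnProperty(value = "edge", havingValue = "false")
class GatekeeperAwareConfig {

    @Bean
    GatewayHeaderNames gatewayHeaderNames() {
        // Hypothetical header name; the real Gatekeeper headers are not shown here.
        return new GatewayHeaderNames("X-Gatekeeper-User");
    }
}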
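
And a deliberately simplified, hypothetical sketch of the routing decision the client makes; the flag-service URL, endpoint, and response format below are invented for illustration, and the real mobile client and microservice contract are not shown here:

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

// Hypothetical routing helper: asks a feature-flag microservice whether the
// customer behind this account is enabled for Edge, then picks the base URL.
public class GatewayRouter {

    private static final String GATEKEEPER_BASE_URL = "https://gatekeeper.example.com";
    private static final String EDGE_BASE_URL = "https://edge.example.com";

    private final HttpClient http = HttpClient.newHttpClient();

    public String baseUrlFor(String accountId) throws Exception {
        // Hypothetical endpoint returning "true" or "false" for this account.
        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create("https://flags.example.com/accounts/" + accountId + "/edge-enabled"))
                .GET()
                .build();
        HttpResponse<String> response = http.send(request, HttpResponse.BodyHandlers.ofString());
        boolean edgeEnabled = Boolean.parseBoolean(response.body().trim());
        return edgeEnabled ? EDGE_BASE_URL : GATEKEEPER_BASE_URL;
    }
}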

Phasing the migration:

We took a phased approach for the migration to minimize risks and to scale gradually.

Phase 1: Both Gatekeeper and Edge exist. Older versions of mobile apps without the new routing logic are aware only of Gatekeeper. With newer mobile app releases, we can control whether requests are routed via Edge based on the customer with which a user is associated.

Phase 2: We waited several months and monitored to ensure almost all mobile users had updated their apps. During this time, we gradually migrated all of the requests across all customers to Edge.

Phase 3: Decommission Gatekeeper and the Deployment associated with it. We no longer want to support requests being routed via Gatekeeper. Mobile users must update their apps at this point.

Phase 4: Remove all the code associated with Gatekeeper in the backend. Remove all the routing logic in the front-end.

Summary

We safely and scalably migrated all requests to a new API gateway by deploying a single code base twice: one deployment with the new API gateway features enabled, and another with the old API gateway features enabled. We built a routing layer in our mobile app which talked to a microservice to determine which API gateway to use based on the customer.

The migration went smoothly and we didn’t experience any downtime. We were able to scale our Kubernetes pods independently as the application behind the new gateway had its own Kubernetes Service and pool of pods, and similarly for the application behind the old gateway. Performance and load testing was a breeze as we could programmatically route requests to either gateway and measure the results.

Thanks

It’s been a pleasure leading this project and I’d like to convey my special thanks to Sean Alexander, Illia Abdullaiev, and Sergiy Galitskiy for leading the front-end changes, Umamaheshwara Gupta Karnati and Faizan Hadfa for the brainstorming sessions, Sreenivasulu Sanduri and Kunal Jaura for helping and supporting us with the Edge API Gateway, John Smilanick for his knowledge and guidance on Kubernetes deployments, and Jim Griswold for reviewing our codebase for security vulnerabilities. And last but not least, our star QA engineer Andrei Stoma for his diligence in testing.

About Castlight

https://www.castlighthealth.com/

Hiring new talent to join this amazing team
