Minimize Impact in Kubernetes Using Argo Rollouts

Ariel Simhon
5 min read · Feb 28, 2022


As a DevOps engineer at Intuit, I work with teams to adopt best practices for different deployment scenarios. In this post, we’ll go over pain points of the default deployment strategy in Kubernetes (K8S) and how to solve them.

Current deployment strategies

First, let’s look at the deployment strategies K8S offers out of the box, along with their pros, cons, and pain points. (A deployment strategy is essentially how customers receive new versions of your application.)

Pain points

It’s clear that a Rolling Update strategy allows you to avoid application downtime. However, there are other issues with this strategy that can impact your customers:

  • The default deployment strategy in K8S might prove too aggressive, as there is no way to control how much traffic flows to the new version.
  • You can’t query external metrics to verify an update.
  • You can halt the progression, but you can’t automatically abort and roll back the update if, for example, specific API calls return 500 errors during or after a release.
  • Starting a rollback manually is slower than an automated procedure.
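To illustrate the first point, here is a minimal Python sketch of why a plain rolling update offers only coarse traffic control. It assumes traffic is spread evenly across ready pods, which is a simplification; the function name and pod counts are illustrative and not part of Kubernetes.

```python
def new_version_traffic_share(old_pods: int, new_pods: int) -> float:
    """Fraction of requests reaching the new version, assuming every
    ready pod receives an equal share of traffic (a simplification)."""
    total = old_pods + new_pods
    if total == 0:
        raise ValueError("at least one pod must be ready")
    return new_pods / total


# With 4 old pods, the very first new pod already takes 20% of traffic.
first_step = new_version_traffic_share(old_pods=4, new_pods=1)
```

Under that assumption, the smallest possible exposure is one pod's worth of traffic, so holding the new version at 5% would need about 20 ready pods in total.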

To address these pain points, Intuit open sourced a deployment controller called Argo Rollouts, which codifies industry-standard deployment strategies into a custom resource called Rollout.

Progressive delivery pattern

Progressive delivery is the process of releasing updates to a product in a controlled and gradual manner. This reduces the risk of the release, typically coupling automation and metric analysis to drive the automated promotion or rollback of the update.

Using progressive delivery, application revisions are released gradually (e.g., 5% traffic -> Analysis -> Increase).

Below, you can see how progressive delivery works.

A high-level view of progressive delivery with canary traffic:

With progressive delivery, impact is minimal, as it only affects a subset of customers until 100% of traffic has been directed to the new version.
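The "traffic -> analysis -> increase" loop can be sketched in a few lines of Python. This is a toy simulation rather than how the Argo Rollouts controller is implemented; the function name, the weights, and the `analyze` callback are all hypothetical.

```python
from typing import Callable, Dict, List


def run_progressive_rollout(
    weights: List[int], analyze: Callable[[int], bool]
) -> Dict[str, object]:
    """Step through canary traffic weights, running analysis at each one.

    The first failed analysis aborts the rollout and sends all traffic
    back to the previous version.
    """
    history: List[int] = []
    for weight in weights:
        history.append(weight)
        if not analyze(weight):
            return {"status": "rolled_back", "final_weight": 0, "history": history}
    return {"status": "promoted", "final_weight": 100, "history": history}


# A healthy release passes every check and is fully promoted.
healthy = run_progressive_rollout([5, 25, 50, 100], lambda weight: True)

# A release that only misbehaves under load is stopped at the 50% step.
faulty = run_progressive_rollout([5, 25, 50, 100], lambda weight: weight < 50)
```

In the faulty case the rollout never reaches 100%, so only the customers routed to the canary during the first three steps are exposed to the problem.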

The solution

Argo Rollouts provides additional deployment patterns:

  • Blue/green deployment, which allows users to reduce the amount of time multiple versions run simultaneously.
  • Canary deployment, where you release a new version to a small percentage of production traffic.
  • Progressive delivery (canary with analysis), a form of continuous delivery with fine-grained control.

With progressive delivery, an automatic rollback to the previous version occurs if errors appear while the release is still receiving a small share of traffic (e.g., 5%). You can also run analysis after the new version receives 100% of traffic (e.g., for 12 hours after a release).
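As a rough sketch of what such a configuration can look like, the Python snippet below builds a Rollout manifest with canary steps and prints it as JSON. The service name `payments-api`, the image, the template name `error-rate-check`, and the chosen weights and pause duration are all hypothetical; the field names follow the Rollout canary spec as I understand it, so check them against the version you run.

```python
import json

# Hypothetical names: payments-api, example.com/payments-api:2.0.0, error-rate-check.
rollout = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "Rollout",
    "metadata": {"name": "payments-api"},
    "spec": {
        "replicas": 10,
        "selector": {"matchLabels": {"app": "payments-api"}},
        "template": {
            "metadata": {"labels": {"app": "payments-api"}},
            "spec": {
                "containers": [
                    {"name": "payments-api", "image": "example.com/payments-api:2.0.0"}
                ]
            },
        },
        "strategy": {
            "canary": {
                "steps": [
                    {"setWeight": 5},  # expose a small share of traffic
                    # run analysis; a failed run aborts the rollout
                    {"analysis": {"templates": [{"templateName": "error-rate-check"}]}},
                    {"setWeight": 50},
                    {"pause": {"duration": "10m"}},  # hold before full promotion
                    {"setWeight": 100},
                ]
            }
        },
    },
}

manifest_json = json.dumps(rollout, indent=2)
print(manifest_json)
```

Printing JSON keeps the sketch dependency-free; kubectl accepts JSON manifests as well as YAML.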

Argo Rollouts is flexible enough to fit many different API service use cases: you can customize the canary progression and the analysis for each service. It also has several providers you can use for analysis.

I like to use Prometheus for most of the analysis because it gives accurate results and lets me roll back fairly quickly to the previous version, even when the error rate is small. It also allows you to expose business data for a deeper analysis of the application (e.g., errors in credit card processing).
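A Prometheus-backed check could look roughly like the sketch below, which builds an AnalysisTemplate as a Python dictionary. The Prometheus address, the `http_requests_total` metric, its `service` and `status` labels, the 1% threshold, and the template name are assumptions made for illustration; substitute whatever your services actually export.

```python
import json

# Hypothetical PromQL: share of requests answered with a 5xx status
# over the last five minutes for a service labelled payments-api.
ERROR_RATE_QUERY = (
    'sum(rate(http_requests_total{service="payments-api",status=~"5.."}[5m])) / '
    'sum(rate(http_requests_total{service="payments-api"}[5m]))'
)

analysis_template = {
    "apiVersion": "argoproj.io/v1alpha1",
    "kind": "AnalysisTemplate",
    "metadata": {"name": "error-rate-check"},
    "spec": {
        "metrics": [
            {
                "name": "error-rate",
                "interval": "1m",  # take one measurement per minute
                "successCondition": "result[0] < 0.01",  # under 1% errors
                "failureLimit": 3,  # tolerate up to three failed measurements
                "provider": {
                    "prometheus": {
                        "address": "http://prometheus.example.com:9090",
                        "query": ERROR_RATE_QUERY,
                    }
                },
            }
        ]
    },
}

print(json.dumps(analysis_template, indent=2))
```

The same pattern works for business metrics: swap the query for one that tracks, say, failed credit card transactions.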

You should still aim to catch errors during sanity or integration tests (or even during unit tests, depending on the scenario). Argo Rollouts simply adds the progressive delivery capability, with canary steps to limit traffic to the new version and analysis to evaluate it during the deployment. This analysis is the last gate for application owners to roll back the version in case of any faults.

At Intuit we use Argo CD and integrate Argo Rollouts with it. That way, Argo CD understands the health of Argo Rollouts. These health checks determine whether the Argo Rollout objects are progressing, suspended, degraded or healthy.

It’s also possible to get information on the strategy type, steps, weight, current step, and more using the integrated dashboard, as shown below.

Does progressive delivery suit my service?

Customers can still be impacted when errors occur, but progressive delivery helps to minimize that impact.

For low-traffic services, you may want to apply different Rollout settings, because a normal analysis window won’t give you a full picture of the new application version: it takes longer to collect enough data to analyze the release. Alternatively, you can send generated traffic to the canary version. Not all scenarios can be tested in production this way, but it is the best you can get for low-traffic services. Analysis can also run after 100% promotion (e.g., 4 hours of analysis after the new version is fully released).
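One way to reason about the low-traffic case is to refuse to judge a canary until it has served enough requests. The Python sketch below shows the idea; the function name and the thresholds (500 requests, 1% errors) are arbitrary illustrations, not Argo Rollouts defaults.

```python
def evaluate_canary(
    total_requests: int,
    failed_requests: int,
    min_requests: int = 500,
    max_error_rate: float = 0.01,
) -> str:
    """Return "pass", "fail", or "inconclusive" for a canary window.

    With too few requests the error rate is not trustworthy, so the
    verdict is deferred until more data has been collected.
    """
    if total_requests < min_requests:
        return "inconclusive"
    error_rate = failed_requests / total_requests
    return "pass" if error_rate <= max_error_rate else "fail"


# A low-traffic service after a short window: not enough data yet.
low_traffic = evaluate_canary(total_requests=40, failed_requests=1)

# A high-traffic service collects enough data within the same window.
high_traffic = evaluate_canary(total_requests=10_000, failed_requests=50)
```

An inconclusive verdict is the signal to lengthen the analysis window or to send generated traffic to the canary.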

For services with high traffic (24x7), the analysis window can be shorter, since Prometheus collects enough data quickly.

Migrating from the default deployment strategy to a canary strategy using Argo Rollouts is fairly simple and requires minimal changes to the K8S manifest (only the API version and kind differ). The second part is setting up the canary steps and analysis based on existing metrics.
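The two parts of the migration can be sketched as a small Python helper that converts a Deployment manifest (loaded as a dictionary) into a Rollout. The helper name and the sample manifest are hypothetical, and a real migration may need to review other fields as well.

```python
import copy
from typing import Any, Dict, List


def deployment_to_rollout(
    deployment: Dict[str, Any], canary_steps: List[Dict[str, Any]]
) -> Dict[str, Any]:
    """Return a Rollout manifest derived from a Deployment manifest."""
    rollout = copy.deepcopy(deployment)  # leave the input untouched
    # Part one: swap the API version and kind.
    rollout["apiVersion"] = "argoproj.io/v1alpha1"
    rollout["kind"] = "Rollout"
    # Part two: replace the rolling update strategy with canary steps.
    rollout["spec"]["strategy"] = {"canary": {"steps": canary_steps}}
    return rollout


# Hypothetical Deployment used only to exercise the helper.
deployment = {
    "apiVersion": "apps/v1",
    "kind": "Deployment",
    "metadata": {"name": "payments-api"},
    "spec": {
        "replicas": 4,
        "strategy": {"type": "RollingUpdate"},
        "selector": {"matchLabels": {"app": "payments-api"}},
        "template": {"metadata": {"labels": {"app": "payments-api"}}},
    },
}

rollout = deployment_to_rollout(
    deployment, [{"setWeight": 5}, {"pause": {"duration": "10m"}}]
)
```

The pod template, selector, and replica count carry over unchanged, which is why the manifest diff stays small.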

Outcome as a service owner and customer-side impact

Addressing the pain points mentioned above:

  • Continuous delivery provides fine-grained control.
  • Argo Rollouts provides several ways to perform analysis to drive progressive delivery.
  • Automatic promotion or rollback of an update is driven by predefined analysis templates.

As a customer of Argo Rollouts, I’ve found that this tool is extremely important for API services with both high and low traffic.

I’ve already mentioned that bugs should be found earlier, before production deployment. But I’m realistic and glad that Argo created additional deployment strategies that allow me to safely monitor my applications during deployment, thereby reducing the risk of introducing a new software version in production. Rollouts leave me feeling more confident by providing me with an additional level of control based on real metrics.

Additional resources

To learn more about Argo Rollouts and progressive delivery, take a look at the links below:
