EXPEDIA GROUP TECHNOLOGY — ENGINEERING

Continuous Deployment at Expedia Group

Fast, safe and repeatable deployments with an open-source solution

Luca Pelosi

Published in Expedia Group Technology on Nov 8, 2022

The Expedia Group™ (EG) technology stack is historically very large and varied as it includes many brands.

The 15 Expedia Group™ brands (2021)

For some time, EG has been standardizing and automating cross-brand tools and processes in order to make the most of the peculiarities of each brand, increase efficiency and offer the best customer experience.

One challenge for a company with dozens of brands and different technology stacks is unifying everything under the same standard CI/CD platform that can adapt to a common cloud infrastructure.

In this article, we’ll cover the advantages of the EG approach and how we’ve implemented continuous deployment best practices thanks to the Spinnaker pipeline manager.

What is continuous deployment?

Continuous deployment extends continuous integration (CI) and continuous delivery (CD): it automates the deployment of every code change all the way to the production environment once the change has been merged. An artifact is typically built, verified and deployed in a completely automated way, and only a failed step prevents the change from reaching production.

The only difference between continuous deployment and continuous delivery is that the latter requires a manual judgment to trigger the production deployment (it is not fully automated):

Continuous delivery is different from continuous deployment because it needs a manual step to deploy in production — Difference between continuous deployment and continuous delivery (source: author image)
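To make the distinction concrete, here is a minimal Python sketch. It is not how Spinnaker implements anything: the `run_pipeline` function and the `manual_approval` parameter are hypothetical names, introduced only to show that the two practices differ by a single optional gate before the production step.

```python
from typing import Callable, List, Optional

Step = Callable[[], bool]  # a step returns True on success


def run_pipeline(steps: List[Step],
                 manual_approval: Optional[Callable[[], bool]] = None) -> str:
    """Run the steps in order; the last step is the production deployment.

    manual_approval=None models continuous deployment (no human gate).
    Passing a callable models continuous delivery (someone must approve).
    """
    *pre_prod, deploy_prod = steps
    for step in pre_prod:
        if not step():
            return "failed"  # only a failed step stops the change
    if manual_approval is not None and not manual_approval():
        return "awaiting approval"
    return "deployed" if deploy_prod() else "failed"
```

Calling `run_pipeline(steps)` with no approval callback takes a merged change straight to production, while passing a callback pauses the run until a person says yes.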

The benefits of this approach are straightforward: because developers can deploy their changes at any time, changes reach production often and in small increments. Small increments make troubleshooting easier and give your customers access to the best your product has to offer as soon as possible.

What is Spinnaker?

Spinnaker is an open-source, multi-cloud continuous delivery/deployment platform that combines a powerful and flexible pipeline management system with integrations with the major cloud providers.

In practice, you can visually define a pipeline as a node tree that starts from a root node (triggered by an event, typically a Git merge) and branches following the desired flow until it ends the execution with a positive or negative status:

A generic pipeline starts with a single node and proceeds with a tree of connected nodes. It ends with one or more final nodes. — Example of a generic pipeline structure (source: author image)
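The structure in the figure can be sketched in a few lines of Python. The stage names below are made up for illustration, and the sketch runs stages one at a time in dependency order, whereas a real pipeline manager would run independent branches in parallel.

```python
from graphlib import TopologicalSorter

# Hypothetical stages: each key maps to the set of stages it depends on.
# "build" is the root node; "deploy-prod" and "security-scan" are final nodes.
stages = {
    "build": set(),
    "unit-tests": {"build"},
    "security-scan": {"build"},
    "deploy-test": {"unit-tests"},
    "deploy-prod": {"deploy-test"},
}


def execution_order(graph):
    """Return one valid order in which the stages could run."""
    return list(TopologicalSorter(graph).static_order())


def run(graph, stage_results):
    """Walk the stages in dependency order and stop at the first failure.

    stage_results maps a stage name to False if that stage should fail;
    stages that are not listed are assumed to succeed.
    """
    for stage in execution_order(graph):
        if not stage_results.get(stage, True):
            return ("FAILED", stage)
    return ("SUCCEEDED", None)
```

The execution ends with a positive status only if every stage succeeds; a single failing stage gives the whole run a negative status.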

Each node of the graph, called a Stage, can be one of many types, such as Jenkins job, Manual judgment, Pipeline (a link to another Spinnaker pipeline), Container run, WebHook, Kubernetes Deploy/Delete, and many others (the Spinnaker documentation has a fuller list).

In addition to existing Stages, EG has defined custom stages (the advantage of an open-source solution) that make deployment simple and standard.

Composition, integration, and reuse

Because all EG applications are Dockerized, Spinnaker can build and deploy all of them in the same way. This allowed us to define a “main” Pipeline Template that sets quality and security standards for all microservices.

Each service can therefore be onboarded to Spinnaker simply by creating a new pipeline from the common “main” Template and passing in the parameters specific to that microservice:

Custom pipelines can be created starting from a common template passing template variables — Pipelines can be created starting from a common template (source: author image)
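As a rough illustration of the idea, the following Python sketch substitutes per-service variables into a shared template. The template contents, the stage types and the `${...}` placeholder syntax (Python's `string.Template`) are assumptions made for this example; they are not Spinnaker's actual template format.

```python
import copy
from string import Template

# Hypothetical "main" template: stage fields contain ${placeholders}.
MAIN_TEMPLATE = {
    "stages": [
        {"refId": "1", "name": "Build ${service}", "type": "jenkins"},
        {"refId": "2", "name": "Deploy ${service} to ${environment}", "type": "deploy"},
    ]
}


def instantiate(template, variables):
    """Return a concrete pipeline by filling variables into a copy of the template."""
    pipeline = copy.deepcopy(template)  # never mutate the shared template
    for stage in pipeline["stages"]:
        for key, value in list(stage.items()):
            if isinstance(value, str):
                # substitute() raises KeyError if a variable is missing
                stage[key] = Template(value).substitute(variables)
    return pipeline


checkout = instantiate(MAIN_TEMPLATE, {"service": "checkout", "environment": "prod"})
```

Because each pipeline is derived from the template rather than copied by hand, a change to `MAIN_TEMPLATE` shows up in every pipeline the next time it is instantiated.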

The power of reuse

The above mechanism is a huge advantage when you have dozens of services to onboard to CD. It also makes it easy to revise decisions later, since any template change is reflected instantly in every pipeline of every application.

Note that the individual Stages of a template can themselves reference other pipelines, which in turn may or may not be template-based. This makes even large and complex pipelines modular and easy to manage:

A pipeline template can be built by composition, calling sub-pipelines that may themselves be template-based or not. A possible reusable pipeline template composition (source: author image)

Each pipeline is therefore composed in a modular, flexible and standard way at the same time.

And what if my pipeline needs an additional stage beyond what a template provides? No problem. There is also a system to inject additional stages inside the graph provided by a template:

In a pipeline based on a template, you can inject additional stages to customise it — Pipeline instance that uses a Template and injects a custom Stage (source: author image)

Real example

Here is an example of a template-based pipeline, where a custom stage is injected:

{
  "appConfig": {},
  "description": "Automatic pipeline triggered if CD is enabled",
  "disabled": false,
  "exclude": [
    "triggers"
  ],
  "keepWaitingPipelines": false,
  "lastModifiedBy": "...",
  "limitConcurrent": true,
  "notifications": [],
  "parameterConfig": [],
  "schema": "v2",
  "stages": [
    {
      "completeOtherBranchesThenFail": false,
      "continuePipeline": false,
      "failPipeline": false,
      "inject": {
        "after": [
          "3"
        ],
        "first": false,
        "last": false
      },
      "judgmentInputs": [],
      "name": "Start Deployment",
      "notifications": [],
      "preconditions": [],
      "refId": "StartDeployment",
      "type": "checkPreconditions"
    }
  ],
  "template": {
    "artifactAccount": "front50ArtifactCredentials",
    "reference": "spinnaker://<template-name>:<version>",
    "type": "front50/pipelineTemplate"
  },
  "triggers": [],
  "type": "templatedPipeline",
  "variables": {
   ...
  }
}
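To show what an `inject` block with `"after": ["3"]` could mean for the resulting graph, here is a small Python sketch. The semantics are an assumption made for illustration: the injected stage depends on the listed stages, and stages that used to depend on them are rewired to depend on the injected stage. The `dependsOn` field and the `inject_after` function are hypothetical names and not part of the Spinnaker schema.

```python
def inject_after(template_stages, new_stage, after):
    """Return a new stage list with new_stage placed after the `after` stages.

    Assumed semantics: former children of the `after` stages now wait for
    the injected stage instead, so it sits in the middle of the flow.
    """
    after = set(after)
    result = []
    for stage in template_stages:
        deps = list(stage.get("dependsOn", []))
        if after & set(deps):
            # Rewire this child: drop the old parents, wait for the new stage.
            deps = [d for d in deps if d not in after] + [new_stage["refId"]]
        result.append({**stage, "dependsOn": deps})
    result.append({**new_stage, "dependsOn": sorted(after)})
    return result


# Hypothetical template graph: stage "4" originally runs right after stage "3".
template = [
    {"refId": "3", "dependsOn": []},
    {"refId": "4", "dependsOn": ["3"]},
]
injected = {"refId": "StartDeployment", "type": "checkPreconditions"}
pipeline = inject_after(template, injected, ["3"])
```

In this sketch the template's own stage list is left untouched, which mirrors the idea that a pipeline customises its own instance without changing the shared template.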

To recap

Spinnaker pipelines can be template-based, but often they are composed of a hierarchy of pipelines which can in turn be template-based or not. Finally, we can inject single stages (also custom) into template-based pipelines. This maximizes reuse and at the same time leaves room for customization without having to abandon the concepts of standardization and control.

Progressive deployment (canary)

One of the most delicate, and most interesting, parts of the entire pipeline is the production deployment.

In this case, EG has created a custom Spinnaker stage called Progressive Deployment which deploys an application on Kubernetes following a highly secure and controlled approach. It directs production traffic to a special Kubernetes deployment variant called a canary.

How it works

The basic idea of progressive deployment (PD) is to deploy the new version of your app (called a canary) and then progressively redirect production traffic to this deployment. To judge whether the new canary version is ready to be promoted, it is compared with a dedicated baseline deployment that contains the original version and receives the same amount of traffic as the canary.

With progressive deployment the same amount of traffic is split between Canary and Baseline for a consistent comparison — Progressive deployment — canary judgment (source: author image)

NOTE: The remaining amount of traffic continues to be received by the primary deployment, which always contains the original version of the software. This is where the new version will be installed in case it is promoted.
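The split described above comes down to simple arithmetic, sketched here in Python. The function name and the step schedule are hypothetical; the actual stage is configurable and its steps are not given in this article.

```python
def split_traffic(canary_percent: int) -> dict:
    """Return the traffic share (in percent) for each deployment variant.

    Canary and baseline always get the same share so that their metrics
    are comparable; the primary receives whatever is left.
    """
    if not 0 <= canary_percent <= 50:
        raise ValueError("canary share must be between 0 and 50 percent")
    return {
        "canary": canary_percent,
        "baseline": canary_percent,
        "primary": 100 - 2 * canary_percent,
    }


# Hypothetical step schedule: the canary share grows step by step.
for step in (1, 5, 10, 25):
    print(step, split_traffic(step))
```

Note that the canary share cannot exceed 50% in this model, because the baseline must always receive an equal share.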

Instant rollback

The PD stage is highly configurable: it can use different strategies to increase traffic step by step, and different Horizontal Pod Autoscaling (HPA) strategies to automatically scale the workload to match demand. Horizontal scaling means that the response to increased load is to deploy more pods (and to remove pods when load decreases).

An excellent (but expensive) strategy is to always keep enough pods on the primary deployment to support 100% of the traffic, even while its share of traffic is reduced during the canary deployment. If necessary, it is then possible to perform an instant rollback by routing 100% of the traffic back to the primary variant.
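For reference, the core scaling rule documented for the Kubernetes Horizontal Pod Autoscaler can be written in a few lines of Python. This is a simplified sketch: the real autoscaler also applies a tolerance band and minimum and maximum replica bounds, all omitted here, and the metric values in the usage note are made up.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float) -> int:
    """Core HPA rule: scale the replica count by the ratio of the
    observed metric to its target, rounding up."""
    return math.ceil(current_replicas * current_metric / target_metric)
```

For example, 4 pods running at 90% average CPU against a 60% target would be scaled out, while the same 4 pods at 30% would be scaled in.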

A PD instant rollback redirects traffic back to the primary version through the VirtualService. PD instant rollback acting only on VirtualService routing (source: author image)
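A minimal Python sketch of the two ideas above: the rollback itself only changes routing weights, and it is safe only if the primary already has enough pods for the full load. The function names, the requests-per-second figures and the per-pod capacity are assumptions made for illustration.

```python
import math


def rollback(weights: dict) -> dict:
    """Send 100% of traffic back to the primary variant.

    Only the routing weights change; no pods are created or removed,
    which is why the rollback can be instant.
    """
    return {variant: (100 if variant == "primary" else 0) for variant in weights}


def primary_can_absorb(primary_pods: int, total_rps: float, rps_per_pod: float) -> bool:
    """Check whether the primary has enough pods to serve all traffic."""
    return primary_pods >= math.ceil(total_rps / rps_per_pod)
```

If `primary_can_absorb` is false, a rollback would have to wait for new pods to start, so it would no longer be instant. That is the cost trade-off behind keeping the primary fully scaled.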

Promotion

If our review of the metrics determines that the canary variant is ready for production, the new version is deployed to the primary variant, which then takes 100% of the traffic.
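The promotion decision can be pictured as a comparison between canary and baseline metrics. The Python sketch below uses a single metric (error rate) and a hypothetical tolerance of 10%; the actual metrics and thresholds used by the PD stage are not described in this article.

```python
def judge_canary(canary_error_rate: float,
                 baseline_error_rate: float,
                 max_relative_increase: float = 0.10) -> str:
    """Compare the canary with the baseline under equal traffic.

    The canary is promoted only if its error rate is at most
    max_relative_increase (hypothetical default: 10%) above the baseline.
    """
    allowed = baseline_error_rate * (1 + max_relative_increase)
    return "promote" if canary_error_rate <= allowed else "rollback"
```

Comparing against a baseline that runs the original version under the same traffic share, rather than against the primary, keeps the comparison fair: both variants are freshly deployed and see the same load.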

Conclusions

I hope this article can give you food for thought on a possible approach to continuous deployment through an open-source tool like Spinnaker and also on how to implement secure and reliable production deployments in Kubernetes clusters.