Continuous Delivery of HashiCorp Vault on Google Kubernetes Engine: Google Architecture

Brett Curtis
Google Cloud - Community
3 min read · Sep 27, 2018

This is Part 2 of a series: Index

Overview:

This part gives a high-level view of the architecture and some details on the Google resources used, along with the SLOs and failure points in the architecture.

Google Kubernetes Engine:

Google Kubernetes Engine provides a managed environment for deploying, managing, and scaling your containerized applications using Google infrastructure.

By creating a regional cluster, you get:

  • Resilience from single zone failure — Because your masters and application nodes are available across a region rather than a single zone, your Kubernetes cluster is still fully functional if an entire zone goes down.
  • No downtime during master upgrades — Kubernetes Engine minimizes downtime during all Kubernetes master upgrades, but with a single master, some downtime is inevitable. By using regional clusters, the control plane remains online and available, even during upgrades.

Kubernetes SLA

  • Control Plane: >=99.95% (Regional)
  • Kubernetes API: >=99.5%

Nodes are covered by the SLA of the underlying Compute Engine (GCE) VM instances, which has a target SLO of 99.99% when instances are deployed across two or more zones in the same region.

Google Cloud Storage:

Cloud Storage allows world-wide storage and retrieval of any amount of data at any time. You can use Cloud Storage for a range of scenarios including serving website content, storing data for archival and disaster recovery, or distributing large data objects to users via direct download.

Cloud Storage SLA

  • Multi-Regional: >= 99.95%

Google Cloud KMS:

Cloud KMS allows you to keep encryption keys in one central cloud service, for direct use by other cloud resources and applications.

Cloud KMS SLA

  • KMS: >=99.5%

Google Architecture Diagram:

Google SLO:

Note: This is an estimate, and the overall SLO is actually higher; see the KMS info below.
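One common way to arrive at an estimate like this is to multiply the availability of every service the request path depends on. The sketch below does that with the SLA figures quoted earlier in this post. Which services belong in the product is my assumption here (GKE regional control plane, Multi-Regional Cloud Storage, Cloud KMS), and the function and variable names are purely illustrative.

```python
from math import prod


def serial_availability(availabilities):
    """Availability of a chain in which every component must be up."""
    return prod(availabilities)


def monthly_downtime_minutes(availability, days=30):
    """Minutes of allowed downtime in a month of `days` days."""
    return (1.0 - availability) * days * 24 * 60


# SLA figures quoted in this post, written as fractions.
# Treating all three as serial dependencies is an assumption.
slas = {
    "gke_regional_control_plane": 0.9995,
    "cloud_storage_multi_regional": 0.9995,
    "cloud_kms": 0.995,
}

# Worst case: every service, including KMS, is needed all the time.
with_kms = serial_availability(slas.values())

# KMS is only needed at startup, so steady-state availability is higher.
without_kms = serial_availability(
    value for name, value in slas.items() if name != "cloud_kms"
)

print(f"with KMS:    {with_kms:.4%}")
print(f"without KMS: {without_kms:.4%}")
```

Because KMS is only a startup dependency, the real figure should sit somewhere between the two numbers, closer to the one without KMS.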

Failure Points:

This does not include any underlying infrastructure that is already accounted for in the product SLOs.

Google Cloud KMS

This is a startup run-time dependency only. Vault needs to initialize and unseal in order to create master keys and decrypt data.

  • Initialization is the process by which Vault’s storage back-end is prepared to receive data. Since Vault servers share the same storage back-end in HA mode, you only need to initialize one Vault to initialize the storage back-end.
  • Unsealing is the process of constructing the master key necessary to read the decryption key to decrypt the data, allowing access to the Vault.

If Google Cloud KMS is down at the same time that all of our Vault servers go down, Vault will not be able to initialize or unseal, and applications will not be able to retrieve any data from Vault during startup or auto-scaling of instances.

Google Cloud Storage

Multi-Regional Cloud Storage is used for the Vault storage back-end.

If Google Cloud Storage is down, Vault will be unavailable.

Google Kubernetes Engine

A single regional Kubernetes cluster will be used to start, with the ability to scale to multiple regions later if needed.

Vault can only run across multiple regions with the Enterprise version.

The node pool has a minimum size of one node per zone and a maximum of 3. The pod replica count is 3.

  • Loss of nodes and pods will result in little to no service disruption as long as at least one is up.
  • Loss of a zone will result in little to no service disruption as long as at least one zone is up.

During a regional failure Vault will be unavailable.
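
To make the failure points above easier to reason about, here is a deliberately simplified model of them as a single function. It is only a sketch of the rules described in this section; the function name and its parameters are hypothetical, and it ignores partial failures and recovery time.

```python
def vault_available(storage_up, zones_up, kms_up, unsealed_pods):
    """Rough model of the failure points described in this section.

    storage_up    -- Cloud Storage back-end is reachable
    zones_up      -- number of zones in the region that are still up
    kms_up        -- Cloud KMS is reachable
    unsealed_pods -- Vault pods that are already running and unsealed
    """
    # Cloud Storage is the back-end, so without it Vault cannot serve data.
    # Zero zones up is a regional failure.
    if not storage_up or zones_up == 0:
        return False

    # Already-unsealed servers keep working without KMS.
    if unsealed_pods > 0:
        return True

    # Every server has to start and unseal again, which needs KMS.
    return kms_up


# KMS outage alone does not take Vault down while a pod is still unsealed.
print(vault_available(storage_up=True, zones_up=2, kms_up=False, unsealed_pods=1))
```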

Notes:

Applications that depend on Vault may want to fail the startup of a service if it cannot connect to Vault.
Here is a Spring example:
Config Client Fail Fast
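
For services that are not built on Spring, the same fail-fast idea can be sketched in a few lines of Python. This is a minimal sketch, not a definitive implementation: it assumes Vault's `/v1/sys/health` endpoint is reachable from the service, it treats 200 (active) and 429 (unsealed standby) as ready, and the `probe_vault` and `require_vault` names are made up for illustration.

```python
import urllib.error
import urllib.request

# Health endpoint statuses treated as "ready" in this sketch:
# 200 = initialized, unsealed and active; 429 = unsealed standby.
HEALTHY_STATUSES = {200, 429}


def probe_vault(addr, timeout=2.0):
    """Return the HTTP status of Vault's health endpoint, or None if unreachable."""
    url = addr.rstrip("/") + "/v1/sys/health"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status
    except urllib.error.HTTPError as err:
        # Non-2xx answers (for example 503 when sealed) still carry a status.
        return err.code
    except (urllib.error.URLError, OSError):
        # DNS failure, connection refused, timeout and similar.
        return None


def require_vault(addr, probe=probe_vault):
    """Raise RuntimeError unless Vault reports a ready status."""
    status = probe(addr)
    if status not in HEALTHY_STATUSES:
        raise RuntimeError(f"Vault at {addr} is not ready (status={status})")
    return status
```

Calling `require_vault(...)` at the very start of the service entry point, and letting the exception end the process, makes Kubernetes restart the pod instead of running it without its secrets. The `probe` parameter is only there so the check can be exercised without a live Vault.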

Part 3 ->
