HashiCorp Consul: Multi-Cloud and Multi-Platform Service Mesh

Andrew Klaas
HashiCorp Solutions Engineering Blog
9 min read · Jan 3, 2020

Connect and secure services across heterogeneous environments

Introduction

Recently, service mesh technologies have become a popular topic in the infrastructure and cloud space. There are several available, each with their own strengths and weaknesses.

Consul, together with Consul Connect, is HashiCorp’s service mesh solution for networking. In today’s post, I will be using Envoy as the sidecar proxy that Connect configures.

What does the Consul Service Mesh do?

Multi-platform/Multi-cloud Service Discovery

  • Perhaps the most critical part of a service mesh is determining where in the world services are! Consul can be easily deployed on any cloud and on several platforms (VMs, containers, bare metal, etc.). This flexibility removes the need for service discovery tool sprawl (see the DNS example below). Here is a fantastic video that dives deeper into the multi-platform service discovery problem.
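
As a quick illustration, anything registered with Consul is discoverable through its DNS interface (port 8600 by default); the service name here is illustrative:

$ dig @127.0.0.1 -p 8600 web.service.consul SRV

# Add the datacenter to query a service in another Consul datacenter
$ dig @127.0.0.1 -p 8600 web.service.dc2.consul SRV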

Security

  • Consul Connect helps you adopt a zero-trust networking model by enforcing mutually authenticated TLS connections between services, authorized by identity-based policies. These secure connections can span multiple datacenters and platforms to accommodate today’s complex infrastructures.
  • Consul Intentions define access control for services via Connect and are used to control which services may establish connections. Intentions can be managed via the API, CLI, or UI (see the example after this list).
  • Intentions are enforced by the proxy, or by a natively integrated application, on inbound connections.
  • Consul ACLs secure CLI, GUI, and REST API access and communication for users and services.
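
For example, intentions can be managed and checked from the CLI; “web” and “db” are illustrative service names:

# Deny connections from the web service to the db service
$ consul intention create -deny web db
Created: web => db (deny)

# Ask Consul whether a connection would be authorized
$ consul intention check web db
Denied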

Telemetry and Observability

  • Consul and Envoy provide access to a wide range of metrics and logs for traffic monitoring and tracing.

Consul Load-balancing and Network Middleware Automation

  • Consul automatically load-balances network connections across healthy instances of services.
  • Consul also enables a publisher-subscriber model: services register themselves and their network locations upon deploy, and middleware (F5, NGINX, Apache, etc.) can subscribe to these service changes and trigger dynamic reconfiguration, as sketched below.
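
A minimal consul-template sketch of this pattern, assuming an NGINX load balancer in front of an illustrative “web” service:

# web.ctmpl: render a server entry for every healthy "web" instance
upstream web_backend {
{{ range service "web" }}
  server {{ .Address }}:{{ .Port }};
{{ end }}
}

# Re-render the config and reload NGINX whenever instances change
$ consul-template -template "web.ctmpl:/etc/nginx/conf.d/web.conf:nginx -s reload"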

Health Checking and High Availability

  • Consul agents improve visibility and resilience by performing customizable health checks on their local services. Unhealthy instances are removed from the service discovery layer, giving you an up-to-date health status of every service in your infrastructure (see the example below).
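
For example, a service definition can include an HTTP health check that the local agent runs on an interval (the endpoint and port are illustrative):

# web.json
{
  "service": {
    "name": "web",
    "port": 8080,
    "check": {
      "http": "http://localhost:8080/health",
      "interval": "10s",
      "timeout": "2s"
    }
  }
}

$ consul services register web.json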

Traffic Management

  • Consul can split layer-7 traffic using Envoy proxies configured with Consul Connect (see the sketch after this list).
  • Upgrade patterns such as zero-downtime, blue-green, and canary deployments can be performed via Consul as well.
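
A sketch of a layer-7 split using a service-splitter configuration entry (Consul 1.6+); it assumes a service-resolver already defines the “v1” and “v2” subsets:

# web-splitter.hcl: send 10% of traffic to the v2 subset
Kind = "service-splitter"
Name = "web"
Splits = [
  {
    Weight        = 90
    ServiceSubset = "v1"
  },
  {
    Weight        = 10
    ServiceSubset = "v2"
  },
]

$ consul config write web-splitter.hcl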

Key/Value store

  • You can also store configuration in Consul’s key/value store. This data can be consumed at runtime to dynamically update services via consul-template, watches, or edge triggers, as shown below.
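
For example, with the consul CLI (key and value are illustrative):

$ consul kv put config/app/max_connections 100
Success! Data written to: config/app/max_connections

$ consul kv get config/app/max_connections
100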

Why Service Mesh?

One of the primary bottlenecks in deploying applications is often network changes such as DNS or firewall updates. The entire DevOps/application delivery pipeline is then hamstrung by these changes. For example, in many traditional enterprise shops, the ITIL change process (CAB reviews, form-filling, mistyped IP addresses, an approver on vacation, and so on) can stretch a network change into weeks or even months.

A multi-cloud and multi-platform service mesh helps abstract away many of these problems by creating a consistent layer enterprise-wide. Developers no longer need to worry about east-west load balancing, updating DNS, service failover, or, most importantly, encryption and authorization. The Consul service mesh provides these features automatically through the Connect feature and Envoy proxies, removing the need for manual processes.

Why Consul?

As a field engineer at HashiCorp, I’ve worked with a large number of Global 2000 companies and have seen a plethora of technologies (VMs, containers, mainframes, serverless, etc.), platforms (PCF, K8s, OpenStack, etc.), and cloud providers. Multi-cloud and multi-platform architectures are here to stay as large companies undergo events like mergers and acquisitions. The point being: we must consider our entire organization and its technologies to pick a proper service mesh solution.

A few questions that may lead you towards a Consul solution:

  • How do we improve application uptime and resiliency?
  • We’re adding Kubernetes: how do we make applications aware of non-containerized or non-Kubernetes services?
  • Are our stateful databases outside of Kubernetes?
  • How do we automate our north-south load balancers and firewalls to keep up with short-lived or constantly changing services?
  • How do we transparently secure traffic between our VMs, Kubernetes containers, databases, and cloud services?

Multi-Platform Demo

We will use code from the following GitHub repository.

https://github.com/Andrew-Klaas/hashi-k8s-multi-platform-connect-demo

Follow along here or in the README to deploy our hybrid K8s/VM environment.

In this demo, we will use Terraform to deploy Consul and Vault via the official Helm charts. In tandem, we will also deploy a virtual machine on Google Cloud Platform with MariaDB installed. The next step will be to WAN-join these separate “datacenters” so services inside and outside Kubernetes can find one another via service discovery.
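
Conceptually, the WAN join boils down to the following sketch (the address placeholder is hypothetical):

# From a Consul server in dc1, join a dc2 server over the WAN gossip pool
$ consul join -wan <dc2-server-address>

# Verify that servers from both datacenters are visible
$ consul members -wan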

Once the infrastructure is set up, we will deploy an example Python application that encrypts sensitive customer information using Vault. The encrypted information is then stored in the MariaDB database outside of Kubernetes.

This whole process will leverage Connect to enforce mTLS between services and to authenticate service-to-service connectivity (i.e., is there a policy that allows the web-facing application to talk to the database?).

So, let’s get started…

Note the software requirements needed on your local laptop. You can also use the included Vagrantfile.

git, curl, jq, kubectl (v1.11 or greater), helm (v2.14.3 or greater), consul, vault

Work through the setup:

https://github.com/Andrew-Klaas/hashi-k8s-multi-platform-connect-demo/blob/master/README.md#setup

Once finished with the entire README, proceed to the next section to understand how it all works together.

Technical Walkthrough

Infrastructure Diagram

The following diagram outlines the architectural parts of the demo. We will walk through the actual request flow further below.

Consul Helm

We have deployed Consul via the official Consul Helm chart.

Note: Do not run this example in production. Please consult the official Consul Helm documentation for production best practices.

The important information for customizing the Consul Helm Chart is in the “values.yaml” file.

Most notably, Consul Mesh Gateways (configuration) allow us to leverage Connect across multiple Consul datacenters. These gateways can operate in different clouds (Azure, AWS, GCP, on-premise, etc.) and platforms (VMs, K8s, etc.). The best part: all data is encrypted over mTLS automatically. Developers do not need to worry about or manage encryption themselves and can leave that to the service mesh.

# Mesh Gateways enable Consul Connect to work across Consul datacenters.
meshGateway:
  # If mesh gateways are enabled, a Deployment will be created that runs
  # gateways and Consul Connect will be configured to use gateways.
  # See https://www.consul.io/docs/connect/mesh_gateway.html
  # Requirements: consul >= 1.6.0 and consul-k8s >= 0.9.0 if using
  # global.bootstrapACLs.
  enabled: true

Connect must also be enabled.
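
In this demo Terraform drives the chart values, but a hand-run Helm v2 install enabling Connect injection alongside mesh gateways would look roughly like this sketch:

$ helm install --name consul ./consul-helm \
    --set connectInject.enabled=true \
    --set meshGateway.enabled=true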

The virtual machine in datacenter 2 runs both the Consul agent (in server mode) and the mesh gateway on the same node for simplicity. This is not typical in production, where they are deployed separately.

Vault Helm

Similarly, we will use the official Vault Helm chart to deploy Vault. Documentation is here.

Note: Do not run this example in production. Please consult the official Vault Helm documentation for production best practices.

Some points of interest are the values.yaml file and the Vault setup (mostly configuring the Vault transit and database secrets engines for our Python application, as well as creating a Vault user).

HashiCorp has also recently added a method for auto-injecting secrets into applications via a mutating admission webhook. This is a great method for integrating legacy applications with Kubernetes and Vault. See this blog post for more info.

The Python Application:

Our demo application is a simple web app that encrypts customer records and stores them in an SQL database. It is not the focus of the demo, but it is interesting nonetheless as it leverages Vault’s Transit Secrets Engine, which functions as Vault’s encryption-as-a-service. This provides a key management service applications can use to encrypt and decrypt their data, so if an attacker were to view the database’s contents using a privileged credential, they would see only cipher-text. This can be seen in the “Records” and “Database View” tabs of the application after encrypting a new record.

Example python application code for interacting with Vault:

# Encrypt a value with Vault's Transit engine via the hvac client
def encrypt(self, value):
    try:
        response = self.vault_client.secrets.transit.encrypt_data(
            mount_point = self.mount_point,
            name = self.key_name,
            # Transit expects base64-encoded plaintext
            plaintext = base64.b64encode(value.encode()).decode('ascii')
        )
        logger.debug('Response: {}'.format(response))
        return response['data']['ciphertext']
    except Exception as e:
        logger.error('There was an error encrypting the data: {}'.format(e))

Another security feature showcased is the MariaDB database secrets engine. This feature enables Vault to create short-lived, unique credentials for applications, so each instance of an application has its own MariaDB username/password that is valid for, say, 48 hours or one week. You can customize this TTL (time-to-live) to your liking. These credentials can be renewed, revoked, rotated, etc., presenting a moving target to attackers. This workflow also greatly improves credential auditing, as we no longer need to share a credential between instances of an application. We wouldn’t share AD usernames/passwords between humans, so why would we do it for machines and services?
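
With the Vault CLI, each read of the demo’s credentials path (visible in the pod logs later in this post) mints a fresh credential:

# Returns a unique, short-lived MariaDB username/password with its own lease
$ vault read lob_a/workshop/database/creds/workshop-app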

The full application code and K8s deployment files are available in the demo repository. For background on this pattern, see HashiCorp’s “Why We Need Dynamic Secrets” blog post.

Connect Walkthrough

Let’s walk through how Connect works in more detail.

Step 1 (Refer to above diagram):

As our Python application starts up, it first pulls unique, short-lived MariaDB credentials from Vault and then connects to the database. However, it does this by connecting to its local Connect Envoy proxy instead of directly to the destination. Note: all traffic through the proxy is encrypted.

(The application’s database and Vault connection settings are supplied via a K8s ConfigMap in the demo repository.)

Pod Logs:

$ kubectl get pods
$ kubectl logs k8s-transit-app-7f54c77669-2rlk4 k8s-transit-app
2019-12-17 21:30:50,901 - DEBUG - urllib3.connectionpool - _make_request - http://127.0.0.1:8200 "GET /v1/lob_a/workshop/database/creds/workshop-app HTTP/1.1" 200 309
2019-12-17 21:30:50,902 - INFO - db_client - vault_db_auth - Retrieved username v-kubernetes-workshop-a-zqMs9WzS and password A1a-hfH61skqIEcTV3EC from Vault.
2019-12-17 21:30:50,902 - DEBUG - db_client - connect_db - Connecting to 127.0.0.1 with username v-kubernetes-workshop-a-zqMs9WzS and password A1a-hfH61skqIEcTV3EC

The Consul Helm chart enables sidecar injection into the deployment via K8s annotations. Specifying multiple upstreams creates multiple listeners in Envoy.

Deployment spec:

. . .
template:
  metadata:
    name: k8s-transit-app
    labels:
      app: k8s-transit-app
    annotations:
      "consul.hashicorp.com/connect-inject": "true"
      "consul.hashicorp.com/connect-service-upstreams": "vault:8200,mariadb:3306:dc2"
. . .

The result makes it easy to integrate applications with Connect: all that was needed was to point them at their local proxy.
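
In other words, the application only ever talks to loopback addresses. A hypothetical sketch of its connection settings (VAULT_ADDR is Vault’s standard environment variable; the DB_* names are made up for illustration):

# Vault upstream: local Envoy listener created by "vault:8200"
export VAULT_ADDR=http://127.0.0.1:8200

# MariaDB upstream: local Envoy listener created by "mariadb:3306:dc2"
export DB_HOST=127.0.0.1
export DB_PORT=3306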

Step 2:

In this step, Consul simply resolves the service destination. Since we WAN-joined our Consul clusters earlier, Consul knows to resolve the database location to the local mesh gateway for forwarding to another Consul “datacenter” (or separate platform). The deployment specified that connections to localhost port 3306 on the application pod resolve to MariaDB in datacenter 2.

“mariadb:3306:dc2”

Mesh gateways are described in detail here.

Step 3:

The Envoy proxy forwards the request to its local mesh gateway. This is configurable: we could have forwarded directly to the remote gateway or to the destination service as well.

Step 4:

Next, the request is proxied between the source gateway and destination gateway over mutual TLS. The gateways operate by sniffing the SNI header of each connection and routing to the correct destination based on the requested service name and datacenter. They do not decrypt, or see decrypted, data at any point.

Step 5:

The destination mesh gateway maps the received SNI header against its dynamically updated list of configured (local datacenter) services. It then determines a healthy service instance and the associated Envoy sidecar proxy IP/port to route to.

Step 6:

Now, the MariaDB proxy will check with Consul to determine if there is an intention (policy) that authorizes the “k8s-transit-app” to communicate with the destination MariaDB service.

Consul intentions

Example: (the Python app can connect to Vault to retrieve DB credentials)
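
A sketch of creating the demo’s intentions from the CLI, using the service names registered in this walkthrough:

# Allow the web app to reach Vault and the remote MariaDB service
$ consul intention create k8s-transit-app vault
$ consul intention create k8s-transit-app mariadb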

IMPORTANT: We can further enhance security by restricting database connections in the MariaDB configuration to localhost only. All remote connections are then rejected unless they come through the Connect Envoy proxy, which checks certificates and intentions (service-to-service authorization).
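
A hypothetical excerpt of that MariaDB hardening; binding to loopback means every remote client must traverse the co-located Connect proxy:

# /etc/mysql/my.cnf (illustrative)
[mysqld]
bind-address = 127.0.0.1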

Step 7:

We’ve come full circle as certificates and intentions have passed inspection. Now, a fully authorized end-to-end TLS connection is established.

Next Steps:

Play around with the application and try encrypting some records. After adding a record, see the “Database View” tab, which shows the actual encrypted contents of the database. If an attacker manages to steal database credentials, they must still gain access to Vault and the transit secrets engine to decrypt the cipher-text!

Summary:

As multi-cloud architectures mature, engineers will need to find service mesh solutions that are flexible enough for their organization’s new and legacy technologies. Consider Consul as a solution for:

  • Multi-cloud and platform service discovery
  • Automatic load balancing
  • Automation of middleware
  • Telemetry and observability into the network
  • Automatic encryption of all traffic
  • Securing authorized connections across the organization-wide service mesh

Further Reading

Check out the HashiCorp Blog or hashicorp.com for more info on Consul, Vault, Terraform, Nomad, and more.

Also check out the HashiCorp Solutions Engineering Blog for more HashiCorp technical content.
