A service mesh is a useful tool for connecting microservices. A good service mesh performs the following:
- Keep a live view of what service is where, verified with health checks
- Enforce encrypted communication between services
- Load balance connections between all healthy instances, or between a subset of instances as per a routing or splitting policy
- Allow only authorized services to communicate
Typically when we think about microservices, we think of Kubernetes as the scheduler of choice for large-scale containerized applications. For this reason, most service mesh solutions are Kubernetes-only. But what if my microservices application runs on a different deployment platform? I could have .NET or Java applications running as loosely coupled services on self-hosted VMs or cloud compute. I could have a large self-managed Docker environment, or applications scheduled with Nomad. Or I might be running in Amazon's Elastic Container Service (ECS). Enter Consul: a great solution for service mesh across Kubernetes and many other operating environments.
This blog will look at a Consul service mesh pattern for applications in ECS. The example runs on EC2 instances in an ECS-managed cluster, but could easily be modified to run Fargate workloads as well.
Before we begin
This blog is not an introduction to either Consul or ECS. To understand more about Consul:
- Take a look at the Consul website and documentation.
- Watch this Consul introduction from HashiCorp co-founder and CTO Armon Dadgar.
- Read or watch one of the many blogs and case studies covering Consul production deployments.
For more on ECS please visit the AWS documentation.
Consul service mesh on ECS: Demo environment
Follow along at home
To demonstrate this example I have created a repository to help you get started: https://github.com/dgkirkwood/consul-ecs-awsvpc
To skip the long read and get straight to your environment, visit the GitHub link and follow the README instructions. As long as you have an AWS account and can follow basic Terraform instructions, you will have an ECS + Consul service mesh environment in a few minutes.
**Disclaimer!** This repo is purely for testing and demonstration purposes and should not be used for production.
Understanding the environment
The code in the linked repository creates an AWS environment using Terraform. Follow the README file to get started. Note that I will not go into any detail on the Terraform code in this blog unless it relates directly to the ECS or Consul configuration. The image below shows some of the resources created:
A simple network consisting of a single VPC with a single public subnet supports this environment. The network.tf code also creates a single interface in this subnet per EC2 server, providing a static IP address to simplify the demo environment. Our security groups also attach to this interface.
This environment consists of three EC2 servers. You can find the Terraform definition for these servers in the main.tf file.
consul-ec2-1: An Ubuntu AMI which runs the Consul binary in server mode. This is a single server that would expand into a three- or five-node cluster for a production environment.
ecs-ec2-1 & ecs-ec2-2: An ECS-optimised AMI which is managed by the ECS service.
The Terraform code creates a hierarchy of ECS objects for the demo environment. Our application consists of two simple services: an HTTP server and an HTTP client. Each of these services has its own task, container, and service definitions that you can find in the ecs.tf file.
ECS Cluster: Top-level logical mapping of servers to tasks. The cluster name for our environment is populated by one of the required variables in the repository.
ECS Task Definition: A resource defining the connection between a container definition, the network type, and the available storage for those containers.
Note that the network type chosen for our tasks is AWS VPC. This will create an Elastic Network Interface (ENI) per task which will be assigned a dynamic IP address from your subnet. This is the same network type applied to Fargate tasks, and this code with a few modifications could apply to Fargate workloads as well.
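As a rough sketch, an awsvpc task definition in Terraform might look something like the following. The resource name, family, and file path are illustrative, not copied from the repository:

```hcl
# Hypothetical sketch of an ECS task definition using the awsvpc network mode.
# Each running task gets its own ENI and private IP from the subnet.
resource "aws_ecs_task_definition" "http_server" {
  family                = "http-server"
  network_mode          = "awsvpc"
  container_definitions = file("${path.module}/task-definitions/http-server.json")

  # Host volume used to hand generated config files to the containers
  volume {
    name      = "consul-config"
    host_path = "/etc/consul"
  }
}
```

The same `network_mode = "awsvpc"` setting is what Fargate requires, which is why this pattern carries over with few changes.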
Container Definition: This is a JSON file containing the defined resources for containers that will run inside your task definition. The container definition for one of our tasks is shown here.
Note that for both of the services in this example, we have defined three containers.
- A container for the application itself (named ‘server’ in the code sample next to this text).
- A container to run the Consul agent. This agent runs locally to the application and the Envoy proxy, performing service query resolution, health checking, and some control plane tasks for our service mesh. Note that an agent per task definition is not the only pattern for a service mesh; you could, for instance, run an agent per EC2 host. The following ports are exposed:
- 8500: Consul HTTP API
- 8502: Consul gRPC for xDS communication with the Envoy proxy
- 8301: LAN Serf traffic for Consul node communication
- A container to run the Envoy proxy. This is the service that performs all the data plane operations for our service mesh. All traffic is forwarded via the proxy, which holds a certificate allowing it to communicate with other proxies that are part of the same mesh. The proxy can also enforce service intentions (which we will cover later) and perform layer 7 tasks such as service splitting and routing. This container uses an image from one of my colleagues which includes the Consul agent, and it is this container that registers the application service with Consul. Note the environment variables pointing Envoy to the Consul agent running on localhost.
Port 21000 is exposed for the Envoy container. This is the first port in the range 21000–21255, from which ports are picked for the proxy's public listener. The port can also be statically defined as part of the Consul configuration. As we are using a new ENI for each ECS service, we can stick with 21000 alone.
Both the Consul agent and the Envoy containers are using volume mounts from the EC2 instance to provide configuration files that are generated as part of the Terraform run.
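To illustrate the shape of such a container definition, here is a trimmed, hypothetical sketch of the three containers. The ports and volume usage follow the description above; the image names and everything else are illustrative only:

```json
[
  {
    "name": "server",
    "image": "your-app-image:latest",
    "essential": true,
    "memoryReservation": 128
  },
  {
    "name": "consul-agent",
    "image": "consul:latest",
    "memoryReservation": 128,
    "portMappings": [
      { "containerPort": 8500 },
      { "containerPort": 8502 },
      { "containerPort": 8301 }
    ],
    "mountPoints": [
      { "sourceVolume": "consul-config", "containerPath": "/consul/config" }
    ]
  },
  {
    "name": "envoy-proxy",
    "image": "an-envoy-plus-consul-image:latest",
    "memoryReservation": 128,
    "portMappings": [
      { "containerPort": 21000 }
    ]
  }
]
```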
ECS Service: A map that links together the task definition, cluster, subnet, and security groups. The service level is also where the desired number of task instances is specified. Note that the code will create three instances of the server app and one instance of the client app by default. You can change these numbers, but may hit the limit of ENIs available per EC2 instance.
There are also security groups and IAM policies defined as part of the test environment: policies that allow the EC2 instances to perform actions as part of the ECS cluster, connectivity between the Consul server and agents, and finally HTTP/S and SSH access into the servers. These security controls would be refined and tightened for a production environment.
Testing the service mesh
Step 1: Check Consul
To see the service mesh in action, follow the instructions in the GitHub repository to instantiate the AWS infrastructure with Terraform. Once you have your Consul GUI address, paste it into your browser and take a look.
We can see that two services have registered that correspond to the two tasks defined in ECS. Consul also registers itself as a service.
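If you prefer the CLI, a similar view is available from a shell on the Consul server. Assuming the services registered under the names shown in the GUI, the catalog listing will include the two application services (along with their sidecar proxy entries and Consul itself):

```shell
# List all registered services via the Consul CLI
consul catalog services
```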
Clicking on the http-server service, we can see there are three instances of the service, along with the IP address and port for each. The green checks show that this service is passing two types of health check. Click on one of the instances for more detail.
Here we can see more detail on the health checks for our service. We have a service check that was defined in the configuration file used to register the application (take a look in consul-ecs-aws/configs/server.hcl). We also have a health check running on the Consul agent itself. Serf is the protocol used for communication between Consul nodes.
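For reference, a Connect-enabled service registration with an HTTP health check might look roughly like this. The actual file lives at consul-ecs-aws/configs/server.hcl; this sketch is illustrative, not copied from it:

```hcl
# Illustrative sketch of a Connect-enabled service registration with a check.
service {
  name = "http-server"
  port = 80

  check {
    id       = "http-server-check"
    http     = "http://localhost:80"
    interval = "10s"
  }

  # Registers a sidecar proxy for this service in the mesh
  connect {
    sidecar_service {}
  }
}
```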
Clicking on the Proxy Info menu item for the http-server instance shows that our Envoy proxy is up and passing health checks as well.
Step 2: Test the application
Testing our application using the service mesh involves running a simple curl command through the client container.
1. Connect to your ECS hosts using SSH. Replace xx.xx.xx.xx with the IP address given by the Terraform output for the ECS servers.
ssh -i /path/to/your/privatekey ec2-user@xx.xx.xx.xx
2. Inspect the containers running on the host (for example, with docker ps).
3. You are looking for a container running the image tutum/curl:latest; it will have a name similar to ‘ecs-serviceMeshAppClient-25-client-e28695a8def7ddc91e00’. Note there is only one client container by default, so you may need to check both ECS hosts. Once you find this container, execute a curl from it using the following command:
docker exec -it <replace-me-with-client-container-name> curl 127.0.0.1:8085
4. You should see the following response:
What in the mesh is happening?
The image above shows what happens for our client application to receive a “Hello world” response from the server. You may have noticed we dialed 127.0.0.1:8085 to reach the server. This may seem illogical from a networking point of view: why not reach the server directly on port 80? This is one of the great security benefits of the service mesh. Our server no longer exposes port 80 or port 443 to the network, because in our mesh all communication must be authorized and must run over sessions encrypted with mTLS. The only port exposed is port 21000, belonging to our Envoy proxy. That proxy will only accept connections from other proxies presenting a certificate that Consul automatically provisioned as part of the mesh bootstrapping. Do not underestimate the huge uplift this brings to your microservices architecture with relatively little effort!
Let’s take a look at the configuration on the client task to allow our application to reach out to the server.
Notice the “upstreams” definition as part of the proxy configuration. Here we are defining which services we expect the client to reach out to (http-server) and assigning an arbitrary port for the client application to send traffic to locally. Our Envoy proxy will then expose this port on localhost to accept traffic from the application.
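As a rough illustration, the upstreams block in the client's registration might look like the following. The service names and port 8085 follow the walkthrough above; the exact file contents may differ:

```hcl
# Illustrative sketch of the client registration with an upstream to the server.
service {
  name = "http-client"

  connect {
    sidecar_service {
      proxy {
        upstreams = [
          {
            destination_name = "http-server"
            local_bind_port  = 8085  # the port we curled on localhost
          }
        ]
      }
    }
  }
}
```

With this in place, the client's Envoy proxy listens on localhost:8085 and forwards traffic over mTLS to a healthy http-server proxy.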
For more on traffic patterns in a service mesh, take a look at the HashiCorp whitepaper “The Life of a Packet through Consul Service Mesh” which gives a detailed summary of packet flows in various mesh scenarios.
Step 3: Test authorization
We have improved our security posture for this application by only allowing mesh-capable services to communicate. Our application does not need any refactoring, and is blissfully unaware of the keys, certificates, and libraries that would usually be required to achieve the same result. However, we can improve this further.
In a typical microservices environment you would be connecting tens to hundreds of different services. If these are all part of the same mesh, should they all be allowed to communicate? If we are following the principles of zero trust, then no. Consul provides a mechanism to implement service-to-service authorization controls.
1. Click on the Intentions top menu item, then click the blue Create button.
2. Select http-client as the source and http-server as the destination. Leave the radio button on Deny and click Save.
3. Your intention is now shown and you are ready to test. Run the same curl command from Step 2, part 3 (making sure you are in the ECS SSH session):
docker exec -it <replace-me-with-client-container-name> curl 127.0.0.1:8085
You will no longer see the “hello world” message; instead you should see:
curl: (52) Empty reply from server
Consul has enforced our intention via the Envoy proxy in the data plane. Note that intentions are always enforced on the destination proxy. You can play with creating and removing intentions to get a feel for service authorization in Consul.
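Intentions can also be managed from the Consul CLI, which is handy for scripting. For example, assuming the CLI is on your path and pointed at the cluster:

```shell
# Deny http-client -> http-server (equivalent to the GUI steps above)
consul intention create -deny http-client http-server

# Ask Consul whether a connection would be allowed
consul intention check http-client http-server

# Remove the deny rule again
consul intention delete http-client http-server
```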
Here I have presented a pattern that could be used to insert a service mesh into your ECS or Fargate environment on AWS. By doing so you get the benefits of a greatly increased security posture with no change to the application code. Our Consul service mesh will track healthy applications and spread requests between them, encrypt all connections, and ensure only authorized applications can call each other over the network.
The great thing about this mesh is that it can be extended into other environments, maintaining this security posture as your deployment platforms change. Services that cannot be paired with a proxy, such as a managed database, can be brought into the mesh with terminating gateways. The same mesh can be extended into a new Kubernetes environment via a mesh gateway, and traffic from outside the mesh can be securely introduced via ingress gateways. As with all HashiCorp products, getting started with Consul is easy. Download it yourself and start learning!
Thanks for reading and see you next time!