Managing Your EKS Traffic With App Mesh

Photo by Ricardo Gomez Angel on Unsplash

AWS App Mesh is a managed service mesh from AWS. While announced at re:Invent 2018, it only became generally available at the end of March 2019. In this post, I aim to give an overview of the service and how it works with EKS. I’ll also highlight some differences with Istio and give a step-by-step walkthrough to make it work with an application.

What is App Mesh?

As mentioned, AWS App Mesh is a service mesh. In turn, a service mesh is a way to control and monitor traffic between your microservices. Depending on the exact service mesh you use it may have more capabilities, but these two items are the core of a service mesh and all that App Mesh allows you to do. App Mesh enables you to do this for microservices running on multiple AWS services, which includes Fargate, EKS, and even EC2 instances. It also integrates with CloudWatch and X-Ray. For this post, we’re limiting ourselves to the EKS/Kubernetes integration, but the other services work similarly.

All of the above is nice, but it doesn’t tell us what it actually does. When running App Mesh with Kubernetes, the first thing you need to be aware of is that you need a lightweight proxy running as a sidecar container in each of your pods.

The above image, based on Istio’s bookinfo example, shows that if you initially only have a single container in your pod you will add the Envoy proxy. From that point on, the proxy will handle all of the traffic to and from the containers in your pod. The proxies then talk to App Mesh (or rather, they look at the configuration that is pushed to them by App Mesh) to discover where they need to send traffic.

So far this might not seem very useful; it looks like it only adds an extra layer of complexity. However, it gives us more fine-grained control over the traffic. For example, when a new version of our details microservice is released, we can start by directing only 10% of the traffic there, while the remaining 90% keeps going to the original version.

As the Envoy proxies manage all the traffic, we can use them for other things as well, such as collecting traffic logs for CloudWatch or enabling tracing with X-Ray.

How do we do this?

We now know what App Mesh does, but we still need to know how to make it work with the applications in our cluster. In App Mesh several concepts are essential: Virtual Services, Virtual Nodes, Virtual Routers, and routes. I’ll discuss these in turn.

A Virtual Service is a representation in App Mesh of an actual Kubernetes service. When your application makes a call to a different microservice, this is what it will call out to. So you will give these the same name as a service you have running in your cluster. Continuing with the above example, the details Virtual Service maps to the details service.

A Virtual Node represents the combination of a deployment and a service. This may sound a bit strange as we already have the service mapped to a Virtual Service, but because a Virtual Node requires a DNS endpoint just mapping it to a deployment is not enough. Keep in mind that this means you need to have a separate service for each of your deployments.

The name of the Virtual Node also needs to match an environment variable in the Envoy proxy so that it can identify itself. There are several other things that you manage in the Virtual Node, such as which services it can reach, but we’ll discuss those with a practical example.

The Virtual Router and Routes in comparison don’t map to anything inside your Kubernetes environment. Instead, they create the connections between the Virtual Services and Virtual Nodes. You attach one or more Virtual Routers to your Virtual Service and then configure the routes to point at one or more Virtual Nodes. Route selection is currently limited to path prefix only, so you can set up routes for all traffic coming on / or /details, but you can't filter it in a different way yet.

Each of these has more configuration options than I just mentioned. For example, instead of attaching a router to a Virtual Service you can attach a Virtual Node directly. And to allow a Virtual Node to call a Virtual Service, you will need to explicitly add the Virtual Service as a backend for that Virtual Node.
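To make the relationships between these concepts a bit more concrete, here is a toy Python model of them. It is only a sketch for illustration: the class and function names are my own and not part of any App Mesh API, the real routing decisions happen inside Envoy, and picking the first route whose prefix matches is a simplification on my part.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class Route:
    prefix: str                # path prefix this route matches
    targets: Dict[str, int]    # virtual node name -> relative weight


@dataclass
class VirtualRouter:
    routes: List[Route] = field(default_factory=list)


@dataclass
class VirtualService:
    name: str
    router: VirtualRouter


@dataclass
class VirtualNode:
    name: str
    # Names of the virtual services this node is allowed to call.
    backends: Set[str] = field(default_factory=set)


def candidate_targets(caller: VirtualNode, service: VirtualService, path: str) -> Dict[str, int]:
    """Return the weighted virtual nodes a request from `caller` could reach."""
    # Access is denied unless the service is listed as a backend of the caller.
    if service.name not in caller.backends:
        raise PermissionError(f"{caller.name} has no backend {service.name}")
    # Simplification: the first route whose prefix matches wins.
    for route in service.router.routes:
        if path.startswith(route.prefix):
            return route.targets
    raise LookupError(f"no route matches {path}")


# Hypothetical setup mirroring the details service from the example above.
details = VirtualService(
    "details",
    VirtualRouter([Route("/", {"details-v1-appmesh-bookinfo": 1})]),
)
productpage = VirtualNode("productpage-v1-appmesh-bookinfo", backends={"details"})

print(candidate_targets(productpage, details, "/details/0"))
```

A Virtual Node without details in its backends would raise the PermissionError here, which mirrors the deny-by-default behaviour described above.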

App Mesh vs Istio

When it comes to service meshes in Kubernetes, Istio is the best-known solution and therefore the most useful comparison for App Mesh. It’s also one that you might be familiar with already, in which case seeing the differences will help in determining the value of App Mesh. Before going into specific differences, however, let’s take note that the two solutions have different purposes and implementations.

Istio is a more heavyweight solution that is not limited to traffic management; it has a big focus on other aspects like security and observability, and one of its core principles is portability. App Mesh, by contrast, is designed purely for traffic management of services running in AWS. It is also designed to let these different platforms (Kubernetes on AWS, ECS, and EC2) easily run together in a single mesh.

In addition, Istio runs entirely on your cluster with its control plane using resources in your setup, while as a hosted solution the only parts of App Mesh that use resources are the Envoy sidecars. All in all, that means they are two very different approaches to a similar problem and you probably already have a pretty good idea which one is more suitable for your environment.

But let’s look at some specific differences in the traffic management functionality. To start with, in the previous section I highlighted that App Mesh only supports path-based routing rules. Istio, on the other hand, supports more options, including routing based on HTTP headers, similar to what Application Load Balancers in AWS are now capable of. The roadmap for App Mesh shows that these capabilities are planned, but they didn’t make it into this first version.

The way that access is granted to internal services is different as well. By default, Istio allows access to the services from any pod in the mesh, but you can enable RBAC and ACL controls that allow extensive authorization settings including differentiation between methods (GET vs POST for example). As mentioned before, App Mesh disallows access by default so you always need to explicitly grant a Virtual Node access to a Virtual Service, but the controls are limited to allowing or disallowing access to the entire service regardless of method.

Another implementation difference is in the way you can set up your routing. With App Mesh you need to create a separate service for different versions, whereas with Istio you can set up a DestinationRule that allows you to define subsets you can reference from your routes. From a building perspective, the two approaches are fairly similar except for where you define the filters.

While both App Mesh and Istio have support for active health checks that ensure unhealthy members of a service are taken out, Istio goes a bit further with more advanced support for various failure recovery features.

The other big difference, however, is in the logging. With Istio you can get the full logging and monitoring experience out of the box, including dashboards. App Mesh allows you to configure the Envoy proxy’s logging location, but afterwards you still need to run an agent that will send these logs somewhere. However, App Mesh also allows easy integration with X-Ray, AWS’ tracing service that lets you follow the path of a request across many services.

Lastly, I want to point out one of the differences that is related to security. In Istio you can configure the mesh to use mutual TLS, which allows you to ensure all internal service requests are encrypted. App Mesh doesn’t support this yet. The roadmap contains ideas about how this will be implemented, but it’s still in early stages.

App Mesh in Action

Differences aside, what we really want to know is how well App Mesh works. Let’s try to run it with bookinfo, the standard Istio example/demo. If you’re not familiar with it, bookinfo is a small set of microservices where a single product page calls a couple of backend services to display information. It also makes it easy to test routing changes, because the different versions of the reviews service display the reviews differently. The diagram below shows the architecture of the original bookinfo.

However, as mentioned earlier we need to make a small change to ensure we can use it the same way with App Mesh. The only real change we need is ensuring that every deployment has its own service, which is shown in this updated diagram.

Setting up the groundwork

Let’s start with ensuring we have everything we need, and feel free to follow along. The first thing we need is an EKS cluster; for this I will use eksctl which allows us to spin up a new cluster with a single command very easily:

$ eksctl create cluster -f demo-cluster.yml

Ok, I admit, that was a bit of cheating because I prefer to use configuration files instead of long CLI commands, so let’s look at what demo-cluster.yml actually contains.

apiVersion: eksctl.io/v1alpha4
kind: ClusterConfig

metadata:
  name: demo-cluster
  region: us-east-1
  version: "1.12"

nodeGroups:
  - name: ng-1
    instanceType: t3.medium
    desiredCapacity: 2
    minSize: 2
    maxSize: 5
    iam:
      withAddonPolicies:
        appMesh: true

This should all be reasonably clear, but I want you to focus on the last line: appMesh: true. This will automatically grant the node group access to App Mesh, a fairly obvious requirement.

Now that we’ve got our shiny new cluster, it’s time to make life a bit easier. Adding all of those Envoy containers to my pods is a lot of work, and would require many changes to the bookinfo example, which we don't want. Luckily, while it isn't mentioned in the App Mesh documentation, there is an early stage sidecar injector that adds these automatically for you. Let's run the installation for this as shown in the README.

Please remember to check the code before you run a bash script directly from GitHub.
$ export MESH_NAME=bookinfo-mesh
$ curl https://raw.githubusercontent.com/aws/aws-app-mesh-inject/master/hack/install.sh | bash

Please note that this means that from now on, the injector will always try to link to the bookinfo-mesh App Mesh. If you want to use a different one, you will need to override that in your Deployment spec.

The last thing we’ll do is to create a namespace to run the demo in and enable the injector for that.

$ kubectl apply -f bookinfo-appmesh-ns-only.yml
---
apiVersion: v1
kind: Namespace
metadata:
  labels:
    appmesh.k8s.aws/sidecarInjectorWebhook: enabled
  name: appmesh-bookinfo

And we’re good to go!

Building the Application

Because we use the injector to do all the hard work for us, there aren’t many changes to the bookinfo application itself. As we're running the application in a namespace, I've included that in the metadata, and we need to add those additional services for the reviews, but that's it.

apiVersion: v1
kind: Service
metadata:
  name: reviews-v3
  namespace: appmesh-bookinfo
  labels:
    app: reviews
    service: reviews
spec:
  ports:
    - port: 9080
      name: http
  selector:
    app: reviews
    version: v3

All in all, a relatively minor change so let’s focus instead on the new things. If you want to see all of the changes, this template and everything else shown here is available in the mantel-digio/bookinfo-appmesh GitHub repo.

Building the Mesh

As usual with AWS, there are many ways to build an App Mesh: through the Console, using the CLI, with CloudFormation, or other tools like Terraform. There is even a controller that lets you do it through Kubernetes, but that is still in an early stage.

To mix things up a bit, I’ll go with CloudFormation here, but the repo also contains the controller example. Please note that the CloudFormation syntax is a bit verbose, so I only include the minimum requirements in this example and once again limit myself to the productpage. The entire template, including health checks, is available for your perusal though.

Resources:
  # ProductPage microservice
  ProductPageRouter:
    DependsOn: Mesh
    Type: AWS::AppMesh::VirtualRouter
    Properties:
      MeshName: !Ref MeshName
      VirtualRouterName: productpage-router-appmesh-bookinfo
      Spec:
        Listeners:
          - PortMapping:
              Port: 9080
              Protocol: http

The first thing we need is the Virtual Router. As you can see, it doesn’t actually need a lot of information. You provide it with the name of the mesh to connect to, the name you wish to give to the router, and the port it should be listening to. Be aware though that if you define the mesh in the same CloudFormation template, you need to add a DependsOn as there is no internal reference to it.

  ProductPageService:
    Type: AWS::AppMesh::VirtualService
    Properties:
      MeshName: !Ref MeshName
      Spec:
        Provider:
          VirtualRouter:
            VirtualRouterName: !GetAtt ProductPageRouter.VirtualRouterName
      VirtualServiceName: productpage

The Virtual Service is the same story except we include the name of the Virtual Router. As we’re doing this using a !GetAtt the dependency order is already determined, so we don't need DependsOn.

  ProductPageNodeV1:
    Type: AWS::AppMesh::VirtualNode
    Properties:
      MeshName: !Ref MeshName
      VirtualNodeName: productpage-v1-appmesh-bookinfo
      Spec:
        Backends:
          - VirtualService:
              VirtualServiceName: !GetAtt DetailsService.VirtualServiceName
          - VirtualService:
              VirtualServiceName: !GetAtt ReviewsService.VirtualServiceName
          - VirtualService:
              VirtualServiceName: !GetAtt RatingsService.VirtualServiceName
        Listeners:
          - PortMapping:
              Port: 9080
              Protocol: http
        ServiceDiscovery:
          DNS:
            Hostname: productpage.appmesh-bookinfo.svc.cluster.local

Now we’re getting to the interesting part. For the Virtual Node, we need several things. First, we need to make sure that the VirtualNodeName matches what was injected into the Envoy container of the deployment. The injector uses deploymentname-namespace for this, so that's what we use. The same goes for the Hostname in the ServiceDiscovery part: it needs to match the DNS name of an existing Kubernetes service.
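As a quick sanity check on those two naming rules, here is a small Python sketch that derives both values. The helper names are hypothetical, and the hostname format assumes the default cluster.local cluster domain.

```python
def virtual_node_name(deployment: str, namespace: str) -> str:
    """Name the injector is described as using: <deployment>-<namespace>."""
    return f"{deployment}-{namespace}"


def service_hostname(service: str, namespace: str, cluster_domain: str = "cluster.local") -> str:
    """In-cluster DNS name of a Kubernetes service (default cluster domain assumed)."""
    return f"{service}.{namespace}.svc.{cluster_domain}"


print(virtual_node_name("productpage-v1", "appmesh-bookinfo"))
# productpage-v1-appmesh-bookinfo
print(service_hostname("productpage", "appmesh-bookinfo"))
# productpage.appmesh-bookinfo.svc.cluster.local
```

Both results match the VirtualNodeName and Hostname used in the template above.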

The Backends are the services that this microservice has access to. If we don't define this, there is no way to access those endpoints as I'll demonstrate a bit later on.

  ProductPageRouteV1:
    Type: AWS::AppMesh::Route
    Properties:
      MeshName: !Ref MeshName
      RouteName: product-v1-route
      VirtualRouterName: !GetAtt ProductPageRouter.VirtualRouterName
      Spec:
        HttpRoute:
          Match:
            Prefix: "/"
          Action:
            WeightedTargets:
              - VirtualNode: !GetAtt ProductPageNodeV1.VirtualNodeName
                Weight: 1

The route is the only remaining item here, but it’s where we determine what goes where. Unfortunately, right now we can only do path-based routing, but hopefully that will be improved on soon enough.

Deploying it all

The deployments are easy, but you need to ensure you do it in the right order. If there is no App Mesh for the Envoy proxies to reach when they start up, they will not know what to do. So, we first need to install the mesh and then deploy the application.

$ aws cloudformation deploy --template-file bookinfo-appmesh-mesh-cfn.yml --stack-name bookinfo-appmesh
$ kubectl apply -f bookinfo-appmesh-app.yml

As there is no gateway included, we’ll use a port forward to show us the application.

$ kubectl port-forward -n appmesh-bookinfo svc/productpage 8000:9080

Now we can open our browser to localhost:8000/productpage and see a working bookinfo example.

Testing the routing

Ok, that’s nice. But how do we know that it actually uses the App Mesh? If we had configured logging we could look there, but it’s much more fun to play around with what we built and see how that changes things, which we can do by changing only the CloudFormation template.

First, let’s see what happens if we change the ProductPageNodeV1 by commenting out its access to the details service.

  ProductPageNodeV1:
    Type: AWS::AppMesh::VirtualNode
    Properties:
      MeshName: !Ref MeshName
      VirtualNodeName: productpage-v1-appmesh-bookinfo
      Spec:
        Backends:
          # - VirtualService:
          #     VirtualServiceName: !GetAtt DetailsService.VirtualServiceName
          - VirtualService:
              VirtualServiceName: !GetAtt ReviewsService.VirtualServiceName
          - VirtualService:
              VirtualServiceName: !GetAtt RatingsService.VirtualServiceName
        Listeners:
          - PortMapping:
              Port: 9080
              Protocol: http
        ServiceDiscovery:
          DNS:
            Hostname: productpage.appmesh-bookinfo.svc.cluster.local

And deploy that using

$ aws cloudformation deploy --template-file bookinfo-appmesh-mesh-cfn.yml --stack-name bookinfo-appmesh

Once deployed, it shouldn’t take long before the product page loses access to the details service and the application shows an error.

Great! Now, let’s reverse that and update the reviews service to use v2 and v3 instead of the v1 that we initially set up.

  ReviewsRoute:
    Type: AWS::AppMesh::Route
    Properties:
      MeshName: !Ref MeshName
      RouteName: reviews-route
      VirtualRouterName: !GetAtt ReviewsRouter.VirtualRouterName
      Spec:
        HttpRoute:
          Match:
            Prefix: "/"
          Action:
            WeightedTargets:
              - VirtualNode: !GetAtt ReviewsNodeV2.VirtualNodeName
                Weight: 1
              - VirtualNode: !GetAtt ReviewsNodeV3.VirtualNodeName
                Weight: 2

After deploying this change, you will see a rating under the reviews with black (v2) or red (v3) stars.
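The weights are relative to each other, so with 1 and 2 you should see v3 roughly twice as often as v2. The following Python sketch works out the expected share per target; the function name is my own, and the node names are what I assume the injector naming convention produces for the reviews deployments.

```python
from fractions import Fraction
from typing import Dict


def traffic_shares(weights: Dict[str, int]) -> Dict[str, Fraction]:
    """Turn relative route weights into the expected fraction of traffic per target."""
    total = sum(weights.values())
    if total <= 0:
        raise ValueError("at least one target needs a positive weight")
    return {node: Fraction(weight, total) for node, weight in weights.items()}


# Assumed virtual node names, following the deploymentname-namespace convention.
shares = traffic_shares({
    "reviews-v2-appmesh-bookinfo": 1,
    "reviews-v3-appmesh-bookinfo": 2,
})
for node, share in shares.items():
    print(f"{node}: {float(share):.0%}")
# reviews-v2-appmesh-bookinfo: 33%
# reviews-v3-appmesh-bookinfo: 67%
```

The same arithmetic covers the 10% canary scenario from the start of this post: weights of 9 for the original version and 1 for the new one send one in ten requests to the new version.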

Once again, without App Mesh supporting more than path-based routes we can’t run some of the more advanced Istio examples so I’ll leave it at this. In the meantime, feel free to play around with this example to get a handle on how dependencies and routing are handled in App Mesh. As each microservice returns based on a path, maybe try building a single gateway for all the routes?

How useful is App Mesh?

Right now App Mesh is clearly a minimum viable product. It has a limited set of features and is still a work in progress. That said, this limited feature set works well. Changes you make to your mesh are propagated quickly, and because the control plane doesn’t run in your cluster, you don’t have to account for its resource usage.

While you can set a health check, there is no way to see the results of these checks in App Mesh itself or to set alarms on them. Similarly, there is no real way to check the propagation status of your changes to App Mesh, or even whether it is working, unless you look at the Envoy logging output. You can write your own tooling to get a look at the connections (or use mine), but again it’s not available out of the box.

The App Mesh roadmap contains a lot of future improvements, and it’s likely to lean hard on the integration with other AWS services. If all you need is a basic way to manage your traffic, App Mesh is a good solution. However, if you need a bit more from your mesh network, you will likely want to wait a bit or use something else.