AWS App Mesh — First Take

Photo by Ricardo Gomez Angel on Unsplash

Among all the services announced at AWS re:Invent 2018, AWS App Mesh was one of the most interesting to me. What is App Mesh? From the AWS blog:

AWS App Mesh is a service mesh that allows you to easily monitor and control communications across microservices applications on AWS

This new service is said to be compatible with several AWS compute platforms, including EC2, ECS, and EKS. Using the open source Envoy proxy running in a container, App Mesh gives you a managed control plane for the service mesh. As of this writing, it is still in Public Preview, with limited features and regional availability.


ECS Example

In addition to a couple of videos and blog posts about App Mesh, AWS has published sample code in a GitHub repository for users to experiment with. I cloned the repository and started exploring what App Mesh is capable of.

The repository has sample code for running App Mesh on both ECS and EKS. As the section header suggests, I chose the ECS example: EKS required some additional setup steps, and given the current technology stack at work, ECS seemed a better starting point. The ECS examples run on standard ECS, hosted on EC2 instances; Fargate is not supported yet. Per an issue logged on GitHub:

@ranga543 App Mesh does not support AWS Fargate yet. Running the aws-appmesh-proxy-route-manager requires NET_ADMIN which is not available in Fargate. We are working with the Fargate team to solve this problem as soon as possible.

Preparation Work

The sample application requires a couple of Docker images to be built and pushed to ECR repositories that your ECS cluster can access. The Envoy and proxy-init images are pulled from a public ECR source, so I only had to create repositories for the two application images.

ECR Repositories Created
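The image variables exported below point at these repositories using the standard private ECR URI form, `<account-id>.dkr.ecr.<region>.amazonaws.com/<repository>[:tag]`. If you script the setup instead of copying URIs from the console, a minimal Go sketch for composing them might look like this; `ecrImageURI` is a hypothetical helper, not part of the sample repository.

```go
package main

import "fmt"

// ecrImageURI is a hypothetical helper that builds a private ECR image URI
// in the form <account>.dkr.ecr.<region>.amazonaws.com/<repo>[:tag].
// An empty tag omits the suffix, in which case Docker pulls "latest".
func ecrImageURI(account, region, repo, tag string) string {
	uri := fmt.Sprintf("%s.dkr.ecr.%s.amazonaws.com/%s", account, region, repo)
	if tag != "" {
		uri += ":" + tag
	}
	return uri
}

func main() {
	// Account ID, region, and repository names match the values used in this post.
	fmt.Println(ecrImageURI("235465272069", "us-west-2", "color_gateway", ""))
	fmt.Println(ecrImageURI("235465272069", "us-west-2", "color_teller", ""))
}
```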

The sample code in the GitHub repository includes several bash scripts that stand up CloudFormation stacks. They expect certain environment variables to be set, so I created an export.sh script with the necessary variables filled in and sourced it.

# export.sh: source this file (". ./export.sh") before running the sample scripts
export AWS_PROFILE=default
export AWS_REGION=us-west-2
export AWS_DEFAULT_REGION="$AWS_REGION"
export ENVIRONMENT_NAME=chris-env-name-mesh
export MESH_NAME=chris-mesh-name
export KEY_PAIR_NAME=personal_bastion
export ENVOY_IMAGE="111345817488.dkr.ecr.us-west-2.amazonaws.com/aws-appmesh-envoy:v1.8.0.2-beta"
export CLUSTER_SIZE=3
export SERVICES_DOMAIN="default.svc.cluster.local"
export COLOR_GATEWAY_IMAGE="235465272069.dkr.ecr.us-west-2.amazonaws.com/color_gateway"
export COLOR_TELLER_IMAGE="235465272069.dkr.ecr.us-west-2.amazonaws.com/color_teller"

Building the Infrastructure

With the ECR repositories created and the environment variables sourced, it was time to stand up the infrastructure. Following the instructions in the README was straightforward.

ubuntu@ip-172-31-23-74:~/aws-app-mesh-examples/examples$ ./infrastructure/vpc.sh create-stack
+++ dirname ./infrastructure/vpc.sh
++ cd ./infrastructure
++ pwd
+ DIR=/home/ubuntu/aws-app-mesh-examples/examples/infrastructure
+ aws --profile default --region us-west-2 cloudformation deploy --stack-name chris-env-name-mesh-vpc --capabilities CAPABILITY_IAM --template-file /home/ubuntu/aws-app-mesh-examples/examples/infrastructure/vpc.yaml --parameter-overrides EnvironmentName=chris-env-name-mesh
Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - chris-env-name-mesh-vpc

The VPC was created successfully. Its configuration was fairly vanilla: a pair of public subnets and a pair of private subnets distributed across two Availability Zones in us-west-2.

VPC Parameters

The next piece of infrastructure to build was the service mesh itself. Unlike the VPC and the ECS cluster, the mesh is not built with CloudFormation; instead, a simple bash script wraps the AWS CLI command that creates it.

ubuntu@ip-172-31-23-74:~/aws-app-mesh-examples/examples$ ./infrastructure/mesh.sh create-mesh
+ '[' '!' -z default ']'
+ PROFILE_OPT='--profile default'
+ aws --profile default appmesh create-mesh --mesh-name chris-mesh-name
{
    "mesh": {
        "status": {
            "status": "ACTIVE"
        },
        "meshName": "chris-mesh-name",
        "metadata": {
            "version": 1,
            "lastUpdatedAt": 1546101269.215,
            "createdAt": 1546101269.215,
            "arn": "arn:aws:appmesh:us-west-2:235465272069:mesh/chris-mesh-name",
            "uid": "7642a155-1d20-4793-8b45-e3ac9a4ef79c"
        }
    }
}
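When scripting around the CLI, it can help to confirm the mesh reports ACTIVE before moving on. In a shell script, `--query mesh.status.status --output text` does this directly. In Go, a minimal sketch decodes the create-mesh response; the struct mirrors only the fields shown in the output above, and `meshIsActive` is a hypothetical helper.

```go
package main

import (
	"encoding/json"
	"fmt"
)

// createMeshOutput mirrors only the fields of the create-mesh response shown
// in this post; encoding/json silently ignores any other fields.
type createMeshOutput struct {
	Mesh struct {
		MeshName string `json:"meshName"`
		Status   struct {
			Status string `json:"status"`
		} `json:"status"`
		Metadata struct {
			Arn     string `json:"arn"`
			UID     string `json:"uid"`
			Version int    `json:"version"`
			// Timestamps are epoch seconds with a fractional part.
			CreatedAt float64 `json:"createdAt"`
		} `json:"metadata"`
	} `json:"mesh"`
}

// meshIsActive parses raw create-mesh JSON and returns the mesh ARN and
// whether its status is ACTIVE.
func meshIsActive(raw []byte) (string, bool, error) {
	var out createMeshOutput
	if err := json.Unmarshal(raw, &out); err != nil {
		return "", false, err
	}
	return out.Mesh.Metadata.Arn, out.Mesh.Status.Status == "ACTIVE", nil
}

func main() {
	raw := []byte(`{"mesh": {"status": {"status": "ACTIVE"}, "meshName": "chris-mesh-name",
		"metadata": {"version": 1, "createdAt": 1546101269.215,
		"arn": "arn:aws:appmesh:us-west-2:235465272069:mesh/chris-mesh-name",
		"uid": "7642a155-1d20-4793-8b45-e3ac9a4ef79c"}}}`)
	arn, active, err := meshIsActive(raw)
	if err != nil {
		panic(err)
	}
	fmt.Println(arn, active)
}
```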

Finally, I created the ECS cluster that colorapp would run on.

ubuntu@ip-172-31-23-74:~/aws-app-mesh-examples/examples$ ./infrastructure/ecs-cluster.sh create-stack
+++ dirname ./infrastructure/ecs-cluster.sh
++ cd ./infrastructure
++ pwd
+ DIR=/home/ubuntu/aws-app-mesh-examples/examples/infrastructure
+ aws --profile default --region us-west-2 cloudformation deploy --stack-name chris-env-name-mesh-ecs-cluster --capabilities CAPABILITY_IAM --template-file /home/ubuntu/aws-app-mesh-examples/examples/infrastructure/ecs-cluster.yaml --parameter-overrides EnvironmentName=chris-env-name-mesh KeyName=personal_bastion ECSServicesDomain=default.svc.cluster.local ClusterSize=3
Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - chris-env-name-mesh-ecs-cluster

The ECS cluster was created successfully. I used a cluster of three EC2 hosts with the default c4.large instance type, which was overkill for this example.

Creating the Application

The repository provides a simple Go application, colorapp/, in the apps/ directory.

At the microservice level, the application looks like this:

Microservice view of colorapp

The application is composed of five containers:

  1. proxy-init — A sidecar container that is supposed to manage the traffic-routing configuration. (It failed to come up for me, but that didn’t seem to hurt the application.)
  2. Envoy — The sidecar proxy that gates all web traffic to the application container.
  3. colorteller — An HTTP Go service that returns a color based on an environment variable.
  4. colorgateway — An HTTP Go application that queries the colorteller service, tallies the result, and returns a JSON object with a color and running count.
  5. TesterService — A simple service that queries colorgateway and records the result in a log file. It uses the expected service name, colorgateway.default.svc.cluster.local, to test service discovery within the mesh.

After switching into the colorapp/ directory, I ran the service mesh deploy script to create the initial mesh configuration: the virtual nodes, the virtual routers, and the routes within those routers. Again, this was not done via CloudFormation; instead, the script passed JSON configuration files to the AWS CLI as input.

ubuntu@ip-172-31-23-74:~/aws-app-mesh-examples/examples$ cd apps/colorapp/
ubuntu@ip-172-31-23-74:~/aws-app-mesh-examples/examples/apps/colorapp$ ./servicemesh/deploy.sh
[MESH] [Sat Dec 29 17:09:46 UTC 2018] : Creating virtual nodes
[MESH] [Sat Dec 29 17:09:46 UTC 2018] : ======================
[MESH] [Sat Dec 29 17:09:46 UTC 2018] : cli_input={
    "spec": {
        "listeners": [
            {
                "portMapping": {
                    "port": 9080,
                    "protocol": "http"
...
[MESH] [Sat Dec 29 17:09:50 UTC 2018] : aws appmesh update-route --profile default --mesh-name chris-mesh-name --cli-input-json {
    "routeName": "colorteller-route",
    "spec": {
        "httpRoute": {
            "action": {
                "weightedTargets": [
                    {
                        "virtualNode": "colorteller-vn",
                        "weight": 1
                    }
                ]
            },
            "match": {
                "prefix": "/"
            }
        }
    },
    "virtualRouterName": "colorteller-vr"
} --query route.metadata.uid --output text
[MESH] [Sat Dec 29 17:09:50 UTC 2018] : --> b2012f52-90de-4fce-9c5f-32c2aba47c5c

The final step in setting up the application was deploying it to ECS, which was straightforward. The sample code deployed without incident, creating six task definitions and associated services:

  1. Colorgateway
  2. Colorteller (returns white as default)
  3. Colorteller-black
  4. Colorteller-red
  5. Colorteller-blue
  6. TesterService
ubuntu@ip-172-31-23-74:~/aws-app-mesh-examples/examples/apps/colorapp$ ./ecs/ecs-colorapp.sh
+++ dirname ./ecs/ecs-colorapp.sh
++ cd ./ecs
++ pwd
+ DIR=/home/ubuntu/aws-app-mesh-examples/examples/apps/colorapp/ecs
+ aws --profile default --region us-west-2 cloudformation deploy --stack-name chris-env-name-mesh-ecs-colorapp --capabilities CAPABILITY_IAM --template-file /home/ubuntu/aws-app-mesh-examples/examples/apps/colorapp/ecs/ecs-colorapp.yaml --parameter-overrides EnvironmentName=chris-env-name-mesh EnvoyImage=111345817488.dkr.ecr.us-west-2.amazonaws.com/aws-appmesh-envoy:v1.8.0.2-beta ECSServicesDomain=default.svc.cluster.local AppMeshMeshName=chris-mesh-name ColorGatewayImage=235465272069.dkr.ecr.us-west-2.amazonaws.com/color_gateway ColorTellerImage=235465272069.dkr.ecr.us-west-2.amazonaws.com/color_teller
Waiting for changeset to be created..
Waiting for stack create/update to complete
Successfully created/updated stack - chris-env-name-mesh-ecs-colorapp

We can see all the services come up healthy after running the deployment script.

ECS Console Service Status

Reviewing the application

With everything up and running, it was time to observe the application and understand how the pieces fit together. A lot is still being built during the Public Preview, and integrations such as X-Ray, Datadog, and CloudWatch are still in the works. However, basic ECS task definition logging to CloudWatch and the associated metrics gave me a pretty clear picture of what was happening.

One thing I immediately noticed in the TesterService logs is that only the color ‘white’ was being returned. This is because all traffic from the colorgateway service is routed to the default colorteller service, whose color environment variable is set to ‘white’. From a logical service mesh routing standpoint, the application is initially configured like this:

Colorapp Mesh Initial Configuration

The above works as expected given the initial configuration. However, it isn’t very interesting and doesn’t really exercise the capabilities of the service mesh. Several other included scripts help here: they update the routes in the colorteller virtual router to distribute traffic differently. I ran the route_canary.sh script, which applied an 80/20 blue/red distribution. The updated mesh looked like this:

Colorapp Mesh blue/red Canary Route Update
ubuntu@ip-172-31-23-74:~/aws-app-mesh-examples/examples/apps/colorapp$ ./servicemesh/route_canary.sh
[MESH] [Sat Dec 29 17:19:40 UTC 2018] : Using route in file colorteller-route-blue-80-red-20.json
[MESH] [Sat Dec 29 17:19:40 UTC 2018] : aws appmesh update-route --mesh-name chris-mesh-name --profile default --cli-input-json file:////home/ubuntu/aws-app-mesh-examples/examples/apps/colorapp/servicemesh/config/update_routes//colorteller-route-blue-80-red-20.json --query route.metadata.uid --output text
[MESH] [Sat Dec 29 17:19:40 UTC 2018] : --> b2012f52-90de-4fce-9c5f-32c2aba47c5c
[MESH] [Sat Dec 29 17:29:40 UTC 2018] : Using route in file colorteller-route-blue-80-red-20.json
[MESH] [Sat Dec 29 17:29:40 UTC 2018] : aws appmesh update-route --mesh-name chris-mesh-name --profile default --cli-input-json file:////home/ubuntu/aws-app-mesh-examples/examples/apps/colorapp/servicemesh/config/update_routes//colorteller-route-blue-80-red-20.json --query route.metadata.uid --output text
[MESH] [Sat Dec 29 17:29:41 UTC 2018] : --> b2012f52-90de-4fce-9c5f-32c2aba47c5c
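Weighted targets split traffic in proportion to their weights, and the Envoy proxies do the actual balancing. As a mental model only (this is a generic weighted-selection sketch, not Envoy's load-balancing code), here it is in Go. The 80/20 weights and the virtual node names are assumptions; the real values live in colorteller-route-blue-80-red-20.json.

```go
package main

import (
	"fmt"
	"math/rand"
)

type target struct {
	node   string
	weight int
}

// pick maps n, a number in [0, total weight), onto a target so that each
// target receives a share of traffic proportional to its weight.
func pick(targets []target, n int) string {
	for _, t := range targets {
		if n < t.weight {
			return t.node
		}
		n -= t.weight
	}
	panic("n out of range")
}

// totalWeight sums the weights of all targets.
func totalWeight(targets []target) int {
	sum := 0
	for _, t := range targets {
		sum += t.weight
	}
	return sum
}

func main() {
	// Hypothetical node names and weights for the blue/red canary.
	canary := []target{{"colorteller-blue-vn", 80}, {"colorteller-red-vn", 20}}
	counts := map[string]int{}
	total := totalWeight(canary)
	for i := 0; i < 10000; i++ {
		counts[pick(canary, rand.Intn(total))]++
	}
	fmt.Println(counts) // roughly 8000 blue and 2000 red
}
```

Walking n deterministically over every value in [0, 100) lands on blue exactly 80 times, which is an easy way to sanity-check the mapping without randomness.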

After applying the update, I checked the logs in CloudWatch to verify that the TesterService was now receiving only ‘red’ and ‘blue’. It looks like the update applied successfully!

ServiceTester logs showing red and blue

Conclusion

App Mesh is a big deal for AWS. While I didn’t try it on EKS, it appears to bridge both AWS’s native container orchestration engine, ECS, and Kubernetes through EKS. Even after exercising the Public Preview for only a short time, I can see building it into our future system architectures. Having experienced the pain of trying to get common instrumentation, logging, debugging, and RBAC into polyglot services, I think building those capabilities once into a service mesh is a win. In an ideal world, software developers should not have to reimplement those capabilities in every microservice they spin up. A service mesh takes us a step toward that ideal.

Based on the issues raised on GitHub and other conversations around App Mesh, the project seems to be heading in the right direction. Here is my early wish list:

  1. Fargate support
  2. Ubiquitous region support
  3. First-class support in CloudFormation (Gluing CloudFormation together with Bash-wrapped AWS CLI calls felt clunky.)
  4. Integration with AWS IAM for security
  5. Full visibility tooling at the mesh level, above and beyond what already exists for any vanilla ECS service