Autoscaling ECS cluster of EC2 instances with Terraform and GitLab Pipelines

Manage AWS infrastructure as code. Deploy with GitLab.

Pawel Dudzinski
Nov 4 · 10 min read

In many companies the cloud infrastructure that runs their software is a kind of mystery that only DevOps engineers know how to cope with. Obviously, there are wiki pages with diagrams, flow charts, use cases etc. In the dynamic environment of software development these documents are constantly revised and need to be kept up to date. That means yet another pile of stuff to maintain.

Terraform is a tool that makes this clearer: infrastructure as code. If it is well written, logically divided into files or modules and backed by a simple diagram, it serves as documentation of your cloud architecture. It allows you to build it, change it, even version it!

It is perfect for designing a new architecture. You have everything under control from the very beginning. But what if you deal with a legacy, long-running, complicated architecture that even the oldest DevOps engineers in your company don’t know thoroughly? If your stuff runs on AWS, Terraforming is a great tool that can export an existing architecture (AWS only) to Terraform scripts. Although the tool is not complete (e.g. so far it lacks ECS support), it’s really handy.

Below you can find a simple example of exporting VPC definitions:

Console output for `terraforming vpc` command

I have two VPCs in my AWS account, so I got two Terraform definitions with all the parameters that are crucial to describe them.

Finishing this short digression, let’s focus on the use case that I want to describe and then fulfil. Here’s what’s to be done:

  • Let’s assume that some external application is able to publish messages with sports results to any AWS SQS queue.
  • I want to have a system to process those sports results.
  • In my infrastructure I need to have a bunch of workers that will consume messages from AWS SQS, process them and save the result to RDS MySQL.
  • Let’s assume that the worker is a dockerized Python application and we want to use ECS to manage our containers, with ECR as a Docker container registry.
  • Deployment is going to be done using GitLab CI Pipelines.

Here’s a simple data flow of what I want to achieve:

Everything except the “SQS publisher” will live in my AWS cloud.

The infrastructure for this system could look like that:

I’d need to provision a bunch of building blocks:

  • VPC as an isolated pool for my resources
  • Two public subnets within my VPC (high availability MySQL needs at least two subnets in different availability zones)
  • Internet Gateway to contact the outer world
  • Security groups for RDS MySQL and for EC2s
  • Auto-scaling group for ECS cluster with launch configuration
  • High availability (Multi-AZ) RDS MySQL instance
  • SQS
  • ECR container registry
  • ECS cluster with task definition and service definition

The Terraform Part

To start with Terraform we need to install it. Just go along with the steps in this document: https://www.terraform.io/downloads.html

Verify the installation by typing terraform version, which prints the installed Terraform version.

With Terraform (version 0.12.0) we can provision cloud architecture by writing code, in this case in HCL, the HashiCorp Configuration Language.

Terraform state

Before writing the first line of our code, let’s focus on understanding what Terraform state is.

The state is a kind of snapshot of the architecture. Terraform needs to know what was provisioned, which resources were created, track the changes etc.

All that information is written either to a local terraform.tfstate file or to a remote destination. The code is generally shared between members of a team, therefore keeping a local state file is never a good idea. A single source of truth is only ensured by keeping the state in a remote destination. When working with AWS, this destination is S3.

This is the first thing that we need to code: tell Terraform that the state will be kept remotely in S3 (terraform.tf):
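
A minimal backend block could look like the sketch below; the bucket and key names follow the text, and the region is picked up from the AWS_DEFAULT_REGION environment variable:

```hcl
terraform {
  backend "s3" {
    bucket = "s3terraform"
    key    = "state.tfstate"
  }
}
```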

Terraform will keep the state in an s3terraform bucket under a state.tfstate key. For that to happen we need to set up three environment variables: AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY and AWS_DEFAULT_REGION.

These credentials can be found or created in the AWS IAM Management Console, in the “My security credentials” section. Both access keys and the region must be stored in environment variables if we want to keep the remote state.

Virtual Private Cloud

Terraform needs to know which API it should interact with. Here we say it’ll be AWS. The list of available providers can be found here: https://www.terraform.io/docs/providers/index.html
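
The provider declaration can be as small as:

```hcl
provider "aws" {}
```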

The provider section has no parameters because we’ve already provided the credentials needed to communicate with the AWS API as environment variables in order to have remote Terraform state (it is possible to set them as provider parameters, though).
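
A sketch of the VPC resource; the CIDR range is an assumption of mine, pick any private range you like:

```hcl
resource "aws_vpc" "vpc" {
  cidr_block           = "10.0.0.0/16"  # assumed range, the only required parameter
  enable_dns_support   = true
  enable_dns_hostnames = true
}
```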

This resource block of type aws_vpc, named vpc, creates a Virtual Private Cloud: a logically isolated virtual network. When creating a VPC we must provide a range of IPv4 addresses. It’s the primary CIDR block for the VPC, and it is the only required parameter.

The enable_dns_support and enable_dns_hostnames parameters are required if we want to provision a publicly accessible database in our VPC (and we do).

Internet gateway

In order to allow communication between instances in our VPC and the internet, we need to create an Internet gateway.
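
The gateway only needs to point at the VPC; a minimal sketch:

```hcl
resource "aws_internet_gateway" "internet_gateway" {
  vpc_id = aws_vpc.vpc.id
}
```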

The only required parameter is the id of the previously created VPC, which can be obtained by invoking aws_vpc.vpc.id. This is the Terraform way to get to resource details: resource_type.resource_name.attribute.

Subnets

Within the VPC let’s add two public subnets in two availability zones:
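
Both subnets could be sketched like this; the /24 CIDR blocks are my assumption, and the availability zones come from a data block defined below:

```hcl
resource "aws_subnet" "public_subnet_1" {
  vpc_id            = aws_vpc.vpc.id
  cidr_block        = "10.0.1.0/24"  # assumed sub-range of the VPC CIDR
  availability_zone = data.aws_availability_zones.available.names[0]
}

resource "aws_subnet" "public_subnet_2" {
  vpc_id            = aws_vpc.vpc.id
  cidr_block        = "10.0.2.0/24"
  availability_zone = data.aws_availability_zones.available.names[1]
}
```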

To create a subnet we need to provide a VPC id and a CIDR block. Additionally we can specify the availability zone, but it’s not required.

To specify the availability zone I used a data block. A data block requests a given resource (aws_availability_zones) using the AWS API and returns the result:
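
```hcl
data "aws_availability_zones" "available" {}
```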

Here we are requesting all available availability zones for the region specified in the environment variables.

Route Table

A route table allows us to set up rules that determine where network traffic from our subnets is directed. Let’s create a new, custom one, just to show how it can be used and associated with subnets.
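
A sketch of the route table and its associations, assuming the subnet and gateway names used earlier:

```hcl
resource "aws_route_table" "route_table" {
  vpc_id = aws_vpc.vpc.id

  # send all outbound traffic through the internet gateway
  route {
    cidr_block = "0.0.0.0/0"
    gateway_id = aws_internet_gateway.internet_gateway.id
  }
}

resource "aws_route_table_association" "subnet_1_association" {
  subnet_id      = aws_subnet.public_subnet_1.id
  route_table_id = aws_route_table.route_table.id
}

resource "aws_route_table_association" "subnet_2_association" {
  subnet_id      = aws_subnet.public_subnet_2.id
  route_table_id = aws_route_table.route_table.id
}
```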

What we did is create a route table for our VPC that directs all traffic (0.0.0.0/0) to the internet gateway, and associate this route table with both subnets. Each subnet in a VPC has to be associated with a route table.

Network Access Control List (ACL)

ACL is an optional layer that controls traffic from (egress) and to (ingress) the VPC subnet(s).

The only required parameter to create an ACL is the VPC it applies to. Ingress and egress blocks are optional, but let’s define them just to show how it works.
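
A wide-open sketch matching the rules described below (TCP on all ports, both directions):

```hcl
resource "aws_network_acl" "network_acl" {
  vpc_id     = aws_vpc.vpc.id
  subnet_ids = [aws_subnet.public_subnet_1.id, aws_subnet.public_subnet_2.id]

  ingress {
    protocol   = "tcp"
    rule_no    = 100
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 0
    to_port    = 65535
  }

  egress {
    protocol   = "tcp"
    rule_no    = 100
    action     = "allow"
    cidr_block = "0.0.0.0/0"
    from_port  = 0
    to_port    = 65535
  }
}
```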

Inbound and outbound traffic to and from the VPC is allowed from/to the internet (0.0.0.0/0) on all possible ports (0–65535). It’s not a very secure setup. You can narrow down the port list to the ones your instances will use, and the IP ranges, for instance, to your VPN, by creating more rules.

Security Groups

Security groups work like firewalls for the instances, whereas the ACL works like a global firewall for the VPC. Because we allow all the traffic between the internet and the VPC, we should set some rules to secure the instances themselves.

We will have two kinds of instances in our VPC: a cluster of EC2s and an RDS MySQL instance. Therefore we need to create two security groups.

The first security group is for the EC2 instances that will live in the ECS cluster. Inbound traffic is narrowed to two ports: 22 for SSH and 443 for the HTTPS needed to download the Docker image of the application from ECR.

The IP range is also restricted to an imagined VPN, and connections via those two ports can only be established from this IP range.

The second security group is for RDS and opens just one port, the default MySQL port (3306), also narrowing down the IP range to the imagined VPN. Inbound traffic is also allowed from the ECS security group, which means that the application living on EC2 in the cluster will be able to reach MySQL.
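
Both groups could be sketched like this; the 81.82.83.0/24 range stands in for the imagined VPN and is purely an example:

```hcl
resource "aws_security_group" "ecs_security_group" {
  vpc_id = aws_vpc.vpc.id

  ingress {  # SSH, only from the imagined VPN range
    from_port   = 22
    to_port     = 22
    protocol    = "tcp"
    cidr_blocks = ["81.82.83.0/24"]
  }

  ingress {  # HTTPS, needed for the Docker image traffic
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["81.82.83.0/24"]
  }

  egress {  # allow all outbound traffic
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}

resource "aws_security_group" "rds_security_group" {
  vpc_id = aws_vpc.vpc.id

  ingress {  # MySQL from the VPN range and from the ECS instances
    from_port       = 3306
    to_port         = 3306
    protocol        = "tcp"
    cidr_blocks     = ["81.82.83.0/24"]
    security_groups = [aws_security_group.ecs_security_group.id]
  }

  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["0.0.0.0/0"]
  }
}
```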

This ends setting up the networking part of our architecture. Now it’s time for the autoscaling group for the EC2 instances in the ECS cluster.


Autoscaling Group

An autoscaling group is a collection of EC2 instances. The number of those instances is determined by scaling policies. We will create the autoscaling group using a launch template.

Before we launch container instances and register them into a cluster, we have to create an IAM role for those instances to use when they are launched:
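
A sketch of the role, attached to the AWS-managed policy for ECS container instances, plus the instance profile the launch template will reference; the resource names are my own:

```hcl
resource "aws_iam_role" "ecs_instance_role" {
  name = "ecs-instance-role"

  # let EC2 instances assume this role
  assume_role_policy = jsonencode({
    Version = "2012-10-17"
    Statement = [{
      Action    = "sts:AssumeRole"
      Effect    = "Allow"
      Principal = { Service = "ec2.amazonaws.com" }
    }]
  })
}

resource "aws_iam_role_policy_attachment" "ecs_instance_role_attachment" {
  role       = aws_iam_role.ecs_instance_role.name
  policy_arn = "arn:aws:iam::aws:policy/service-role/AmazonEC2ContainerServiceforEC2Role"
}

resource "aws_iam_instance_profile" "ecs_instance_profile" {
  name = "ecs-instance-profile"
  role = aws_iam_role.ecs_instance_role.name
}
```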

Having the IAM role, we can create an autoscaling group from the template:
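
A sketch of the launch template and group; the cluster name worker-cluster and the size limits are my assumptions (the name must match the ECS cluster defined later):

```hcl
resource "aws_launch_template" "ecs_launch_template" {
  name_prefix   = "ecs-worker-"
  image_id      = "ami-094d4d00fd7462815"  # ECS-optimized AMI with Docker preinstalled
  instance_type = "t2.micro"

  vpc_security_group_ids = [aws_security_group.ecs_security_group.id]

  iam_instance_profile {
    name = aws_iam_instance_profile.ecs_instance_profile.name
  }

  # register the instance in our named cluster instead of the default one
  user_data = base64encode("#!/bin/bash\necho ECS_CLUSTER=worker-cluster >> /etc/ecs/ecs.config")
}

resource "aws_autoscaling_group" "ecs_autoscaling_group" {
  min_size            = 1
  max_size            = 2
  desired_capacity    = 1
  vpc_zone_identifier = [aws_subnet.public_subnet_1.id, aws_subnet.public_subnet_2.id]

  launch_template {
    id      = aws_launch_template.ecs_launch_template.id
    version = "$Latest"
  }
}
```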

I used a special kind of AMI (ami-094d4d00fd7462815), an ECS-optimized image with Docker preinstalled. EC2 t2.micro instances will be launched within the given security group.

If we want to use the created, named ECS cluster, we have to put that information into user_data, otherwise our instances will be launched in the default cluster.

Basic scaling information is described by the aws_autoscaling_group parameters. An autoscaling policy has to be provided; we will do that later.

Having autoscaling group set up we are ready to launch our instances and database.

Database Instance

Having prepared the subnets and the security group for RDS, we need to cover one more thing before launching the database instance. To provision a database we need to follow some rules:

  • Our VPC has to have at least two subnets, in two different availability zones.
  • Our VPC has to have DNS hostnames and DNS resolution enabled (we did that while creating the VPC).
  • Our VPC has to have a DB subnet group (that is about to happen).
  • Our VPC has to have a security group that allows access to the DB instance.

Let’s create the missing piece:
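
The DB subnet group just bundles the two subnets together; the group name is my assumption:

```hcl
resource "aws_db_subnet_group" "db_subnet_group" {
  name       = "worker-db-subnet-group"
  subnet_ids = [aws_subnet.public_subnet_1.id, aws_subnet.public_subnet_2.id]
}
```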

And the database instance itself:
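
A sketch of the instance; identifier, database name, username and sizing are my assumptions, and a matching variable "db_password" declaration is assumed so the password stays out of the code:

```hcl
resource "aws_db_instance" "mysql" {
  identifier             = "worker-db"
  engine                 = "mysql"
  instance_class         = "db.t2.micro"
  allocated_storage      = 20
  name                   = "results"
  username               = "worker"
  password               = var.db_password  # assumed input variable
  multi_az               = true             # high availability
  publicly_accessible    = true
  skip_final_snapshot    = true
  db_subnet_group_name   = aws_db_subnet_group.db_subnet_group.name
  vpc_security_group_ids = [aws_security_group.rds_security_group.id]
}
```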

All the parameters are more or less self-explanatory. If we want our database to be publicly accessible, we have to set the publicly_accessible parameter to true.

Simple queue system

SQS is needed to feed traffic into our ECS cluster. We assumed that the workers installed on the EC2 instances will consume messages from the queue.
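
The queue itself is a one-liner; the queue name is my assumption:

```hcl
resource "aws_sqs_queue" "results_queue" {
  name = "sports-results-queue"
}
```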

The last part of our architecture puzzle is the ECS cluster/service/task and the ECR image repository.

Elastic Container Service

ECS is a scalable container orchestration service that allows us to run and scale dockerized applications on AWS.

To launch such an application we need to download its image from a repository. For that we will use ECR. We can push images there and use them while launching containers on the EC2 instances within our cluster:
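
The repository, with a name I picked for the worker image:

```hcl
resource "aws_ecr_repository" "worker" {
  name = "worker"
}
```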

And the ECS cluster itself:
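
The cluster name worker-cluster is my assumption and has to match what we echoed into user_data in the launch template:

```hcl
resource "aws_ecs_cluster" "worker_cluster" {
  name = "worker-cluster"
}
```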

The cluster name is important here, as we used it previously while defining the launch template. This is where newly created EC2 instances will live.

To launch a dockerized application we need to create a task: a set of simple instructions understood by the ECS cluster. The task is a JSON definition that can be kept in a separate file:
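
A sketch of the task definition file (e.g. task_definition.json); the container name is assumed, while memory and CPU follow the values mentioned below:

```json
[
  {
    "name": "worker",
    "image": "${repository_url}:latest",
    "memory": 512,
    "cpu": 2,
    "essential": true
  }
]
```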

That comes with a Terraform template_file definition. This data resource renders a given template from a given file or string:
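
```hcl
data "template_file" "task_definition" {
  template = file("${path.module}/task_definition.json")

  vars = {
    repository_url = aws_ecr_repository.worker.repository_url
  }
}
```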

In the JSON file we define which image will be used, via the repository_url template variable provided in the template_file data resource, tagged with latest. 512 MB of RAM and 2 CPU units is enough to run the application on EC2.

Having this prepared, we can create the Terraform resource for the task definition:
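
```hcl
resource "aws_ecs_task_definition" "worker_task" {
  family                = "worker"
  container_definitions = data.template_file.task_definition.rendered
}
```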

The family parameter is required and represents the unique name of our task definition.

The last thing that will bind the cluster with the task is an ECS service. The service will guarantee that we always have a desired number of tasks running:
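
A minimal sketch; the service name and desired_count are my assumptions:

```hcl
resource "aws_ecs_service" "worker_service" {
  name            = "worker-service"
  cluster         = aws_ecs_cluster.worker_cluster.id
  task_definition = aws_ecs_task_definition.worker_task.arn
  desired_count   = 1
}
```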

This ends the Terraform description of the architecture.

There’s just one more thing left to code. We need to output the provisioned components’ details in order to use them in the worker application.

We need to know URLs for:

  • ECR repository
  • SQS queue
  • MySQL host

Terraform provides an output block for that. We can print to the console any parameter of any provisioned component.
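
Assuming the resource names used in the sketches above, the three outputs could look like this (for SQS, the id attribute is the queue URL):

```hcl
output "ecr_repository_url" {
  value = aws_ecr_repository.worker.repository_url
}

output "sqs_queue_url" {
  value = aws_sqs_queue.results_queue.id
}

output "mysql_host" {
  value = aws_db_instance.mysql.address
}
```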

Applying the changes

First we need to initialize the working directory that contains the Terraform files by typing terraform init. This command will install the needed plugins and validate the code.

If everything is fine, we can run terraform apply to actually start provisioning the desired architecture. Terraform will output the execution plan, which is fairly readable and worth checking before confirming the apply.

After applying all the changes to AWS, Terraform will provide output with a bunch of URLs that we will use in the next steps.

Terraform output

Python worker application

Let’s assume that the SQS queue is fed by a different application that we are not aware of. The only thing we know is the message format, which might look like this:
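
An illustrative message; the field names and values are entirely hypothetical:

```json
{
  "discipline": "football",
  "home_team": "Porto",
  "away_team": "Benfica",
  "home_score": 2,
  "away_score": 1
}
```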

Example SQS message (JSON)

A simple Python script:
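
A sketch of the worker, assuming the hypothetical message format above, an SQS_QUEUE_URL environment variable, and a hypothetical external db module that reads the MySQL credentials (SQL_URL) from the environment:

```python
import json
import os


def parse_message(body):
    """Turn a raw SQS message body into a row for the results table."""
    result = json.loads(body)
    return (
        result["discipline"],
        result["home_team"],
        result["away_team"],
        result["home_score"],
        result["away_score"],
    )


def main():
    # imported here so the parsing logic above stays testable
    # without AWS or MySQL available
    import boto3
    from db import save_result  # hypothetical external database connector

    sqs = boto3.resource("sqs")
    queue = sqs.Queue(os.environ["SQS_QUEUE_URL"])  # assumed env variable

    while True:  # consume messages forever
        for message in queue.receive_messages(WaitTimeSeconds=20):
            save_result(parse_message(message.body))
            message.delete()


if __name__ == "__main__":
    main()
```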

Connection credentials, like it or not, are read from environment variables. This will enable us to use GitLab pipelines to deploy the worker to the ECS cluster.

The worker is pretty simple. It establishes a connection with the SQS queue, reads messages in an infinite loop and saves them to MySQL using a mysterious external database connector.

Worker Dockerfile

ECS manages dockerized applications, so let’s create a very simple Dockerfile for ours:
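
A minimal sketch, assuming the worker script is called worker.py and its dependencies are listed in requirements.txt:

```dockerfile
FROM python:3.7-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["python", "worker.py"]
```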

Worker Dockerfile

Having dockerized the worker, push the code and the Dockerfile to your GitLab repository so we can try to deploy it to the ECS cluster using pipelines.

GitLab pipeline

Here’s all you need to know about GitLab pipelines: https://docs.gitlab.com/ee/ci/pipelines.html

To enable pipelines you have to create a .gitlab-ci.yml file in your project’s root directory. That is where GitLab expects the pipeline configuration to be.

First put all of the environment variables into GitLab CI/CD settings:

  • AWS_ACCESS_KEY_ID
  • AWS_SECRET_ACCESS_KEY
  • AWS_DEFAULT_REGION
  • SQL_URL

Then the script can take advantage of those variables:
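
A sketch of the pipeline, assuming awscli v1 (whose ecr get-login command was current at the time of writing) and the cluster/service names used in the Terraform sketches; the REPOSITORY_URL placeholder has to be replaced with the value from the Terraform output:

```yaml
image: docker:stable

services:
  - docker:dind

stages:
  - build
  - deploy

variables:
  # replace with the ECR repository URL from the Terraform output
  REPOSITORY_URL: <account-id>.dkr.ecr.<region>.amazonaws.com/worker

build:
  stage: build
  before_script:
    - apk add --no-cache python3 py3-pip
    - pip3 install awscli
    # log Docker in to ECR using the CI/CD environment variables
    - $(aws ecr get-login --no-include-email --region $AWS_DEFAULT_REGION)
  script:
    - docker build -t $REPOSITORY_URL:latest .
    - docker push $REPOSITORY_URL:latest

deploy:
  stage: deploy
  image: python:3.7
  before_script:
    - pip install awscli
  script:
    # restart the service so it pulls the freshly pushed image
    - aws ecs update-service --cluster worker-cluster --service worker-service --force-new-deployment
```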

This will build the image, push it to ECR and use it to deploy the worker to the ECS cluster. 💥

