Serverless Data Engineering Platform on Cloud

Deploy Apache Airflow on AWS ECS (using FARGATE). Part 2

Fartash Haghani
5 min read · May 16, 2018

In Part 1, I explained every step required to deploy Apache Airflow with the Celery executor on AWS ECS using EC2 (I highly recommend reading Part 1 first).

In this post, I explain how to deploy Airflow in a serverless environment.

You can get all the code from https://github.com/fartashh/Airflow-on-ECS/tree/fargate

What Is AWS Fargate, and Why Use It?

AWS Fargate is a technology that you can use with Amazon ECS to run containers without having to manage servers or clusters of EC2 instances. With AWS Fargate, you no longer have to provision, configure, and scale clusters of virtual machines to run containers. This removes the need to choose server types, decide when to scale your clusters, or optimize cluster packing.

When you run your tasks and services with the Fargate launch type, you package your application in containers, specify the CPU and memory requirements, define networking and IAM policies, and launch the application.

How to Deploy Airflow on ECS (using Fargate)

Preparation Steps

Follow Part 1 from the preparation step up to step 9.

Step 9: Create the Task Execution IAM Role

Amazon ECS needs permissions so that your Fargate task can pull its image from Amazon ECR and store logs in CloudWatch. These permissions are granted through the task execution IAM role.

  1. Create a file named execution-assume-role.json with the following contents:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}

2. Using the AWS CLI, create the task execution role:

>>> aws iam --region us-east-1 create-role --role-name ecsExecutionRole --assume-role-policy-document file://execution-assume-role.json

3. Using the AWS CLI, attach the task execution role policy:

>>> aws iam --region us-east-1 attach-role-policy --role-name ecsExecutionRole --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
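The three steps above can also be scripted. The following is a minimal Python sketch, assuming the AWS CLI is installed and configured: it writes the same trust policy shown in step 1 and builds the argument lists for the create-role and attach-role-policy calls. The helper names (write_trust_policy, iam_commands, run) are hypothetical, and by default the commands are only printed, not executed.

```python
from __future__ import annotations

import json
import subprocess
from pathlib import Path

# Same trust policy as step 1: lets ECS tasks assume the execution role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {"Service": "ecs-tasks.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

POLICY_ARN = "arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy"


def write_trust_policy(path: Path) -> Path:
    """Write the trust policy to disk as indented JSON and return the path."""
    path.write_text(json.dumps(TRUST_POLICY, indent=2))
    return path


def iam_commands(role_name: str, policy_file: Path, region: str = "us-east-1") -> list[list[str]]:
    """Return argv lists for steps 2 and 3: create the role, then attach the policy."""
    return [
        ["aws", "iam", "--region", region, "create-role",
         "--role-name", role_name,
         "--assume-role-policy-document", f"file://{policy_file}"],
        ["aws", "iam", "--region", region, "attach-role-policy",
         "--role-name", role_name,
         "--policy-arn", POLICY_ARN],
    ]


def run(commands: list[list[str]], dry_run: bool = True) -> None:
    """Print each command; execute it (failing fast) only when dry_run is False."""
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    policy = write_trust_policy(Path("execution-assume-role.json"))
    run(iam_commands("ecsExecutionRole", policy))
```

Passing dry_run=False runs the same two AWS CLI calls shown in steps 2 and 3.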

Step 10: Configure the ECS CLI

The ECS CLI requires credentials in order to make API requests on your behalf. It can pull credentials from environment variables, an AWS profile, or an Amazon ECS profile.

  1. Create a cluster configuration, which defines the AWS region to use, resource creation prefixes, and the cluster name to use with the Amazon ECS CLI:
>>> ecs-cli configure --cluster datalab-fargate --region us-east-1 --default-launch-type FARGATE --config-name datalab-fargate

2. Create a CLI profile using your access key and secret key:

>>> ecs-cli configure profile --access-key AWS_ACCESS_KEY_ID --secret-key AWS_SECRET_ACCESS_KEY --profile-name datalab-fargate

Tip: if you have more than one profile and cluster, update ~/.ecs/config and ~/.ecs/credentials and set the cluster and profile names as the default values.

~/.ecs/config
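To illustrate what the tip changes, here is a minimal Python sketch that renders a config file holding two cluster configurations, with default pointing at datalab-fargate. The layout (version, default, clusters with cluster, region and default_launch_type) is an assumption based on the ECS CLI v1 format; the render_ecs_config helper and the my-ec2-cluster entry are hypothetical. Normally ecs-cli configure writes this file for you.

```python
from __future__ import annotations


def render_ecs_config(clusters: dict[str, dict[str, str]], default: str) -> str:
    """Render an ECS CLI v1-style ~/.ecs/config holding several cluster configs.

    `clusters` maps a configuration name to its cluster, region and launch
    type; `default` names the configuration ecs-cli uses when none is given.
    """
    if default not in clusters:
        raise ValueError(f"default configuration {default!r} is not defined")
    lines = ["version: v1", f"default: {default}", "clusters:"]
    for name, cfg in clusters.items():
        lines += [
            f"  {name}:",
            f"    cluster: {cfg['cluster']}",
            f"    region: {cfg['region']}",
            f"    default_launch_type: {cfg['launch_type']}",
        ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_ecs_config(
        {
            # Hypothetical EC2 configuration from an earlier setup.
            "my-ec2-cluster": {"cluster": "my-ec2-cluster", "region": "us-east-1",
                               "launch_type": "EC2"},
            "datalab-fargate": {"cluster": "datalab-fargate", "region": "us-east-1",
                                "launch_type": "FARGATE"},
        },
        default="datalab-fargate",
    ), end="")
```

ECS CLI v1 should also let you switch the default without hand-editing, with ecs-cli configure default --config-name datalab-fargate.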

Step 11: Create a Cluster

Create an Amazon ECS cluster with the ecs-cli up command. Since you specified Fargate as your default launch type in the cluster configuration, this command will create an empty cluster.

Tip: the ecs-cli up command can configure a VPC with two public subnets for you, but I suggest understanding AWS networking and being your own AWS boss.

>>> ecs-cli up --vpc vpc-c7aeebbc --subnets subnet-a4xxb0f9 --security-group sg-e9bxx9a1

Step 12: Update Compose file

If we reuse the compose file from the ECS (EC2) deployment, we get the following errors:

  • ClientException: When networkMode=awsvpc, the host ports and container ports in port mappings must match.
  • ClientException: Links are not supported when networkMode=awsvpc.

To use Fargate, we need to set the ECS network mode to awsvpc, which imposes a few restrictions on the parameters we can use in the Docker Compose file:

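  • Host ports and container ports in each port mapping must match.
  • Links must be removed from the compose file; containers in the same task share a network namespace and can reach each other on localhost.
docker-compose file for Fargate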
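Both restrictions can be checked before deploying. Below is a minimal Python sketch, assuming the compose file has already been parsed into a dict with a top-level services key (for example with a YAML library, not shown) and that ports use the short "HOST:CONTAINER" string form. The parse_port and awsvpc_problems helpers are hypothetical and not part of ecs-cli.

```python
from __future__ import annotations


def parse_port(mapping: str | int) -> tuple[str, str]:
    """Split a short-form compose port mapping into (host, container).

    Accepts "8080", "8080:8080", "8080:8080/tcp" and "0.0.0.0:8080:8080".
    A bare container port is treated as host == container.
    """
    spec = str(mapping).split("/")[0]  # drop a trailing protocol such as /tcp
    parts = spec.split(":")
    if len(parts) == 1:
        return parts[0], parts[0]
    # The last two fields are host and container; an optional IP may precede them.
    return parts[-2], parts[-1]


def awsvpc_problems(compose: dict) -> list[str]:
    """List settings in a parsed compose dict that awsvpc mode rejects."""
    problems = []
    for name, service in compose.get("services", {}).items():
        if service.get("links"):
            problems.append(f"{name}: links are not supported with awsvpc")
        for mapping in service.get("ports", []):
            host, container = parse_port(mapping)
            if host != container:
                problems.append(
                    f"{name}: port {mapping} maps host {host} to container {container}; they must match"
                )
    return problems


if __name__ == "__main__":
    # Hypothetical EC2-era fragment: links plus a remapped host port.
    ec2_compose = {
        "services": {
            "webserver": {"ports": ["80:8080"], "links": ["redis", "postgres"]},
            "flower": {"ports": ["5555:5555"]},
        }
    }
    for problem in awsvpc_problems(ec2_compose):
        print(problem)
```

An empty list means the file clears the two ClientException errors listed above; it does not check any other ECS constraint.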

Step 13: ECS-specific parameters

In addition to the Docker Compose information, there are some Amazon ECS-specific parameters you need to specify for the service. Using the VPC, subnet, and security group IDs from Step 11, create a file named ecs-params.yml with the following content:

ecs-params.yml
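As an illustration only, the Python sketch below renders an ecs-params.yml in the ECS CLI's version 1 params layout (task execution role, awsvpc network mode, task size, and awsvpc network configuration) and rejects CPU/memory pairs that Fargate does not offer. The subnet and security group IDs are the redacted ones from Step 11; the 2 vCPU / 8 GB task size and the render_ecs_params helper are assumptions. The size table lists the original Fargate combinations, and AWS may have added more since.

```python
from __future__ import annotations

# Original Fargate task sizes: CPU units -> allowed memory in MiB.
FARGATE_SIZES = {
    256: [512, 1024, 2048],
    512: list(range(1024, 4097, 1024)),    # 1-4 GB in 1 GB steps
    1024: list(range(2048, 8193, 1024)),   # 2-8 GB
    2048: list(range(4096, 16385, 1024)),  # 4-16 GB
    4096: list(range(8192, 30721, 1024)),  # 8-30 GB
}


def render_ecs_params(subnets: list[str], security_groups: list[str],
                      cpu: int = 2048, mem_mib: int = 8192,
                      execution_role: str = "ecsExecutionRole",
                      assign_public_ip: bool = True) -> str:
    """Validate the Fargate task size and render ecs-params.yml as text."""
    allowed = FARGATE_SIZES.get(cpu)
    if allowed is None:
        raise ValueError(f"unsupported Fargate CPU value: {cpu}")
    if mem_mib not in allowed:
        raise ValueError(f"{mem_mib} MiB is not valid with {cpu} CPU units")
    if not subnets or not security_groups:
        raise ValueError("awsvpc needs at least one subnet and one security group")
    lines = [
        "version: 1",
        "task_definition:",
        f"  task_execution_role: {execution_role}",
        "  ecs_network_mode: awsvpc",
        "  task_size:",
        f"    mem_limit: {mem_mib / 1024:g}GB",  # e.g. 8192 -> 8GB, 512 -> 0.5GB
        f"    cpu_limit: {cpu}",
        "run_params:",
        "  network_configuration:",
        "    awsvpc_configuration:",
        "      subnets:",
        *[f'        - "{s}"' for s in subnets],
        "      security_groups:",
        *[f'        - "{g}"' for g in security_groups],
        f"      assign_public_ip: {'ENABLED' if assign_public_ip else 'DISABLED'}",
    ]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_ecs_params(["subnet-a4xxb0f9"], ["sg-e9bxx9a1"]), end="")
```

A public IP is enabled by default here on the assumption that the tasks run in public subnets and must both pull from ECR and serve the web UIs directly.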

Step 14: Deploy the Compose File to a Cluster

After you create the compose file, you can deploy it to your cluster with ecs-cli compose service up.

>>> ecs-cli compose --project-name datalab-fargate service up --create-log-groups

Step 15: View the Running Containers on a Cluster

After you deploy the compose file, you can view the containers that are running in the service with ecs-cli compose service ps.

>>> ecs-cli compose --project-name datalab-fargate service ps

Output of ecs-cli compose service ps

Nice! The Airflow webserver and Flower are accessible on ports 8080 and 5555, respectively.
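To confirm from your own machine that both UIs are reachable, a plain TCP check is enough. This minimal Python sketch assumes the task has a public IP (shown in the ps output) and that the security group allows inbound traffic on 8080 and 5555; the port_open helper and the IP address are placeholders.

```python
import socket


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, or timed out
        return False


if __name__ == "__main__":
    task_ip = "203.0.113.10"  # placeholder: use the task's public IP
    for name, port in [("Airflow webserver", 8080), ("Flower", 5555)]:
        state = "open" if port_open(task_ip, port, timeout=2.0) else "unreachable"
        print(f"{name} on {task_ip}:{port}: {state}")
```

An "unreachable" result usually points at the security group or a missing public IP rather than at Airflow itself, so check those first.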

Step 16: View the Container Logs

View the logs for the task:

>>> ecs-cli logs --task-id 5cba813b-35f0-45b4-88ed-a927c039a9aa --follow

Step 17: Scale the Tasks on the Cluster

You can scale up your task count to increase the number of instances of your application with ecs-cli compose service scale.

>>> ecs-cli compose --project-name datalab-fargate service scale 2

Step 18: Update Docker image

After updating and testing your Airflow DAGs and logic, follow these steps:

  1. Rebuild the image
  2. Tag the image
    If you changed the tag name, make sure you update the docker-compose file as well.
  3. Push the image to the ECR repository
  4. Update the service
>>> aws ecs update-service --cluster datalab-fargate --service datalab-fargate --force-new-deployment
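The four steps can be collected into one release script. The Python sketch below builds and prints the command sequence, and runs it only when dry_run=False. It assumes Docker and the AWS CLI are installed; the account ID 123456789012, the repository name airflow, and the tag are placeholders. The ECR login uses aws ecr get-login-password, which is newer than this post (AWS CLI v1 in 2018 used aws ecr get-login instead).

```python
from __future__ import annotations

import subprocess


def ecr_registry(account_id: str, region: str) -> str:
    """Return the ECR registry host for an account and region."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com"


def release_commands(account_id: str, region: str, repo: str, tag: str,
                     cluster: str, service: str) -> list[list[str]]:
    """Return argv lists for: rebuild, tag, push to ECR, force a new deployment."""
    image = f"{ecr_registry(account_id, region)}/{repo}:{tag}"
    return [
        ["docker", "build", "-t", f"{repo}:{tag}", "."],
        ["docker", "tag", f"{repo}:{tag}", image],
        ["docker", "push", image],
        ["aws", "ecs", "update-service", "--region", region, "--cluster", cluster,
         "--service", service, "--force-new-deployment"],
    ]


def ecr_login(account_id: str, region: str) -> None:
    """Pipe an ECR auth token into docker login (needed before docker push)."""
    password = subprocess.run(
        ["aws", "ecr", "get-login-password", "--region", region],
        check=True, capture_output=True, text=True,
    ).stdout.strip()
    subprocess.run(
        ["docker", "login", "--username", "AWS", "--password-stdin",
         ecr_registry(account_id, region)],
        input=password, check=True, text=True,
    )


def release(account_id: str, region: str, repo: str, tag: str,
            cluster: str, service: str, dry_run: bool = True) -> None:
    """Print every step; log in and execute the steps only when dry_run is False."""
    commands = release_commands(account_id, region, repo, tag, cluster, service)
    if not dry_run:
        ecr_login(account_id, region)
    for argv in commands:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)


if __name__ == "__main__":
    release("123456789012", "us-east-1", "airflow", "latest",
            cluster="datalab-fargate", service="datalab-fargate")
```

If you change the tag rather than reusing latest, the image reference in the docker-compose file has to change with it, as noted in step 2.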

FARGATE vs EC2

Fargate is definitely more expensive than EC2. With the EC2 launch type, there is no additional charge for ECS itself: you pay for the AWS resources (e.g. EC2 instances or EBS volumes) you create to store and run your application, only for what you use, with no minimum fees and no upfront commitments. With Fargate, you pay for the vCPU and memory that your containerized application requests. vCPU and memory usage is measured from the time your container images start being pulled until the Amazon ECS task terminates, rounded up to the nearest second, with a minimum charge of 1 minute.

For instance, our Airflow configuration needs an EC2 t2.large instance to keep the services from restarting, and a t2.large costs us $67.93 per month. The same amount of resources on Fargate (2 vCPU and 8 GB of memory) costs about $72.89 for vCPU and about $73.20 for memory, so running a t2.large-equivalent configuration on Fargate costs about $146.09 per month, more than twice the EC2 cost.
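The comparison is simple arithmetic, sketched below in Python. The hourly rates ($0.0506 per vCPU-hour and $0.0127 per GB-hour) are assumed to be the us-east-1 Fargate list prices at the time of writing, and a month is assumed to be 720 hours; check the current AWS pricing page before relying on them. With these assumptions, the per-resource figures land within a few cents of the ones quoted above. The second helper illustrates the per-second billing rule with its 1-minute minimum.

```python
import math

HOURS_PER_MONTH = 720  # assumption: 30 days x 24 hours


def fargate_monthly_cost(vcpu: float, mem_gb: float, vcpu_rate: float,
                         gb_rate: float, hours: float = HOURS_PER_MONTH) -> tuple:
    """Return (vCPU cost, memory cost) for one task running `hours` hours."""
    return vcpu * vcpu_rate * hours, mem_gb * gb_rate * hours


def billed_seconds(runtime_seconds: float) -> int:
    """Fargate bills per second, rounded up, with a 60-second minimum."""
    return max(60, math.ceil(runtime_seconds))


if __name__ == "__main__":
    # Assumed launch-era us-east-1 prices; t2.large-equivalent task size.
    cpu_cost, mem_cost = fargate_monthly_cost(2, 8, vcpu_rate=0.0506, gb_rate=0.0127)
    total = cpu_cost + mem_cost
    ec2_monthly = 67.93  # t2.large monthly figure quoted above
    print(f"vCPU ${cpu_cost:.2f} + memory ${mem_cost:.2f} = ${total:.2f}")
    print(f"{total / ec2_monthly:.2f}x the t2.large cost")
    print(billed_seconds(42.3), billed_seconds(90.2))
```

Because the Fargate bill is linear in vCPU and memory, shrinking the task size or the number of running hours reduces the cost proportionally, which is where the easier scaling can offset the higher unit price.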

On the other hand, scaling with Fargate is much easier than with EC2, because there are no instances or clusters of virtual machines to provision and scale.
