Serverless Data Engineering Platform on Cloud
Deploy Apache Airflow on AWS ECS (using FARGATE). Part 2
In Part 1, I explained every step required to deploy Apache Airflow with the Celery executor on AWS ECS using EC2 (I highly recommend reading Part 1 first).
In this post, I explain how to deploy Airflow in a serverless environment.
You can get all the code from https://github.com/fartashh/Airflow-on-ECS/tree/fargate
What Is AWS Fargate, and Why Use It?
AWS Fargate is a technology that you can use with Amazon ECS to run containers without having to manage servers or clusters of EC2 instances. With AWS Fargate, you no longer have to provision, configure, and scale clusters of virtual machines to run containers. This removes the need to choose server types, decide when to scale your clusters, or optimize cluster packing.
When you run your tasks and services with the Fargate launch type, you package your application in containers, specify the CPU and memory requirements, define networking and IAM policies, and launch the application.
How to Deploy Airflow on ECS (using Fargate)
Preparation step:
Follow Part 1 from the preparation step up to step 9.
Step 9: Create the Task Execution IAM Role
Amazon ECS needs permissions so that your Fargate task will be able to store logs in CloudWatch. This permission is covered by the task execution IAM role.
1. Create a file named execution-assume-role.json with the following contents:
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "ecs-tasks.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
2. Using the AWS CLI, create the task execution role:
>>> aws iam --region us-east-1 create-role --role-name ecsExecutionRole --assume-role-policy-document file://execution-assume-role.json
3. Using the AWS CLI, attach the task execution role policy:
>>> aws iam --region us-east-1 attach-role-policy --role-name ecsExecutionRole --policy-arn arn:aws:iam::aws:policy/service-role/AmazonECSTaskExecutionRolePolicy
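If you prefer to generate the trust policy from item 1 rather than type it by hand, a minimal Python sketch could look like the following. The function names are my own (hypothetical), and the policy body simply mirrors the JSON shown above.

```python
import json


def build_trust_policy(service="ecs-tasks.amazonaws.com"):
    """Return a trust policy that lets the given AWS service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "",
                "Effect": "Allow",
                "Principal": {"Service": service},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def write_trust_policy(path="execution-assume-role.json"):
    """Write the policy as indented JSON and return the text that was written."""
    text = json.dumps(build_trust_policy(), indent=2)
    with open(path, "w", encoding="utf-8") as handle:
        handle.write(text + "\n")
    return text
```

Calling write_trust_policy() produces the file that the create-role command above reads via file://execution-assume-role.json.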
Step 10: Configure the ECS CLI
The ECS CLI requires credentials in order to make API requests on your behalf. It can pull credentials from environment variables, an AWS profile, or an Amazon ECS profile.
1. Create a cluster configuration, which defines the AWS region to use, resource creation prefixes, and the cluster name to use with the Amazon ECS CLI:
>>> ecs-cli configure --cluster datalab-fargate --region us-east-1 --default-launch-type FARGATE --config-name datalab-fargate
2. Create a CLI profile using your access key and secret key:
>>> ecs-cli configure profile --access-key AWS_ACCESS_KEY_ID --secret-key AWS_SECRET_ACCESS_KEY --profile-name datalab-fargate
Tip: if you have more than one profile or cluster, edit ~/.ecs/config and ~/.ecs/credentials and set the default cluster and profile names.
Step 11: Create a Cluster
Create an Amazon ECS cluster with the ecs-cli up command. Since you specified Fargate as your default launch type in the cluster configuration, this command will create an empty cluster.
Tip: the ecs-cli up command can create a VPC with two public subnets for you, but I suggest you learn AWS networking and be your own AWS boss.
>>> ecs-cli up --vpc vpc-c7aeebbc --subnets subnet-a4xxb0f9 --security-group sg-e9bxx9a1
Step 12: Update Compose file
If we reuse the compose file from the ECS (EC2) deployment, we get the following errors:
- ClientException: When networkMode=awsvpc, the host ports and container ports in port mappings must match.
- ClientException: Links are not supported when networkMode=awsvpc.
To use Fargate, we need to set the ECS network mode to awsvpc, which imposes a few restrictions on the parameters we can use in the Docker compose file:
- Host ports and container ports in port mappings must match
- Links must be removed from the compose file
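To catch both restrictions before deploying, you could run a small check over the services in the compose file. The sketch below is only an illustration: it assumes the services have already been loaded into a plain dict, and the function name and the service names in the usage note are hypothetical.

```python
def awsvpc_violations(services):
    """Return a list of awsvpc problems for a mapping of service name -> service dict."""
    problems = []
    for name, service in services.items():
        # Links are rejected outright when networkMode=awsvpc.
        if service.get("links"):
            problems.append(f"{name}: links are not supported with awsvpc")
        for mapping in service.get("ports", []):
            # Accept "8080", "8080:8080", "127.0.0.1:8080:8080" and "8080:8080/tcp".
            parts = str(mapping).split("/")[0].split(":")
            if len(parts) < 2:
                continue  # container port only, no host port to compare
            host_port, container_port = parts[-2], parts[-1]
            if host_port != container_port:
                problems.append(
                    f"{name}: host port {host_port} != container port {container_port}"
                )
    return problems
```

For example, a service with ports ["8080:8080"] and links ["postgres"] is reported for its links only, while a service with ports ["5556:5555"] is reported for the port mismatch.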
Step 13: ECS specific parameters
In addition to the Docker compose information, there are some Amazon ECS specific parameters you need to specify for the service. Using the VPC, subnet, and security group IDs from the previous steps, create a file named ecs-params.yml with the following content:
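As a rough sketch of what such a file might contain, the Python helper below renders an ecs-params.yml in the layout the ECS CLI documents for Fargate. It is not the exact file from the repository: the helper name, the task size (2 vCPU and 8 GB, matching the cost comparison later in this post), and assign_public_ip: ENABLED are my assumptions, so adjust them to your setup.

```python
def render_ecs_params(subnets, security_groups, execution_role="ecsExecutionRole",
                      cpu_limit=2048, mem_limit="8GB"):
    """Render ecs-params.yml text for a Fargate service (sizes are assumptions)."""
    lines = [
        "version: 1",
        "task_definition:",
        f"  task_execution_role: {execution_role}",  # role created in Step 9
        "  ecs_network_mode: awsvpc",                # required for Fargate
        "  task_size:",
        f"    mem_limit: {mem_limit}",
        f"    cpu_limit: {cpu_limit}",
        "run_params:",
        "  network_configuration:",
        "    awsvpc_configuration:",
        "      subnets:",
    ]
    lines += [f'        - "{subnet}"' for subnet in subnets]
    lines.append("      security_groups:")
    lines += [f'        - "{group}"' for group in security_groups]
    # Assumption: tasks run in public subnets and need a public IP to pull images.
    lines.append("      assign_public_ip: ENABLED")
    return "\n".join(lines) + "\n"
```

Calling render_ecs_params(["subnet-a4xxb0f9"], ["sg-e9bxx9a1"]) with the IDs from Step 11 returns text you can save as ecs-params.yml next to the compose file.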
Step 14: Deploy the Compose File to a Cluster
After you create the compose file, you can deploy it to your cluster with ecs-cli compose service up.
>>> ecs-cli compose --project-name datalab-fargate service up --create-log-groups
Step 15: View the Running Containers on a Cluster
After you deploy the compose file, you can view the containers that are running in the service with ecs-cli compose service ps.
>>> ecs-cli compose --project-name datalab-fargate service ps
Nice, the output shows that Airflow and Flower are accessible on ports 8080 and 5555, respectively.
Step 16: View the Container Logs
View the logs for the task:
>>> ecs-cli logs --task-id 5cba813b-35f0-45b4-88ed-a927c039a9aa --follow
Step 17: Scale the Tasks on the Cluster
You can scale up your task count to increase the number of instances of your application with ecs-cli compose service scale.
>>> ecs-cli compose --project-name datalab-fargate service scale 2
Step 18: Update Docker image
After updating and testing your Airflow DAGs and logic, follow these steps:
- Rebuild the image
- Tag the image. If you changed the tag name, make sure you update the docker-compose file as well.
- Push the image to the ECR repository
- Update the service
>>> aws ecs update-service --cluster datalab-fargate --service datalab-fargate --force-new-deployment
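The four steps above can be sketched as a small Python helper that assembles the commands in order. This is a hedged illustration, not a script from the repository: the function name and the image names in the usage note are hypothetical, and it assumes you have already authenticated Docker against ECR.

```python
import shlex


def release_commands(local_image, ecr_image,
                     cluster="datalab-fargate", service="datalab-fargate"):
    """Return the rebuild, tag, push, and update commands as argument lists."""
    return [
        ["docker", "build", "-t", local_image, "."],   # rebuild image
        ["docker", "tag", local_image, ecr_image],     # tag image for ECR
        ["docker", "push", ecr_image],                 # push image to ECR
        ["aws", "ecs", "update-service",
         "--cluster", cluster, "--service", service,
         "--force-new-deployment"],                    # update service
    ]


def as_shell(commands):
    """Join each argument list into a shell-safe line for review or logging."""
    return [shlex.join(command) for command in commands]
```

You could print as_shell(release_commands("airflow:latest", "123456789012.dkr.ecr.us-east-1.amazonaws.com/airflow:latest")) to review the commands, then pass each argument list to subprocess.run(command, check=True) once you are happy with them. The account ID in that example is a placeholder.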
FARGATE vs EC2
Fargate is definitely more expensive than using EC2. There is no additional charge for the EC2 launch type: you pay for the AWS resources (e.g. EC2 instances or EBS volumes) you create to store and run your application. You only pay for what you use, as you use it; there are no minimum fees and no upfront commitments. With Fargate, however, you pay for the amount of vCPU and memory resources that your containerized application requests. vCPU and memory resources are calculated from the time your container images are pulled until the Amazon ECS task terminates, rounded up to the nearest second. A minimum charge of 1 minute applies.
For instance, to run our Airflow configuration on EC2 we need a t2.large instance to avoid services restarting. A t2.large instance costs us $67.93 per month. Using the same amount of resources on Fargate (2 vCPU and 8 GB of memory) costs about $72.89 for vCPU and about $73.20 for memory. The total cost of Fargate with the same configuration as a t2.large instance is therefore about $146.09, which is more than twice the EC2 cost.
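To make the arithmetic explicit, here is a small Python sketch of the Fargate side of the comparison. The per-second rates are assumptions on my part: they are rates that reproduce the vCPU and memory figures above over a 30-day month, and current AWS pricing will differ, so treat the constants as placeholders.

```python
SECONDS_PER_MONTH = 30 * 24 * 60 * 60  # a 30-day month, billed per second

# Assumed rates (USD); replace with the current Fargate price list.
VCPU_RATE_PER_SECOND = 0.00001406
MEMORY_RATE_PER_GB_SECOND = 0.00000353

EC2_T2_LARGE_MONTHLY = 67.93  # monthly t2.large figure quoted above


def fargate_monthly_cost(vcpus, memory_gb, seconds=SECONDS_PER_MONTH):
    """Return (vCPU cost, memory cost, total) in USD for an always-on task."""
    cpu_cost = vcpus * VCPU_RATE_PER_SECOND * seconds
    memory_cost = memory_gb * MEMORY_RATE_PER_GB_SECOND * seconds
    return round(cpu_cost, 2), round(memory_cost, 2), round(cpu_cost + memory_cost, 2)


cpu_cost, memory_cost, total = fargate_monthly_cost(vcpus=2, memory_gb=8)
ratio = round(total / EC2_T2_LARGE_MONTHLY, 2)
print(cpu_cost, memory_cost, total, ratio)  # 72.89 73.2 146.09 2.15
```

With these assumed rates the two components sum to about $146.09 per month, roughly 2.15 times the t2.large figure.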
On the other hand, scaling with Fargate is much easier than with EC2.