A “Less Server” Data Infrastructure Solution for Ingestion and Transformation Pipelines — Part 3

Michael Triska
AMARO
Published Jan 15, 2020

Written by Michael Triska — Data Engineer at AMARO

AWS ECS Fargate Tasks Networking with Snowflake.

© Michael Triska

What stands out in our new architecture design is that AWS ECS Fargate provides on-demand flexibility and removes the burden of resource and container management. However, you may still need to create additional resources on top of the Fargate launch type, for example for security and networking management.

For security reasons, AMARO’s data team created network policies on Snowflake that allow or deny access based on a list of IPv4 addresses. By default, requests from an AWS Fargate task originate from IP addresses that are assigned randomly from an AWS IP address range (read more here), so we had to create a virtual network with AWS VPC that gives Fargate tasks a fixed outbound IP address when they connect to Snowflake. This blog post discusses a NAT solution for sending requests from a Fargate task to Snowflake (or any other whitelisted database) and shows an AWS CloudFormation template for the presented solution.
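To make the problem concrete, here is a minimal sketch of how an IP allow list behaves from the client's point of view. It is not Snowflake's implementation; the function name `is_allowed` and all IP addresses are hypothetical (the addresses come from the RFC 5737 documentation ranges). It only illustrates why a task with a randomly assigned public IP is rejected while a task behind a fixed egress IP is accepted.

```python
import ipaddress


def is_allowed(client_ip, allowed, blocked=()):
    """Return True if client_ip matches an allowed range and no blocked range.

    Illustrative only: mirrors the allow/deny idea of a network policy,
    where a blocked entry wins over an allowed one.
    """
    ip = ipaddress.ip_address(client_ip)
    in_allowed = any(ip in ipaddress.ip_network(cidr) for cidr in allowed)
    in_blocked = any(ip in ipaddress.ip_network(cidr) for cidr in blocked)
    return in_allowed and not in_blocked


# Hypothetical allow list containing a single fixed egress address.
ALLOWED = ["203.0.113.10/32"]

# Request leaving through the fixed address: accepted.
print(is_allowed("203.0.113.10", ALLOWED))   # True

# Request from an arbitrary, randomly assigned address: rejected.
print(is_allowed("198.51.100.27", ALLOWED))  # False
```

With a fixed outbound address, only that one entry has to be maintained in the allow list, regardless of how many tasks are started.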

External Networking with Snowflake

In this section, we present the high-level NAT solution to whitelist the requests initiated by Fargate tasks on Snowflake. Image 1 illustrates how the components in this process interact with each other. We combine VPC and EC2 resources for external networking and communication with Snowflake as follows:

  1. We set up a VPC and make requests to the internet through an Internet Gateway.
  2. As we handle sensitive information, we launched the Fargate task in a private subnet to prevent the task from receiving inbound traffic from the internet. Tasks in this subnet have no direct internet access and only have private IP addresses that are internal to the VPC and not directly reachable from the public internet.
  3. With this virtual network in place, we created a NAT Gateway in the public subnet as a networking bridge and attached an Elastic IP to it. Outbound traffic from the Fargate task in the private subnet is routed through the NAT Gateway and then through the Internet Gateway, so it reaches the internet from a fixed IP address, while inbound connections are still not allowed.

Image 1: Private and Public Networks Architecture Pattern. Peck (2018) https://aws.amazon.com/blogs/compute/task-networking-in-aws-fargate/
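The three steps above can be sketched as a small CloudFormation template. The snippet below builds the template as a Python dictionary and prints it as JSON, which CloudFormation accepts alongside YAML. It is a minimal sketch and not the template we deployed: the logical names (`NatEip`, `PrivateRoute`, and so on) and the CIDR blocks are assumptions chosen for illustration.

```python
import json


def ref(logical_id):
    """CloudFormation Ref to another resource in the same template."""
    return {"Ref": logical_id}


def route_table(subnet_id, prefix, target_key, target_id):
    """Route table with one default route, associated with one subnet."""
    return {
        f"{prefix}RouteTable": {
            "Type": "AWS::EC2::RouteTable",
            "Properties": {"VpcId": ref("VPC")},
        },
        f"{prefix}Route": {
            "Type": "AWS::EC2::Route",
            "DependsOn": "GatewayAttachment",
            "Properties": {
                "RouteTableId": ref(f"{prefix}RouteTable"),
                "DestinationCidrBlock": "0.0.0.0/0",
                target_key: ref(target_id),
            },
        },
        f"{prefix}Association": {
            "Type": "AWS::EC2::SubnetRouteTableAssociation",
            "Properties": {
                "SubnetId": ref(subnet_id),
                "RouteTableId": ref(f"{prefix}RouteTable"),
            },
        },
    }


def build_network_template():
    """Assemble VPC, subnets, Internet Gateway, Elastic IP and NAT Gateway."""
    resources = {
        # Step 1: VPC with an Internet Gateway (CIDR blocks are assumptions).
        "VPC": {"Type": "AWS::EC2::VPC",
                "Properties": {"CidrBlock": "10.0.0.0/16"}},
        "InternetGateway": {"Type": "AWS::EC2::InternetGateway"},
        "GatewayAttachment": {
            "Type": "AWS::EC2::VPCGatewayAttachment",
            "Properties": {"VpcId": ref("VPC"),
                           "InternetGatewayId": ref("InternetGateway")},
        },
        "PublicSubnet": {"Type": "AWS::EC2::Subnet",
                         "Properties": {"VpcId": ref("VPC"),
                                        "CidrBlock": "10.0.0.0/24"}},
        # Step 2: private subnet that will host the Fargate task.
        "PrivateSubnet": {"Type": "AWS::EC2::Subnet",
                          "Properties": {"VpcId": ref("VPC"),
                                         "CidrBlock": "10.0.1.0/24"}},
        # Step 3: NAT Gateway in the public subnet with a fixed Elastic IP.
        "NatEip": {"Type": "AWS::EC2::EIP",
                   "DependsOn": "GatewayAttachment",
                   "Properties": {"Domain": "vpc"}},
        "NatGateway": {
            "Type": "AWS::EC2::NatGateway",
            "Properties": {
                "AllocationId": {"Fn::GetAtt": ["NatEip", "AllocationId"]},
                "SubnetId": ref("PublicSubnet"),
            },
        },
    }
    # Public subnet routes to the Internet Gateway, private one to the NAT.
    resources.update(route_table("PublicSubnet", "Public",
                                 "GatewayId", "InternetGateway"))
    resources.update(route_table("PrivateSubnet", "Private",
                                 "NatGatewayId", "NatGateway"))
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Resources": resources,
        # The address to add to the database allow list.
        "Outputs": {"EgressIp": {
            "Value": {"Fn::GetAtt": ["NatEip", "PublicIp"]}}},
    }


print(json.dumps(build_network_template(), indent=2))
```

The `EgressIp` output is the single address that would be added to the Snowflake network policy.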

Saving Costs with a NAT Instance

Because you cannot yet assign a static IPv4 address directly to a Fargate task, the NAT Gateway is needed, but the solution shown does not feel like a best practice: the gateway runs around the clock and sits idle most of the time. Luk van den Borne (2019) shows a great way to save costs with NAT instances. Roughly summarizing:

A NAT Gateway costs approximately $35 US per month. It would be much cheaper to use a NAT instance that routes the traffic to the internet (for example, an EC2 t3.nano instance with up to 5 Gbit/s of bandwidth would cost $3.45 US per month). However, it is worth checking this AWS comparison between NAT gateways and NAT instances to understand what fits your requirements and skills best.
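As a quick back-of-the-envelope check, the difference can be computed from the two monthly figures quoted above. This is only arithmetic on those approximate numbers; it does not model data transfer charges or regional price differences, and the constant and function names are made up for the example.

```python
# Approximate monthly prices in USD, taken from the figures quoted above.
NAT_GATEWAY_USD_PER_MONTH = 35.00
NAT_INSTANCE_USD_PER_MONTH = 3.45


def monthly_saving(gateway=NAT_GATEWAY_USD_PER_MONTH,
                   instance=NAT_INSTANCE_USD_PER_MONTH):
    """Monthly saving in USD when replacing the gateway with an instance."""
    return round(gateway - instance, 2)


saving = monthly_saving()
share = saving / NAT_GATEWAY_USD_PER_MONTH * 100

print(f"Saving per month: ${saving:.2f}")       # $31.55
print(f"Saving per year:  ${saving * 12:.2f}")  # $378.60
print(f"Cost reduction:   {share:.0f}%")        # 90%
```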


Stop Setting Up Resources in the Console

Infrastructure as code gives you a unified and maintainable template for how to deploy your architecture. Here you’ll find a collection of AWS CloudFormation templates for launching containers in Fargate with a variety of different networking approaches.

The CloudFormation template below implements the solution shown in Image 1. You only need to change the “ECRRepositoryName” parameter in the top section of the file to fit your needs and upload the Docker image that you want to execute with the Fargate launch type.
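Once such a stack exists, a task still has to be started inside the private subnet. The following sketch builds the keyword arguments for the ECS `run_task` call as exposed by boto3; the cluster name, task definition name, subnet ID and security group ID are placeholders, and the helper `fargate_run_task_kwargs` is hypothetical. The important part is `assignPublicIp: DISABLED`, which keeps the task without a public address so that its outbound traffic leaves through the NAT.

```python
def fargate_run_task_kwargs(cluster, task_definition,
                            private_subnet_ids, security_group_ids):
    """Build arguments for ECS run_task that place a task in private subnets."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(private_subnet_ids),
                "securityGroups": list(security_group_ids),
                # No public IP: egress goes through the NAT in the public subnet.
                "assignPublicIp": "DISABLED",
            }
        },
    }


# Placeholder identifiers, not real resources.
kwargs = fargate_run_task_kwargs(
    cluster="example-cluster",
    task_definition="example-task",
    private_subnet_ids=["subnet-00000000"],
    security_group_ids=["sg-00000000"],
)
print(kwargs["networkConfiguration"]["awsvpcConfiguration"]["assignPublicIp"])

# With boto3 installed and AWS credentials configured, the call would be:
#   import boto3
#   boto3.client("ecs").run_task(**kwargs)
```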

In addition to the AWS EC2 networking resources, the template also defines the AWS ECS, ECR, CloudWatch and IAM resources that are necessary or helpful to run Fargate tasks, as shown in Image 2.

Image 2: CloudFormation Stack Resources.
CloudFormation Template: Private and Public Networks Architecture for ECS Fargate Launch Type.

We modified this template for our needs to create the networking solution described above.

Machine Learning Architect at AMARO. German 🇩🇪 based in São Paulo 🇧🇷. Information Science Master at Humboldt-Universität zu Berlin. Get Dirty with the Data.