Introducing aws-terraform-bootstrap: a starter repo to deploy a Python app on top of AWS Lambda, RDS, S3, and VPC in less than 20 minutes

Shane Keller
5 min read · Mar 16, 2018

Over the past two months, I’ve made some great progress on building an algorithmic trading system. In the process, I’ve learned how to bootstrap infrastructure on AWS using Terraform. As part of that effort, I wrote “aws-terraform-bootstrap”, a starter repo that bootstraps AWS infrastructure with Terraform and runs a “hello_world” Python 3 app that uses the following AWS services:

  • Lambda — serverless compute
  • RDS — managed Postgres
  • S3 — object storage
  • VPC — networking
  • SSM Parameter Store — configuration
  • EC2 — bastion host

In addition, this repo helps you configure the following tools in your local environment:

  • Postgres — SQL database
  • pyvenv — Python 3 virtual environment
  • Terraform — infrastructure as code
  • PyCharm — Python IDE
  • nose — automated testing framework
  • IPython on top of Anaconda — data analysis IDE

Once you’ve finished setting up the repo, you will have done the following:

  • deployed an app on AWS with Terraform
  • set up the app to run locally
  • set up your local machine to ssh into an EC2 bastion host and connect to an RDS instance via psql

Motivations

There are many blog posts, GitHub repos, Stack Overflow posts, and pages of AWS documentation that explain parts of how to build and deploy AWS infrastructure with Terraform, but no repo I’ve found puts all of the concepts I wanted together and makes it quick and easy to set up an app both on AWS and locally. I spent too much time reinventing the wheel and doing DevOps work. This repo automates as much of that work as possible and provides clear documentation for the rest.

Application architecture

The app itself is simple. hello_world_lambda.py invokes hello_world.py, which reads a parameter from SSM Parameter Store, makes an HTTPS request to a fake online REST API, and writes part of the response to either a CSV file or a Postgres database, hosted either locally or on AWS (both choices are controlled by environment variables). The CSV file is either in a local directory:

<aws-terraform-bootstrap-dir>/data/<timestamp>_message.csv

or an S3 bucket:

hello-world-<hello_world_bucket_name_suffix>/<timestamp>_message.csv

The Postgres database is either a local Postgres instance:

psql --dbname=hello_world --user=hellorole --host=localhost

or a Postgres instance hosted on RDS. Details on connecting to the RDS Postgres instance are in the repo’s README.
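The flow above can be sketched roughly like this. Function and environment-variable names here are illustrative, not the repo’s actual identifiers, and the SSM read (boto3) and HTTPS call are stubbed out so the sketch stays self-contained and runnable offline:

```python
import csv
import os
import time


def choose_backend(env):
    """An environment variable selects the storage backend.
    The variable name is hypothetical, not the repo's actual one."""
    return "postgres" if env.get("HELLO_WORLD_STORAGE") == "postgres" else "csv"


def write_message_csv(message, data_dir="data"):
    """Write the message to <data_dir>/<timestamp>_message.csv,
    mirroring the local CSV path described above, and return the path."""
    os.makedirs(data_dir, exist_ok=True)
    path = os.path.join(data_dir, "%d_message.csv" % int(time.time()))
    with open(path, "w", newline="") as f:
        csv.writer(f).writerow([message])
    return path


def hello_world(env=os.environ):
    """Skeleton of the app's flow: read a config value, fetch part of a
    REST API response, then persist it.  The real app reads the config
    value from SSM Parameter Store (via boto3) and fetches over HTTPS;
    both are replaced with a stand-in value here."""
    message = "hello world"  # stand-in for the REST API response field
    if choose_backend(env) == "postgres":
        # The real app would INSERT into Postgres via a driver such as psycopg2.
        raise NotImplementedError("Postgres backend not sketched here")
    return write_message_csv(message)
```

In the CSV case, the same function serves both environments: locally the file lands in the data directory, while on AWS the lambda would upload it to the S3 bucket instead.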

Networking architecture

The above architecture diagram shows that the app is deployed in a VPC spanning two availability zones (AZs), with one public and one private subnet per AZ. The lambda and RDS instance are deployed in a VPC because RDS can only be deployed into a VPC, so a lambda that accesses the RDS instance has to either be in the same VPC or use VPC peering. Deploying an instance into a VPC also yields additional benefits, such as the ability to change the security group of an instance while it’s running. Read more in AWS’s VPC documentation about VPCs and AWS’s migration from their legacy EC2-Classic architecture to VPCs.

The RDS instance is shown in one subnet only because it’s a single-AZ deployment. Multi-AZ deployments cost more, and they’re unnecessary for a bootstrap app like this one. It’s easy to add multi-AZ support, though, if an app needs the increased uptime.

The lambda is shown in both private subnets because it can run in either one. If one AZ goes down, for example, the lambda runs in the subnet that is still up. Since the lambda depends on a NAT gateway for access to the internet, there’s one NAT gateway in each AZ.

There’s only one bastion host because 1) that saves costs, and 2) uptime is less important for a bastion host than it is for the lambda. If the AZ containing the bastion host goes down, it takes less than a minute to use Terraform to add a new bastion host in the other AZ.

Why not a serverless framework?

CloudFormation is AWS’s service for writing infrastructure as code, and AWS’s serverless tooling is built on top of it. Serverless frameworks abstract away DevOps tasks such as lambda packaging, deployment, and monitoring. Apex is TJ Holowaychuk’s take on such a framework for lambda.

Serverless is another framework that offers similar functionality, as well as support for other cloud providers.

I plan on learning a serverless framework in the future, but before learning those tools, I wanted to get lower-level experience with cloud computing DevOps. With that experience, I’m better equipped to understand the components of cloud architectures, debug production issues, and weigh the tradeoffs of the various serverless frameworks.

Why Terraform?

Since a serverless framework is not being used, an infrastructure as code (IaC) framework is needed to provision AWS instances. “Provision” means choosing the number, type, and properties of instances, and deploying them. Terraform was chosen for a few reasons. It’s open source, which means it’s free to use (paid offerings like Terraform Enterprise are optional). It’s used by companies with large apps that serve millions of users, including my former company. It can be used with any cloud computing platform. Finally, it’s declarative: you specify the end infrastructure state, and Terraform figures out how to achieve it, which makes it easy to add, change, and remove infrastructure. Gruntwork.io has an excellent blog post that dives deeper into the benefits of Terraform compared with CloudFormation, Puppet, and other tools.
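As a taste of the declarative style, here’s a minimal, hypothetical Terraform resource (the AMI ID and names are placeholders, not values from the repo). You declare the end state — one t2.micro instance — and Terraform plans the AWS API calls needed to reach it:

```hcl
# Illustrative only: declares a single bastion host.
# Running `terraform apply` creates it; deleting this block
# and applying again destroys it.
resource "aws_instance" "bastion" {
  ami           = "ami-0123456789abcdef0"  # placeholder AMI ID
  instance_type = "t2.micro"

  tags {
    Name = "bastion"
  }
}
```

Changing a property, such as `instance_type`, and re-running `terraform apply` makes Terraform compute and execute the diff between the declared state and what currently exists.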

Next steps

There’s a lot more to be done before this repo is a production-ready system that could support a product with users; I’ve outlined most of that work in the repo. As I build out more of the trading system, I’ll continue building out this repo. Stay tuned for part 2, in which I add a “hello world” Docker app and deploy it into an ECS cluster!

The trading system and other projects have taken most of my efforts away from learning data science, but once the infrastructure of the system is more built out and I start looking for a profitable trading strategy, I’ll move back into data science. I haven’t decided on a timeline for finishing the system. I’m getting some great experience with the latest tech, and that’s enough for me to keep going without an explicit deadline for now.
