Building private clouds with Amazon VPC

When we started building the Karma platform, we decided to build it on top of Amazon Web Services (AWS). The great thing about AWS is that it gives us a lot of flexibility to learn what kind of infrastructure we need as we grow. One of the most defining choices we made was not to build on top of a standard Amazon EC2 setup, but to move straight to Amazon Virtual Private Cloud (VPC).

Karma is actually a series of applications that all talk to each other: a service-oriented infrastructure. To support that, we need more influence on the network layer. With VPC you can create isolated sets of server instances and get a lot more control over the networking environment.

Now I am sure you are thinking: why would you want to deal with all this hassle? After all, Amazon can take care of this, right? Well, we came across a few key features that persuaded us to think differently. Here’s why and how you build private clouds with Amazon VPC.

Private subnets

Unlike vanilla EC2 instances, VPC instances are not internet addressable by default. All subnets within VPC are restricted to RFC 1918 address allocation. While this means that VPC instances are much more secure by default, it doesn’t mean you can forget about security entirely. It does make setting up firewall rules and security groups a lot easier, because you can configure them to allow traffic from an entire subnet or your entire VPC, without having to worry about individual IP addresses. The perfect use cases for private instances are application servers, database servers, queues, caches, operations servers, etc. (At Karma we even run the web servers inside VPC, because we deploy Elastic Load Balancer instances to the public subnet and have those connect to the private subnet.)
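To make the address planning concrete, here is a minimal sketch using only Python’s standard ipaddress module. The 10.0.0.0/16 block, the /24 subnet size, and the helper names are assumptions chosen for illustration, not Karma’s actual layout.

```python
import ipaddress

# Hypothetical VPC address block; any RFC 1918 range works the same way.
vpc = ipaddress.ip_network("10.0.0.0/16")

# Split the VPC into /24 subnets and pick two of them for illustration.
subnets = list(vpc.subnets(new_prefix=24))
public_subnet = subnets[0]
private_subnet = subnets[1]

RFC1918_BLOCKS = [
    ipaddress.ip_network(block)
    for block in ("10.0.0.0/8", "172.16.0.0/12", "192.168.0.0/16")
]

def in_rfc1918(network):
    """Return True if the network lies inside one of the RFC 1918 blocks."""
    return any(network.subnet_of(block) for block in RFC1918_BLOCKS)

def allowed_from_vpc(source_ip):
    """Firewall-style check: accept traffic from anywhere inside the VPC."""
    return ipaddress.ip_address(source_ip) in vpc

print(public_subnet, private_subnet, in_rfc1918(vpc))
print(allowed_from_vpc("10.0.1.25"), allowed_from_vpc("192.168.1.1"))
```

A rule written against the whole VPC range, as in allowed_from_vpc, keeps working when instances are replaced and get new private addresses.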

Internal Elastic Load Balancers

VPC supports internal Elastic Load Balancers. This is a great replacement for custom load balancers like HAProxy and thus means one less moving part to worry about. Setting up an internal load balancer is just as easy as setting up a public one in vanilla EC2, except that the internal load balancer is not publicly addressable.

Advanced Network Access Control

Network access control in VPC is much more advanced. Besides security groups (which are also provided by EC2), you can control network access at the subnet level via Network ACLs. In vanilla EC2 only ingress traffic can be controlled, but with VPC you can also control egress traffic. Another nice detail is that VPC allows you to switch security groups on the fly for running instances.
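As a rough illustration of subnet-level egress control, the sketch below models how a Network ACL decides: rules are checked in ascending rule-number order, the first match wins, and anything unmatched is denied. It is a simplified model, not the AWS API. It ignores protocols and the fact that Network ACLs are stateless, and the AclRule type and the example rules are hypothetical.

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class AclRule:
    number: int        # lower numbers are evaluated first
    cidr: str          # destination range for an egress rule
    port_range: tuple  # inclusive (low, high)
    allow: bool

def evaluate(rules, ip, port):
    """Return True if traffic is allowed: first match in rule-number order wins."""
    addr = ipaddress.ip_address(ip)
    for rule in sorted(rules, key=lambda r: r.number):
        low, high = rule.port_range
        if addr in ipaddress.ip_network(rule.cidr) and low <= port <= high:
            return rule.allow
    return False  # nothing matched: implicit deny

# Hypothetical egress rules for a private subnet.
egress_rules = [
    AclRule(100, "10.0.0.0/16", (0, 65535), True),  # anything inside the VPC
    AclRule(110, "0.0.0.0/0", (25, 25), False),     # block outbound SMTP
    AclRule(120, "0.0.0.0/0", (443, 443), True),    # HTTPS to the outside world
]

print(evaluate(egress_rules, "10.0.1.5", 5432))
print(evaluate(egress_rules, "93.184.216.34", 80))
```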

Elastic Network Interfaces

It is possible to bind multiple Elastic Network Interfaces to a single instance. This allows you to connect to multiple subnets from the same machine, or perform near-instant network configuration updates without the need to wait for routing table adjustments. Just move the interface to another instance and you’re done. Another big benefit of using Elastic Network Interfaces is the ability to assign static IP addresses. This can vastly reduce the hassle of setting up services like ZooKeeper or Redis master-slave configurations.

Want to get started? Here are a few tips:

Understand public vs. private

Instances on VPC are not publicly routable by default. Make sure that public instances that need to connect to the internet have an Elastic IP address, and that you set up NAT instances for private subnets. Beware that NAT instances can currently become a single point of failure in your architecture, so if you have high availability requirements you should take care of automated failover.

One other caveat to keep in mind is that external Elastic Load Balancers can only route to public subnets. So even when the instances you want to load balance to are located in the private subnet, be sure to set up the right security groups and deploy the ELB into the public subnet.

Set up a secure bastion host

If you’re going with public/private subnets, make sure you set up a proper bastion host in the public subnet to be able to connect to instances in the private subnet. The bastion host should be the only point of entry to instances in your private subnet.

AWS accounts are limited to 5 Elastic IPs by default

Since instances in VPC are not publicly addressable by default you’ll probably need more Elastic IPs than you’re used to in vanilla EC2. Beware that every AWS account has a limit of 5 Elastic IPs by default. Need more than 5? You can request an increase via the AWS website.
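A quick way to see whether you will hit the limit is to count what needs a public address. The sketch below does just that. The inventory is hypothetical, and the limit of 5 is the default mentioned above.

```python
DEFAULT_EIP_LIMIT = 5  # default per-account limit mentioned in the text

# Hypothetical inventory: how many Elastic IPs each kind of component needs.
needed = {"nat-instance": 2, "bastion-host": 1, "public-worker": 3}

def eip_shortfall(inventory, limit=DEFAULT_EIP_LIMIT):
    """Return how many Elastic IPs you would need beyond the limit (0 if none)."""
    total = sum(inventory.values())
    return max(0, total - limit)

print(eip_shortfall(needed))
```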

Most AWS services require internet access

Don’t forget that other AWS services like S3, SQS, DynamoDB, SNS, the AWS API, etc., have to be accessed via the internet. This means that instances in a private subnet by default will not be able to connect directly. Instead, traffic must be routed via NAT instances.
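The difference between a public and a private subnet comes down to where its default route points. The sketch below models that with two hypothetical route tables and picks the most specific matching route, which is how VPC route tables choose a target. The target names are placeholders, not real AWS identifiers.

```python
import ipaddress

# Hypothetical route tables: destination CIDR -> target.
PUBLIC_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "internet-gateway"}
PRIVATE_ROUTES = {"10.0.0.0/16": "local", "0.0.0.0/0": "nat-instance"}

def next_hop(routes, destination_ip):
    """Pick the most specific (longest prefix) route containing the destination."""
    addr = ipaddress.ip_address(destination_ip)
    matches = [
        ipaddress.ip_network(cidr)
        for cidr in routes
        if addr in ipaddress.ip_network(cidr)
    ]
    if not matches:
        return None  # no route: the traffic cannot leave the subnet
    best = max(matches, key=lambda net: net.prefixlen)
    return routes[str(best)]

# Traffic inside the VPC stays local; anything else from a private
# subnet is sent to the NAT instance.
print(next_hop(PRIVATE_ROUTES, "10.0.1.20"))
print(next_hop(PRIVATE_ROUTES, "54.239.28.85"))
```

If the NAT instance goes down, everything behind that default route loses access to S3, SQS, and the rest, which is why the failover advice above matters.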

Experiment!

The beauty of AWS is that it allows you to reconfigure big chunks of your network infrastructure while it’s up and running. This makes it very easy to experiment and find the best solution for your platform. (The guys from Kiip provide an excellent example on their blog.)

Think about your network topology. Does it make sense for your platform to have some parts hidden from the public? How many publicly addressable IP addresses do you need for a certain group of instances? How would you like to control network access? (Amazon provides some great example scenarios in the VPC documentation.)

And finally: drawing a simple model of your network infrastructure can really help in understanding the needs of your product’s architecture and how it could benefit from VPC.

[Figure: Example VPC network diagram]

Wouldn’t you want to have your infrastructure look this pretty?