Running Hank on AWS
A key piece of LiveRamp’s data-distribution infrastructure has always been our open-source key-value data store Hank. Hank powers a number of key applications at LiveRamp which require real-time data lookups. These include requests to our pixel servers for identifier syncing, client-side data distribution, and our real-time data pipelines. Over the past few months, we have been working on porting this real-time data stack to AWS.
This post outlines
- Why we chose to build AWS support for our own key-value stack instead of leveraging commercial solutions
- The development stack we use for AWS development
- Hank’s design on AWS (and what we had to change)
While LiveRamp does have a number of services on AWS, all of our client-side data distribution has been served from our colocation facility. Our reasons for wanting to support an AWS stack are not particularly exotic:
- Redundancy — we would like to be able to fall back to other regions in the event of downtime at our Colocation facility (or in an AWS region)
- Expansion (+EU) — latency and local data practices make it impractical to serve data from US data centers. Spinning up key-value infrastructure in a new region should be quick and easy.
- Scalability — an AWS presence gives us the ability to quickly scale up resources with increases in pixel traffic.
A few rough stats on the scale of our key-value stores:
- 20 active “tables”
- 19.8 TB data (pre-replication)
- 400B records across all domains
- Average daily peak ~350k QPS (up to 500k during heavy traffic)
- Latency 1.4ms @ 90th percentile
Before expanding to AWS we considered migrating to a number of other technologies. While our original reasons for developing Hank in-house rather than using other key-value databases were straightforward — that is, none of them were mature at the time we built Hank — LiveRamp still has several hard requirements that make it difficult to transparently swap in other solutions:
- Massive batch writes — while most key-value datastores are built to support both random writes and random reads, random writes have not been an important use-case. Since the vast majority of our data is processed by MapReduce jobs, we found it cheaper and more efficient to do full store re-writes frequently.
- Very large key-spaces — most key-value datastores are optimized for wide rows with moderate-sized keyspaces which can fit in memory. This is not the case at LiveRamp — given our focus on identifier syncing, we have over 400B keys, which makes otherwise attractive solutions (ex, Aerospike) outrageously expensive.
- Deterministic latency requirements — since LiveRamp is invoked as a pixel request on partner websites, it’s very important that our response latency does not follow a long-tail distribution. By performing batch writes, Hank ensures that every request involves max 2 disk lookups.
After investigating a number of options including Aerospike, Cassandra, DynamoDB), and after consulting AWS solution architects, everyone agreed it would be easiest and most cost-effective to spin up our existing Hank infrastructure on AWS. This way we also don’t need to support separate KV stacks.
AWS Code Infrastructure at LiveRamp
LiveRamp’s standard workflow for provisioning AWS resources is a Chef->Packer->Terraform workflow, and we used the same pipeline to provision our Hank cluster on AWS:
- Chef is a powerful configuration management tool we use to manage our internal infrastructure. For Hank, this mostly involves package installation, configuration file management, and code deploys.
- After changes are pushed to a Chef cookbook, Jenkins converges this cookbook against an Amazon instance. If successful, Packer builds the newly-configured instance into a new AMI
- When we want to use these AMIs to spin up a Hank cluster in a new region, we use Terraform to provision AWS resources using these AMIs. Terraform also handles the networking in new regions (subnets, security groups, route53 records) and makes it easy to run identical staging and production environments.
Hank + AWS
For a more comprehensive overview of Hank’s design, see the overview on GitHub. Below is an overall diagram of our AWS Hank setup, broken down by component below:
The important components of the Hank cluster we spin up in AWS are:
- Hank partition servers which serve data to client processes
- S3 buckets to transfer data from our colocation facility and AWS
- an Apache ZooKeeper ensemble which holds all of the Hank metadata
- a UI for configuration and monitoring
Hank Partition Servers
We had a few choices for what AWS resources to provision for our actual hank data servers. Our only hard requirement was heavy low-latency disk query volume.
- AWS SSD instances — IO-optimized instances (i2 instances) were the obvious first place to look. i2 instances provide high IO throughput, but cost significantly more per hour than standard m4 instances.
- M4 + EBS — an alternative approach was to provision standard m4 instances optimized for EBS, and use SSD EBS volumes for data persistence. Within EBS, there are two choices — gp2 and io1 (standard vs provisioned IOPS).
Our physical hardware has a CPU/Mem/Storage ratio of 1 core / 4 GB / 160 GB, so we priced out similar configurations using 1 year reserved instances:
Not surprisingly, gp2 volumes were the most cost effective option, if we could guarantee that
- EBS latency was reliably low (we need low single-digit ms). Amazon is reluctant to quote hard numbers here, so this required testing
- gp2 volumes will give us sufficient IOPS to power Hank
As it turns out, our measured EBS latency is well below 5ms, making i2 volumes unnecessary. The IOPS calculations are a bit more complicated, but gp2 volumes will also have sufficient dedicated IOPS due to the large provisioned volume size. This is great news, since the cheapest option works perfectly well for us!
Hank replication is built around the concept of a Ring, a Ring contains a full copy of the data for a domain. We divide Rings in a Hank cluster across Availability Zones for two reasons:
- By holding a full replica of the data in each zone, we are resilient to downtime in an AZ
- Since our pixel servers are already split across AZs (for the above reason), we minimize cross-AZ data transfer costs by querying AZ-local data servers.
Our current workflow builds Hank bases on the Hadoop cluster in our Colocation facility (in the future, we may move this builder to EMR) and transfers them to AWS via S3, with partition servers pulling data directly from S3 to local disks. S3 storage provides cheap persistent storage, so we can easily scale up Hank servers as needed. To minimize data download costs, we have separate buckets per AWS region
Spinning up a zookeeper ensemble on AWS was straightforward using Chef. To be resilient to downtime within an Availability Zone (AZ) in an AWS region, we split the zookeeper servers across several AZs (ZooKeeper background traffic from Hank adds only trivial data transfer costs). While ZooKeeper can’t update metadata without a 51% quorum, it will continue to serve data, which is enough to avoid downtime if an AZ goes down.
For security purposes we use a bastion host to restrict access to our Amazon VPCs. Since our domain builder still runs in our Colo, we provide a proxy ZK gateway via SSH tunnel so that the AWS and Colo instances use the same ZooKeeper ensemble.
Last but not least, we want to be able to access Hank’s web interface for monitoring and configuration. Similarly to how we provide remote ZooKeeper access via a SSH tunnel, we access the Hank UI via a proxy machine.
While we ran into very few large hurdles running Hank on AWS, we had to make a few code changes to get Hank running and optimized on AWS.
- Hank previously assumed that both Partition Servers and Domain Builders referenced data on the same file system (specifically, HDFS). However, by splitting builders and servers into different environments, we needed to split the data locations into two properties, while not breaking our Colo cluster. This was resolved in PR #315
- Hank previously had no concept of data locality — Hank assumed that all replicas of a record could be reached with the same latency and cost. However, since Hank partition servers are now split across AZs, we want clients to prefer replicas on servers within an AZ to minimize both latency and data transfer costs. Support for this was added in PR #31
- Similarly, we wanted to make sure hank updates never took down all replicas of a record within an AZ. This was resolved in PR #319
One nice surprise was that S3 support for Partition Servers required almost no changes — the Hadoop S3 FileSystem lets us transparently pull data from S3 instead of HDFS.
Conclusions and Future Development
Overall, setting up the infrastructure to migrate Hank to AWS was a remarkably painless process. While the AWS provisioning tools we used (Packer, Terraform, and Chef) are still rough around the edges, we didn’t have any major blockers.
As we scale the volume of our AWS traffic, we’ll continue to build out Hank features as necessary. Features we are considering involve more automatic latency monitoring and scaling. We plan on updating any changes here to help anyone going through a similar migration.