MongoDB in AWS? Does it really work?

Robert Fehrmann
Snagajob Engineering
May 30, 2016


Snagajob has been using MongoDB as our OLTP database engine for several years, and in terms of stability, performance, uptime, ease of use … you name it … our experience has been excellent. Our production systems (including MongoDB) were running in a 100% virtual environment backed by a high-performance SAN. We knew the strengths and limitations of our environment, and we knew that we had to refresh our infrastructure every couple of years to stay current.

But refreshing infrastructure is a very involved process. You want to make sure that your new platform performs well enough to last for the next couple of years. After all, it takes quite some time to order physical infrastructure and get it provisioned. But you don't want to over-provision either, because infrastructure has a shelf life. Use it or lose it.

So two years ago we asked ourselves if there was a better option. Could we run Snagajob in a public cloud and scale our infrastructure on-demand as the business grows? Could we increase consistency via Infrastructure as a Service (IaaS) and treat infrastructure as code? And could we do all of that with the same or better security, stability, performance, and cost? The only way to tell was to try it.

Our biggest concern was the database tier, where we have 9 MongoDB clusters with 15 replica sets, each implemented with 3 members. That's a total of 45 data nodes. All MongoDB instances were running on VMs with 2–8 vCPUs, 8–40 GB RAM, and 200 GB–1 TB data volumes.

If we wanted to run MongoDB in AWS (and, for that matter, all of Snagajob), how would we get started? How do we map our current VM resources to EC2 instance types (machine sizing, disk options, HA)? How do we test performance in a comparable way? And is there any merit to the funny-cat-video or noisy-neighbor myth, i.e., how consistent is performance? To make a long story short: does it really work?

To answer these questions, let's take them one at a time.

Mapping on-site instance resources to EC2 instance types

To get a first estimate of our EC2 instance sizes we used CopperEgg, a free tool from Idera. The CopperEgg report gave us a good starting point, not only for our MongoDB infrastructure but for all other machines as well.

Interestingly enough, about half of the machines CopperEgg suggested for MongoDB were t2.medium. The t2.medium is a great instance type, but it had one major flaw for our use case: back in mid-2015, it didn't support EBS encryption, a must-have for us.

The smallest (cheapest) instance type supporting EBS encryption was the m4.large, so that became our default choice for small clusters, with the m4.xlarge for bigger clusters. However, an m4.large already provides so much compute power that, particularly in our non-production environments, the instances would be virtually idle most of the time and waste a lot of money. To mitigate this we decided to run MongoDB on Docker on 3 m4.xlarge hosts in non-production (15 MongoDB instances per m4.xlarge), which is a much more economical option.
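For illustration, here is a minimal sketch using the Docker SDK for Python (image tag, names, ports, and paths are hypothetical, not our actual setup) of how several mongod containers can be packed onto a single host, each isolated on its own port with its own data directory:

```
import docker

client = docker.from_env()

# Run several mongod containers on one host; each gets its own port
# and its own host data directory.
for i in range(5):  # e.g. 5 of the 15 instances on this host
    port = 27017 + i
    client.containers.run(
        "mongo:3.2",
        command=f"--replSet rs{i} --port {port}",
        name=f"mongod-{i}",
        detach=True,
        ports={f"{port}/tcp": port},
        volumes={f"/data/mongod-{i}": {"bind": "/data/db", "mode": "rw"}},
    )
```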

In terms of storage, we decided to run our initial tests on GP2 SSD volumes, which avoided both the inconsistent performance of magnetic disks and the high cost of PIOPS. They are an excellent middle-of-the-road option.
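As a sketch of what provisioning such a volume looks like with boto3 (size, region, and AZ here are hypothetical):

```
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Create an encrypted GP2 data volume; our data volumes ranged from
# roughly 200 GB to 1 TB.
volume = ec2.create_volume(
    AvailabilityZone="us-east-1a",
    Size=500,  # GiB
    VolumeType="gp2",
    Encrypted=True,
)
print(volume["VolumeId"])
```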

Last but not least, to provide adequate high availability, we decided to run the members of each replica set in completely different Availability Zones (AZs). This way, the failure of a single AZ would still leave a majority of members up and therefore keep the application running. Nonetheless, we didn't want to make any concessions on transactional safety or performance, i.e., writes still had to be acknowledged by at least one other node.
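Here is a minimal pymongo sketch of that layout (hostnames, database, and collection names are hypothetical): one member per AZ, and a write concern of w=2 so every acknowledged write has reached at least one other member:

```
from pymongo import MongoClient
from pymongo.write_concern import WriteConcern

config = {
    "_id": "rs0",
    "members": [
        {"_id": 0, "host": "mongo-1a.example.internal:27017"},  # us-east-1a
        {"_id": 1, "host": "mongo-1b.example.internal:27017"},  # us-east-1b
        {"_id": 2, "host": "mongo-1c.example.internal:27017"},  # us-east-1c
    ],
}

# Initiate the set by connecting directly to one member.
seed = MongoClient("mongo-1a.example.internal", 27017, directConnection=True)
seed.admin.command("replSetInitiate", config)

# Application connections then require acknowledgment from 2 members
# before a write is confirmed.
client = MongoClient(
    "mongodb://mongo-1a.example.internal,mongo-1b.example.internal,"
    "mongo-1c.example.internal/?replicaSet=rs0"
)
jobs = client.get_database("appdb").get_collection(
    "jobs", write_concern=WriteConcern(w=2, wtimeout=5000)
)
jobs.insert_one({"title": "example posting"})  # blocks until 2 members ack
```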

Load Testing

Now that we had decided on our initial testing configuration, there was just one slight problem: how do you simulate enough load?

We quickly dismissed the option of creating a comparable environment and building scripts to somehow simulate the production load as too costly and too complicated. Instead we decided to take a different route. We had extensive metrics in Cloud Manager providing numbers for peak read/write operation counts. Using a generic load-testing tool, we baselined both our on-prem environment and our proposed AWS environment. As long as the actual counts from Cloud Manager stayed below the on-prem baseline, and the on-prem baseline stayed below the AWS numbers, we knew we were OK.

iibench and POCDriver are two excellent open-source tools for creating different types of workloads. Using them quickly showed that even the very small configuration of m4.large instances with encrypted GP2 EBS volumes and replica set members in different AZs came close to the maximum throughput of our on-prem baseline. And if necessary, we could still scale up.
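For a sense of what such a baseline looks like, here is a minimal pymongo sketch of an insert-heavy workload (this is not iibench or POCDriver, just an illustration with hypothetical names; the real tools are far more configurable):

```
import time
from pymongo import MongoClient

client = MongoClient("mongodb://mongo-1a,mongo-1b,mongo-1c/?replicaSet=rs0")
coll = client.loadtest.docs

# 100 batches of 1,000 small documents; copy each batch so every
# insert gets a fresh _id.
batch = [{"n": i, "payload": "x" * 512} for i in range(1000)]
start = time.time()
for _ in range(100):
    coll.insert_many([dict(d) for d in batch])
elapsed = time.time() - start
print(f"{100 * 1000 / elapsed:.0f} inserts/sec")
```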

Consistency

Now we just had to confirm consistency. After all, who hasn't heard the horror stories of inconsistent performance in AWS whenever a new funny-cat video is released, or on Friday nights when half the nation is streaming Netflix? Interestingly enough, we were unable to identify any correlation between throughput and time of day or day of the week. In fact, we couldn't measure any performance variation at all, and performance on both EC2 and EBS was even more consistent than on our on-prem infrastructure.
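Checking for such a correlation is straightforward: bucket the throughput samples by hour and compare means and spreads. A minimal sketch (with made-up sample data) of that kind of analysis:

```
import statistics
from collections import defaultdict

# (hour_of_day, ops_per_sec) samples collected during the baseline runs
samples = [(0, 9800), (0, 9750), (13, 9810), (13, 9790), (20, 9770), (20, 9805)]

by_hour = defaultdict(list)
for hour, ops in samples:
    by_hour[hour].append(ops)

# If time of day mattered, the per-hour means would drift and the
# spreads would widen at certain hours; we saw neither.
for hour in sorted(by_hour):
    ops = by_hour[hour]
    print(f"hour {hour:02d}: mean={statistics.mean(ops):.0f} "
          f"stdev={statistics.pstdev(ops):.0f}")
```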

Conclusion

As of early June 2016 we have been running MongoDB in AWS for three months. By all the measures we considered at the beginning (performance, consistency, reliability, flexibility), the results are nothing short of remarkable. Running MongoDB in multiple AZs was definitely the right choice. We have seen a couple of issues in AWS — all related to single AZs — but due to our architecture, none of them impacted the performance or availability of MongoDB at all.

Another good choice was sticking with LVM for disk management, in particular for increasing the size of a disk. Even though we use EBS snapshots rather than LVM snapshots for backups, LVM provides an easy path to growing our data volumes. The old way of calling up your friendly TechOps operator to increase the size of a SAN volume just doesn't work anymore.

And last but not least, if you are starting out with MongoDB in AWS or just want to validate and review your setup, check out this excellent white paper on running MongoDB in AWS.

Originally published at engineering.snagajob.com on May 30, 2016.
