Migrating to AWS — What We Learned
Last year we migrated our entire infrastructure to AWS. Now that the dust has settled, I thought I’d share some insights on the migration process and what we learned.
Benefits We’ve Noticed Since Migrating to AWS
We’ve noticed a number of benefits since migrating to AWS; these are some of the key ones:
- Improved uptime
Our uptime has improved significantly since the migration. We have had the same third-party uptime monitoring software running since before the migration, and the reported weekly uptime improved as soon as we migrated.
- Peace of mind
While more difficult to quantify, the extra peace of mind provided by knowing that we’re running on state-of-the-art infrastructure, and all single points of failure have been eliminated, is significant.
- Flexibility
The ability to set up a replicated version of our infrastructure to serve Australian clients locally demonstrated the power of IaaS flexibility.
- Ability to innovate
Being on AWS opens up an array of new services and technologies that are now significantly more accessible to us. Whether it’s looking at new storage engines like Redshift or services like Amazon Machine Learning or Lambda, the time to implement — and therefore innovate — is significantly reduced.
- Integration to other systems
A large part of what we do at Ezora involves integrating with other systems. Being on AWS opens up new possibilities in how we integrate and in the tools and services we can utilise to connect to other systems.
- Control
Being in complete control of our infrastructure has significant benefits for us. Before, we always felt one step removed from what was actually happening, and were always reliant on other people to make certain things happen. That dependency and delay is now gone.
- Scalability
Our ability to handle scale has changed dramatically. Whether it’s scaling our data layer within RDS or scaling out our application layer horizontally, our infrastructure is now built to handle it.
- High availability
All of the components in our new infrastructure have been designed to be highly available and redundant. There are no single points of failure, and all data and services are spread over at least two independent geographical locations.
Problems We Encountered
Obviously we had a few problems along the way. Here’s a list of the key ones; hopefully you can avoid them if you’re planning your own migration:
- Elastic Beanstalk
Elastic Beanstalk wasn’t really flexible enough for our needs. In particular, the way it handled deployments just didn’t give us enough control over what happened during the process.
- Insert performance on write-heavy applications
One interesting issue we had was with the performance of a large volume of inserts. As part of our application, we import data from other sources. To profile this, we set up a sample data import that processed around 3m transactions. With the default parameter set, it took around seven times as long to complete on Aurora as on our old infrastructure.
After a lot of testing, we finally narrowed it down to the parameter “innodb_flush_log_at_trx_commit”. Setting this to 0, so that the log is no longer flushed to disk on every commit, dropped the time on Aurora to around twice what it was on our old infrastructure.
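Aurora tuning aside, the underlying cost here is a durable log flush on every commit. The same effect is easy to reproduce locally; the sketch below uses sqlite3 purely as a stand-in (our production stack was MySQL/Aurora) to show the difference between committing per row and batching rows into one transaction:

```python
import sqlite3

def insert_rows(conn, rows, batch_commit):
    """Insert rows, committing either once per row or once per batch."""
    cur = conn.cursor()
    for value in rows:
        cur.execute("INSERT INTO txn (amount) VALUES (?)", (value,))
        if not batch_commit:
            conn.commit()  # one durable commit per row: this is what hurts
    if batch_commit:
        conn.commit()      # a single commit for the whole batch

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE txn (amount REAL)")
insert_rows(conn, [1.0, 2.0, 3.0], batch_commit=True)
print(conn.execute("SELECT COUNT(*) FROM txn").fetchone()[0])  # 3
```

On a real, disk-backed database the per-row-commit version pays a log flush per statement, so batching inserts into larger transactions is often worth doing even before touching server parameters.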
- Query indexing
We noticed some unusual differences in the way indexes were chosen when we migrated to Aurora. Initially we thought this was Aurora-specific behaviour, but after further investigation we found that it actually came down to changes in the query optimiser between MySQL 5.5 and 5.6. For us, the answer was to use FORCE INDEX in a few specific places.
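For reference, the fix is just a hint in the SQL itself. A minimal sketch, with made-up table and index names (FORCE INDEX is standard MySQL index-hint syntax, but the names here are purely illustrative):

```python
# Hypothetical table and index names; FORCE INDEX tells the MySQL optimiser
# it may only consider the listed index when accessing this table.
query = (
    "SELECT id, amount, created_at "
    "FROM transactions FORCE INDEX (idx_transactions_created_at) "
    "WHERE created_at >= %s "
    "ORDER BY created_at"
)
print(query)
```

Because FORCE INDEX makes a table scan look very expensive to the optimiser, it’s best reserved for the specific queries where you’ve verified the default plan is wrong, rather than applied broadly.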
- Aurora Patching & Failover
Patching does happen occasionally (every month or two) and will result in a restart of your instance. Apparently this is something they’re working on at AWS, but currently there is no workaround: it won’t fail over to a replicated Aurora instance if you’re running one, for example.
- EC2 Swap Space
One small problem we ran into initially was that swap space was not enabled by default on EC2 instances running Amazon Linux. This was something we hadn’t allowed for, so we had to update our base AMIs to include swap space.
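A cheap way to catch this class of problem is to assert on it when baking or validating an AMI. Our actual check wasn’t in Python; the sketch below just illustrates the idea by parsing SwapTotal out of /proc/meminfo:

```python
def swap_total_kb(meminfo_text: str) -> int:
    """Return SwapTotal in kB from /proc/meminfo contents (0 if absent)."""
    for line in meminfo_text.splitlines():
        # Lines look like: "SwapTotal:     2097148 kB"
        if line.startswith("SwapTotal:"):
            return int(line.split()[1])
    return 0

# Sample resembling what a stock instance with no swap reports.
sample = "MemTotal: 4046856 kB\nSwapTotal: 0 kB\n"
if swap_total_kb(sample) == 0:
    print("no swap configured")
```

Run against the real file (`open("/proc/meminfo").read()`) in a post-launch smoke test, this turns a silent misconfiguration into a loud build failure.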
- Backup Policies
This was less of a problem and more something we had to figure out: the backup options for RDS are a little confusing. At a base level, you can simply turn on automated nightly snapshots and you have instant cover for up to a 35-day period, with restore to any point in time within that window. While this is very easy to implement and a great start, there are a few limitations:
- Our backup policies and SLA required a retention period longer than 35 days.
- The snapshot is taken at the instance level, so you have to restore the whole thing. There is no option to restore an individual database, for example.
- Similarly, if you wanted access to the SQL for a particular database or table, you would have to restore an entire snapshot first, then export the data you needed.
- RDS snapshots are also somewhat limited: you can’t archive them to S3 or Glacier, for example.
To get around these limitations, we implemented our own backup solution on top of the automated snapshots: taking nightly SQL dumps for all clients, encrypting them, and storing them on S3 with specific retention periods set.
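The retention logic itself is simple date arithmetic. A sketch of the kind of rule involved (the periods here are illustrative, not our production SLA values): keep every dump for 35 days, and keep first-of-month dumps for a year.

```python
from datetime import date

def keep_dump(dump_date: date, today: date,
              daily_days: int = 35, monthly_days: int = 365) -> bool:
    """Decide whether a nightly SQL dump should be retained.

    Keep everything for `daily_days`; beyond that, keep only
    first-of-month dumps up to `monthly_days` old. The periods are
    illustrative defaults, not actual policy values.
    """
    age = (today - dump_date).days
    if age <= daily_days:
        return True
    return dump_date.day == 1 and age <= monthly_days

today = date(2016, 6, 1)
print(keep_dump(date(2016, 5, 20), today))  # True  (within 35 days)
print(keep_dump(date(2016, 3, 15), today))  # False (too old, not month-start)
print(keep_dump(date(2016, 3, 1), today))   # True  (month-start, under a year)
```

Where a policy reduces to “expire this prefix after N days”, S3 lifecycle rules can enforce it declaratively instead of running your own sweep; tiered rules like the one above need dumps keyed into separate prefixes first.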
Recommendations
Finally, some recommendations if you’re going to take on your own migration to IaaS:
- Version & automate everything — if you can, do as little as possible on the console. Script everything and version control it. Try to think of your infrastructure as another codebase, and treat it accordingly.
- Choose your implementation partner carefully.
- Make your application as fault-tolerant as possible — plan for failure.
- Make the most of IaaS — use the available services, decouple your components, build in elasticity and try to make your product as cloud-ready as possible.
- Plan your actual migration process carefully.
This article is an excerpt from a more detailed post on ezora.com. Ezora is a Cloud BI product that delivers Financial Control, drives Business Performance and supports Strategic Decision-Making.
For more information read the full article, or feel free to get in touch if you have any questions.