The great migration: a start-up story

Published in Simply Wall St, Apr 26, 2022 (4 min read)

Simply Wall St has been growing consistently for the last few years. For any SaaS B2C business, engineering is a key foundational part of the business: most of the time you only notice it when it doesn’t work. Engineering must be an enabler that keeps supporting the company's growth. We like to follow what we call the 3X mindset: any decision we take needs to keep supporting SWS growing 2–3X per year for the next few years.

We think about Simply Wall St’s growth in terms of 3 main pillars:

User growth. Increase in traffic and usage of the platform. Application performance is key for attracting and retaining customers. Traffic may grow gradually, or a specific event may trigger a sudden peak. We need to be ready for both situations.
Business growth. We have multiple functions that are supported by Engineering. We need to scale to support the increase in business needs and requirements.
Team growth. The Engineering team grows as well, so we need to keep adjusting our practices and dynamics as we scale the team.

OK, what about the migration?

A year ago, we were going through accelerated growth in both user acquisition and hiring. Combined, these put a lot of pressure on our existing infrastructure, which was hosted on IBM Cloud. The platform was struggling to keep up. We were facing daily slowdowns, p95 latency was measured in seconds, and constant external bot scraping was putting extra pressure on existing resources.

We attempted to scale our platform in IBM Cloud, but we hit numerous limitations and challenges, the main one being that we were running some of our infrastructure on bare metal servers. The pain points in more detail:

We were responsible for most of the maintenance work. We were spending time on infrastructure instead of solving user problems.
We were not able to scale horizontally without a lot of engineering support. Some events would trigger unexpected load, which required manual intervention to keep the platform running.
The environment had been handcrafted by previous teams and was not designed in a way that would scale with our needs.
We were running on IBM Classic. We could have moved to a more modern version of IBM Cloud, but that would itself have required a full migration.

Given that the team would need to carry out a full migration either way, we started discussing what the new solution should look like if we were to keep our 3X mindset.

The principles

We defined some principles to guide the decision:

Dream big, ship small, learn fast.

We are going to define our north star and get there incrementally.
Minimal to no user disruption.
No data loss. We are going to move tons of data.

Standing on the shoulders of giants

Prioritise fully managed services over running our own infrastructure and services.
Build on a platform that enables SWS to succeed for the next 5–10 years with a 3X mindset. This should also give us access to the great talent out there.

Zero Trust

The migration should improve the security of our platform and reduce our exposure to future attacks.
Data should be secure throughout the whole process.

Decision and plan

Rather than scaling and improving our infrastructure on IBM Cloud Classic or migrating to IBM Cloud (non-classic), we decided it was time for a big move from IBM Cloud to AWS. We determined that AWS was better suited to our scaling needs: Gartner ranks it as a leader in the cloud space, and it is better known in the industry for cloud computing. It was a big shift in strategic direction for us.

The migration would include moving our Kubernetes clusters, caches, queues, functions, databases, object storage, and more from IBM to AWS. It was a big, ambitious engineering goal for a small engineering function.

When we first discussed it, the task was daunting, to say the least, so we broke the migration into 3 distinct phases: migrate a vendor database; migrate Kubernetes + uplift; migrate our application database. We called it “The great migration”.

The outcome

Various senior engineers described the end result as one of the smoothest migration experiences they had seen, with p95 latency on our main API dropping from seconds to 200–300ms and availability reaching 99.9%+. Note: not all of this arrived at the end of the migration; progress was made in small chunks throughout.
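For readers less familiar with the two metrics above, here is a minimal sketch of what they mean in practice. It assumes a nearest-rank percentile and a 30-day month; the function names and the sample latencies are hypothetical illustrations, not our real measurements or tooling.

```python
import math


def percentile_nearest_rank(samples, pct):
    """Return the pct-th percentile of samples using the nearest-rank method."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    # 1-based rank of the smallest value that covers pct% of the samples.
    rank = max(math.ceil(pct / 100 * len(ordered)), 1)
    return ordered[rank - 1]


def downtime_budget_minutes(availability_pct, period_days=30):
    """Minutes of downtime allowed over the period at a given availability."""
    return (1 - availability_pct / 100) * period_days * 24 * 60


# Hypothetical request latencies in milliseconds (illustrative only).
latencies_ms = [120, 135, 150, 160, 180, 190, 210, 240, 280, 2500]

p50 = percentile_nearest_rank(latencies_ms, 50)
p95 = percentile_nearest_rank(latencies_ms, 95)
budget = downtime_budget_minutes(99.9)

print(f"p50={p50}ms p95={p95}ms, 99.9% allows {budget:.1f} min/month of downtime")
```

In this made-up sample the median looks healthy while a single slow request dominates p95, which is why the tail percentile is the number to watch. At 99.9% availability, the downtime budget works out to roughly 43 minutes in a 30-day month.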

We will be publishing a series of blog posts to go into detail on how we did it, so if you are embarking on a similar journey, hopefully, you will be able to learn from our experience. The topics we’ll be covering are:

How to set up a site-to-site VPN and networking between AWS and IBM Cloud
How to migrate Kubernetes clusters from IBM to AWS
How to migrate to RDS Aurora PostgreSQL
How to set up your applications in preparation for database migration
How to plan & communicate a big migration in your company

Stay tuned for our migration blog series!

Written by Dan Tan