MIGRATING LEGACY APP TO AWS IN 5 DIFFERENT WAYS

Vincent Van Gestel
VRT Digital Products
12 min read · Jan 10, 2022

TL;DR

If you don’t feel like reading, or you want to follow along as the diagrams are constructed step by step, feel free to watch the accompanying video on YouTube.

INTRODUCTION

Learning AWS right now will give you a skill that is applicable at many businesses. It is with that mindset that I set out to acquire some “real world” hands-on experience with the platform, as part of my preparation for my next professional project at the VRT. Of course I’d also enjoy sharing this experience with others, which is why I created this blog post and, for those who don’t like reading, an accompanying YouTube video as well. You can find the video embedded at the top of this page.

A great way to get experience with, well, basically anything, is to practice. My original idea was to deploy my website to AWS, similarly to how I’ve previously done on the Google Cloud Platform. But the fact of the matter is, my website is just too simple: it’s completely stateless, already nicely containerized and doesn’t require any database backend. That just wouldn’t do, so I started looking for something less “ideal”. Eventually I rediscovered an old side project of mine, which will act as the perfect legacy application. We’ll come back to why it’s “legacy” and what it does in a minute; first, let’s outline the rest of the exercise. I decided to deploy our legacy application multiple times, using a different approach each time. In total we’ll be deploying the application 5 times using the following services:

  1. EC2 (so plain VMs),
  2. ECS (as a container),
  3. EKS (a container, but on Kubernetes),
  4. Elastic Beanstalk (we’ll be cheating a bit here by not actually deploying our full application, but instead offering some insights on how we can migrate from the old legacy code to something more modern), and
  5. the API Gateway with Lambda integrations (the same principles apply here as with the EB deployment).

To make things even more interesting, I decided to impose some additional rules on myself. First, I am required to use the CLI for every deployment interaction (it’s okay to use the GUI for demo purposes and as an overview for feedback). Second, all my work needs to be easily reproducible, as would be expected in a real production environment. To achieve these goals, I’ll be heavily leveraging the CloudFormation service, which takes “templates” (plain text documents) as input and creates the actual resources in the context of a “stack”.

Let’s now introduce our application. It’s a simple 3-tier application. All the way in the back we have a simple relational database. A server application connects to this database; this server, together with the database, is what we’ll try to migrate to AWS. The server itself is stateful, which, as we’re going to find out, is going to be a problem for availability and consistency. Interactions with the server happen through a client. This client can be referred to as a thick client, meaning that it does more than simple presentation. The extent of this is luckily fairly limited. Its biggest problems are the lack of portability (it isn’t a web app, as you would so often have nowadays) and a dependency it shares with the server on a third component, which describes shared objects and their serialization. This coupling would make updates fairly cumbersome. Luckily for us, we’re not doing any upgrades; in fact, I would like to keep any code changes to an absolute minimum. We’re just going to do deployments in this new environment.

Legacy App Design

The purpose of the application is to provide a catalog, an overview if you will, of a collection of collectible cards, Pokemon cards in this case. It offers several features, like browsing the catalog organized by set and visually indicating which cards are owned. You can also “zoom in” on a card, displaying additional information like typing, hit points, moves, abilities, and so on. The application has a search function, although it is somewhat unfinished: it only supports searching by name and filtering on whether the cards are owned or not. Finally there is a “draft” mechanic, which sets up a lobby on the server to which multiple clients can connect. Once the (fully configurable) draft has started, all players go through several rounds, selecting cards as they go. Since the draft is managed by the server, it is guaranteed to be consistent: it is impossible for two players to select the same card if only one physical copy exists. The draft is also fair in that every player is given the same percentage of rare cards to common or uncommon cards, at the same intervals.

So why do I call this application legacy? What makes an app “legacy”? In this case, the reasons are threefold:

  1. The codebase is relatively old; it was written somewhere near the end of my university career (roughly 4 to 5 years ago at the time of writing). No one really remembers how it works anymore.
  2. It uses some dated principles, either because it’s old or because I simply didn’t know any better at the time. Probably the biggest example of this is the thick client, instead of a more flexible web API.
  3. And finally… it is written in Java, need I say more?

FOUNDATION

So, with all of the introductions out of the way, it’s time to move on to the part you’re actually here for: let’s start migrating! To kick things off, we’re going to create a solid shared foundation for the other solutions to build upon. Many of the solutions can reuse big parts, primarily the networking, so to avoid having to repeat ourselves, we’ll separate this part into its own stack. Due to its very nature, it will be a bit of a random collection of elements, but for our purposes, that’s fine.

First we’ll create an S3 bucket that will provide all our file storage needs. In it, we’ll store the binary of our server as well as the zip files for our lambda functions later on.
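
To give an idea of the general shape of the templates used throughout this post, here is a rough sketch of what such a bucket could look like in CloudFormation (the bucket name is a placeholder, not the one I actually used):

    # foundation-bucket.yaml - a minimal CloudFormation sketch (names are placeholders)
    AWSTemplateFormatVersion: '2010-09-09'
    Description: Shared S3 bucket for the server binary and lambda zip archives

    Resources:
      ArtifactBucket:
        Type: AWS::S3::Bucket
        Properties:
          BucketName: legacy-app-artifacts-demo   # bucket names must be globally unique

    # A stack is then created/updated from the CLI with something like:
    #   aws cloudformation deploy --template-file foundation-bucket.yaml --stack-name foundation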

For networking, we’ll create a full-featured VPC, or virtual private cloud. We’ll give it the CIDR 10.0.0.0/16, which is fairly default and general, but perfect for this demo. In the VPC we’ll create two private subnets (10.0.1.0/24 and 10.0.2.0/24) as well as two public subnets (10.0.11.0/24 and 10.0.12.0/24 respectively). As a best practice, the subnets are divided over multiple availability zones. To make our public subnets actually “public”, we connect them, via their routing tables, to an internet gateway attached to our VPC. We also create a routing table for the private subnets, for later use, but it can remain “empty” for now. To round out the networking part, we also provide two security groups: one for access to and from public resources, which we’ll leave completely open for convenience, and one for access to our database resources in the private subnets.

Speaking of the database, let’s go ahead and create that one as well, within one of our private subnets. We’ll be using an RDS instance compatible with the snapshot of the database taken earlier. (I must admit I cheated a bit here: there is no obvious way to seed the database from a SQL file. It was probably possible through a lambda function, but I instead took the easy route, spun up a temporary EC2 instance to seed the initial database, and then took a snapshot for future reference.) Luckily the old MySQL version is still supported. We’re only creating a single instance for now; in a production environment, you’ll likely want multiple instances to satisfy any availability requirements.

The final element of our shared stack is a container repository, to which we will be able to publish our container images.
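
A sketch of the networking part of this shared stack is shown below. Only one public and one private subnet are included; the second pair, the security groups, the RDS instance and the ECR repository follow the same pattern and are omitted for brevity.

    # Shared networking resources (sketch, not the complete foundation template)
    Resources:
      Vpc:
        Type: AWS::EC2::VPC
        Properties:
          CidrBlock: 10.0.0.0/16
          EnableDnsSupport: true
          EnableDnsHostnames: true

      PrivateSubnetA:
        Type: AWS::EC2::Subnet
        Properties:
          VpcId: !Ref Vpc
          CidrBlock: 10.0.1.0/24
          AvailabilityZone: !Select [0, !GetAZs '']

      PublicSubnetA:
        Type: AWS::EC2::Subnet
        Properties:
          VpcId: !Ref Vpc
          CidrBlock: 10.0.11.0/24
          AvailabilityZone: !Select [0, !GetAZs '']
          MapPublicIpOnLaunch: true

      InternetGateway:
        Type: AWS::EC2::InternetGateway

      GatewayAttachment:
        Type: AWS::EC2::VPCGatewayAttachment
        Properties:
          VpcId: !Ref Vpc
          InternetGatewayId: !Ref InternetGateway

      PublicRouteTable:
        Type: AWS::EC2::RouteTable
        Properties:
          VpcId: !Ref Vpc

      PublicDefaultRoute:               # 0.0.0.0/0 through the internet gateway is what makes the subnet "public"
        Type: AWS::EC2::Route
        Properties:
          RouteTableId: !Ref PublicRouteTable
          DestinationCidrBlock: 0.0.0.0/0
          GatewayId: !Ref InternetGateway

      PublicSubnetARouteAssociation:
        Type: AWS::EC2::SubnetRouteTableAssociation
        Properties:
          RouteTableId: !Ref PublicRouteTable
          SubnetId: !Ref PublicSubnetA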

Shared Foundation

SOLUTION 1: PLAIN EC2

The first solution could be very straightforward: spin up an EC2 instance in our public subnet, download the server jar and run it. This would however not be very scalable down the line and skips over many of the EC2 features. So instead I decided to go the auto scaling route, which seems to be the preferred approach. For this you’ll need a launch template, specifying which EC2 settings need to be used as well as some initial user data (a script that runs at boot). Next we have the auto scaling group, which specifies how many instances you want from the template. If you want this to scale dynamically, you’d also need an auto scaling policy, but I decided to skip that part. In order for our clients to reach both instances efficiently, we’ll also throw in a load balancer. Since we’re not dealing with an HTTP app, we must use a network load balancer. Notice that while we are now scaled out, the internal logic and statefulness of our server isn’t designed for this: any lobby hosted for the drafting feature on one server will be completely invisible to the other, which is very inconvenient. I’d also like to remind you that injecting secrets like this is bad practice; a secrets manager or similar service from the cloud provider should be used to inject these credentials instead.
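
The sketch below shows the core of such a setup: a launch template, an auto scaling group and a network load balancer. The AMI, port and bucket names are placeholders, and the instances are assumed to have an instance profile that allows them to read the server jar from the shared S3 bucket; the exports from the foundation stack are assumptions as well.

    # EC2 auto scaling sketch (placeholders for AMI, port and bucket; IAM and security groups omitted)
    Resources:
      ServerLaunchTemplate:
        Type: AWS::EC2::LaunchTemplate
        Properties:
          LaunchTemplateData:
            ImageId: ami-0123456789abcdef0      # placeholder AMI
            InstanceType: t3.micro
            UserData:
              Fn::Base64: |
                #!/bin/bash
                # download and start the legacy server; credentials should come from a secrets manager
                aws s3 cp s3://legacy-app-artifacts-demo/server.jar /opt/server.jar
                java -jar /opt/server.jar

      ServerTargetGroup:
        Type: AWS::ElasticLoadBalancingV2::TargetGroup
        Properties:
          VpcId: !ImportValue FoundationVpcId   # assumed export from the shared stack
          Protocol: TCP
          Port: 8080                            # placeholder server port
          TargetType: instance

      ServerAutoScalingGroup:
        Type: AWS::AutoScaling::AutoScalingGroup
        Properties:
          MinSize: '2'
          MaxSize: '2'
          DesiredCapacity: '2'
          VPCZoneIdentifier:
            - !ImportValue FoundationPublicSubnetA
            - !ImportValue FoundationPublicSubnetB
          LaunchTemplate:
            LaunchTemplateId: !Ref ServerLaunchTemplate
            Version: !GetAtt ServerLaunchTemplate.LatestVersionNumber
          TargetGroupARNs:
            - !Ref ServerTargetGroup

      NetworkLoadBalancer:
        Type: AWS::ElasticLoadBalancingV2::LoadBalancer
        Properties:
          Type: network                         # TCP load balancing, since this is not an HTTP app
          Scheme: internet-facing
          Subnets:
            - !ImportValue FoundationPublicSubnetA
            - !ImportValue FoundationPublicSubnetB

      ServerListener:
        Type: AWS::ElasticLoadBalancingV2::Listener
        Properties:
          LoadBalancerArn: !Ref NetworkLoadBalancer
          Protocol: TCP
          Port: 8080
          DefaultActions:
            - Type: forward
              TargetGroupArn: !Ref ServerTargetGroup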

EC2 with Auto Scaling

SOLUTION 2: ELASTIC CONTAINER SERVICE

To run our application as a container, we first need to package it as such and make it available to any ECS cluster on AWS. For this we’ll reuse the ECR, or Elastic Container Registry, repository we made earlier. An ECS cluster on its own won’t do much; it just provides a logical grouping that our later tasks can belong to. For this exercise I decided to make use of the serverless Fargate launch type, as we already did plain EC2 in the previous solution. In order to run our containers, we first create a task definition, specifying the image, CPU and memory usage, port mappings and some environment variables (our DB credentials in this case). Next up is the service. The service links the task to the ECS cluster and specifies how many instances of the container we want; it’s very similar to our auto scaling group from the previous solution in that regard. We’re going to skip attaching it to a load balancer for now and instead assign it a public IP directly.
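
Roughly, the task definition and service could look like the sketch below. The CPU/memory values, port and names are placeholders, and the cluster, IAM roles and exact environment variables are assumed to be defined elsewhere in the template.

    # ECS on Fargate sketch (placeholders; cluster and IAM roles defined elsewhere)
    Resources:
      ServerTaskDefinition:
        Type: AWS::ECS::TaskDefinition
        Properties:
          Family: legacy-server
          RequiresCompatibilities: [FARGATE]
          NetworkMode: awsvpc
          Cpu: '256'
          Memory: '512'
          ExecutionRoleArn: !GetAtt TaskExecutionRole.Arn   # role that lets ECS pull the image from ECR
          ContainerDefinitions:
            - Name: server
              Image: !Sub '${AWS::AccountId}.dkr.ecr.${AWS::Region}.amazonaws.com/legacy-server:latest'
              PortMappings:
                - ContainerPort: 8080                       # placeholder server port
              Environment:
                - Name: DB_HOST                             # again, prefer a secrets manager in production
                  Value: !ImportValue FoundationDbEndpoint  # assumed export from the shared stack

      ServerService:
        Type: AWS::ECS::Service
        Properties:
          Cluster: !Ref ServerCluster                       # AWS::ECS::Cluster defined elsewhere
          LaunchType: FARGATE
          DesiredCount: 1
          TaskDefinition: !Ref ServerTaskDefinition
          NetworkConfiguration:
            AwsvpcConfiguration:
              AssignPublicIp: ENABLED                       # no load balancer, a public IP instead
              Subnets:
                - !ImportValue FoundationPublicSubnetA
              SecurityGroups:
                - !ImportValue FoundationPublicSecurityGroup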

ECS on top of Fargate

SOLUTION 3: ELASTIC KUBERNETES SERVICE

EKS, or Elastic Kubernetes Service, also revolves around the use of containers, but in a Kubernetes environment instead of an AWS-specific one. We won’t be creating our own CloudFormation template for this and will instead let eksctl, the CLI tool for interacting with EKS, do the heavy lifting. When setting up a new cluster, it creates its own stacks based on templates that are far superior to anything I could make right now. After deploying the cluster we end up with a Kubernetes environment in its own VPC. In order to gain access to our database, we need to set up a VPC peering connection between the two VPCs. Once they are bridged, we can configure the routing tables of the EKS cluster to route to the DB VPC and vice versa. Updating our private subnet routing table is sufficient, since we’re only interested in resources deployed in our private subnets. With the cluster available, we can deploy our server. For this, I created a simple helm chart. Helm is the de facto standard package manager for Kubernetes, which can manage a collection of resources. In this case we have our deployment, which instantiates a pod with our server image, and a load balancer service. Since the Kubernetes cluster is deployed on AWS, AWS handles the load balancer behind the scenes (which is a breath of fresh air for someone who is primarily used to on-premises Kubernetes deployments).
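
Stripped of its templating, the helm chart boils down to roughly these two Kubernetes manifests (the image, port and label values are placeholders):

    # Rough shape of what the helm chart renders to: a Deployment and a LoadBalancer Service
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: legacy-server
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: legacy-server
      template:
        metadata:
          labels:
            app: legacy-server
        spec:
          containers:
            - name: server
              image: 123456789012.dkr.ecr.eu-west-1.amazonaws.com/legacy-server:latest  # placeholder ECR image
              ports:
                - containerPort: 8080          # placeholder server port
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: legacy-server
    spec:
      type: LoadBalancer                       # AWS provisions the actual load balancer behind the scenes
      selector:
        app: legacy-server
      ports:
        - port: 8080
          targetPort: 8080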

EKS with VPC Peering

SOLUTION 4: ELASTIC BEANSTALK

Next up is Elastic Beanstalk. This AWS service is promoted as the fastest way to get your application into the cloud, and they’re not lying. For this solution I used the eb CLI tool, and launching my app was as easy as creating a new EB deployment in the directory containing my sources; we were off to the races. One big catch, however, is that it won’t as easily support my old, cumbersome legacy application. Instead, I provided a simple Node.js app, which offers a subset of the functionality in the form of a REST API: listing all the cards in the catalog and, when requested by ID, listing some additional information on a single card. Clearly this is significantly more work and would require a full rewrite in practice, but porting a big legacy app in its entirety to the cloud is something I wouldn’t recommend in the first place. Instead, this approach could offer development teams an easy, low-barrier way to create new, smaller microservices that cover subsets of the functionality of the larger monolith. That way, a business could plot out a longer-term migration plan, slowly replacing the old with the new. A final side note: I didn’t notice any support for automatically attaching SSL/TLS to your deployment (even though the nginx setup is automatic and included). I find this quite disappointing given the otherwise excellent ease of use. It feels somewhat dangerous to offer such a simple approach to deploying web apps while not being secure by default; this is something they should definitely improve.
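
As a side note, Elastic Beanstalk deployments can be tweaked through .ebextensions config files. A hypothetical sketch for passing the database connection settings to the Node.js app as environment properties could look like this (I’m not claiming this is the exact configuration I used, and the same remark about proper secret management applies here as well):

    # .ebextensions/environment.config - hypothetical sketch, values are placeholders
    option_settings:
      aws:elasticbeanstalk:application:environment:
        DB_HOST: legacy-db.xxxxxxxxxxxx.eu-west-1.rds.amazonaws.com
        DB_NAME: card_catalog
        DB_USER: catalog_reader
        # the password should come from a secrets manager, not from a config file in version control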

Elastic Beanstalk with Node.js

SOLUTION 5: API GATEWAY + LAMBDA

Our final solution is based on the API Gateway with lambda functions as the backend implementation. As the name suggests, we will again not be using our old legacy code, and instead reuse parts of our previous Node.js application to implement the lambda function handlers. In this example I decided to create two separate functions, one for each API route. This is likely not the best fit for this use case, but it’s what I wanted to experiment with, so that’s what I did. Creating the code for the lambda functions is pretty straightforward, especially since we already figured out most of it in the previous exercise. I created a simple makefile to package the code into zip archives, and we deploy those to our S3 bucket. Then we can define our functions in our CloudFormation template. We’ll also define the API gateway itself, and while it looks very simple on the drawing, it is actually a collection of many different resources: the gateway itself; two integrations (AWS_PROXY in this case, as we’re doing lambda integrations), one for each lambda function; routes linking the API endpoints to their respective integrations; a stage to deploy our API to; and finally a set of permissions allowing our gateway to invoke our lambda functions. With all of that deployed, we can navigate to our API endpoint and retrieve our results.
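
Below is a sketch of these resources for one of the two routes; the second function, integration, route and permission are analogous. The function name, route key and runtime are placeholders, and the execution role is assumed to be defined elsewhere in the template.

    # API Gateway (HTTP API) + lambda sketch for a single route (placeholders)
    Resources:
      ListCardsFunction:
        Type: AWS::Lambda::Function
        Properties:
          Runtime: nodejs14.x
          Handler: index.handler
          Role: !GetAtt LambdaExecutionRole.Arn     # execution role defined elsewhere in the template
          Code:
            S3Bucket: legacy-app-artifacts-demo     # the shared bucket from the foundation stack
            S3Key: list-cards.zip

      HttpApi:
        Type: AWS::ApiGatewayV2::Api
        Properties:
          Name: card-catalog-api
          ProtocolType: HTTP

      ListCardsIntegration:
        Type: AWS::ApiGatewayV2::Integration
        Properties:
          ApiId: !Ref HttpApi
          IntegrationType: AWS_PROXY                # proxy the request straight to the lambda function
          IntegrationUri: !GetAtt ListCardsFunction.Arn
          PayloadFormatVersion: '2.0'

      ListCardsRoute:
        Type: AWS::ApiGatewayV2::Route
        Properties:
          ApiId: !Ref HttpApi
          RouteKey: 'GET /cards'                    # placeholder route key
          Target: !Sub 'integrations/${ListCardsIntegration}'

      DefaultStage:
        Type: AWS::ApiGatewayV2::Stage
        Properties:
          ApiId: !Ref HttpApi
          StageName: '$default'
          AutoDeploy: true

      ListCardsPermission:
        Type: AWS::Lambda::Permission
        Properties:
          FunctionName: !Ref ListCardsFunction
          Action: lambda:InvokeFunction
          Principal: apigateway.amazonaws.com
          SourceArn: !Sub 'arn:aws:execute-api:${AWS::Region}:${AWS::AccountId}:${HttpApi}/*'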

API Gateway with Lambda integrations

FINAL REMARKS / IMPRESSIONS

While my intention was to learn as much as possible about using AWS in practice, I did skip over several interesting services. Honorable mentions go to Route 53 for applying proper DNS names to our endpoints, CloudWatch for decent logging, any form of secret management to handle our DB credentials, and CloudFront for content distribution to the edge. While playing around with those services would undoubtedly have been interesting, I limited myself to a predefined scope, and I’m already quite happy with the many different approaches I did tackle.

As for my impressions of AWS: I noticed the very big focus on connecting different components or building blocks (I find the blocky design of their service logos even better suited now). Figuring out which blocks you need isn’t always obvious; thankfully, the CloudFormation documentation is quite solid. It could definitely have been better at times (I often stumbled on missing, unmentioned requirements depending on the context), but as far as I can see everything is at least there, so with a bit of reading you can get there in the end.

With that said, I thank you for reading (or watching if you clicked the YouTube link). I hope you found it interesting.

Alright, cya!
