Moving a monolith to AWS (Part I)

Andrea Lettieri
6 min read · Jun 19, 2018


Image: Grim Monolith (http://www.artofmtg.com/art/grim-monolith-mtgo-promo/)

Last year, the company that I’m working for decided to embrace “cloud” technologies. We had to choose between Microsoft Azure and Amazon Web Services, and AWS was the winner. We started to investigate and learn the particularities of this cloud environment. We took part in re:Invent 2017, which was an AMAZING experience, and it really opened our eyes.

Another part of this story can be read in my co-worker’s article, which is more focused on the infrastructure, network and services that we chose for the migration. In this article, I’m going to talk about the architecture decisions we took to move a monolith to the AWS environment, and the plans we have to upgrade the code to make it “modern”.

A bit of background: I’m the architect of a custom CMS, coded in C# on the .NET Framework, using Microsoft SQL Server as the database layer. This product started in 2007, so as you can see (and imagine!), the architecture and decisions were VERY DIFFERENT back then. The product has been online since 2008, supporting thousands of sites and millions of hits. We know it’s old code, but it still has some life in it. It’s very customizable, and we can extend it however we need, so it’s very easy for us to spin up new intranets, public sites or whatever we need.

When we started to think about what we could move to a cloud-like environment, we found three clear pillars where we could take advantage of the cloud:

  • Binary files, uploaded to our Library — a repository where images, documents and other binary files are uploaded. These were stored in the database, as binary fields.
  • Search engine — we were using Lucene for the search engine.
  • Reports — the reports stored their temporary output locally (PDFs, HTML and images were all written to a temporary directory on the web server).
  • Bonus: CSS and image files related to the look and feel of each site — we were storing these in a local folder.

Binary Files

This was the biggest challenge, without a single doubt. We had over 60 gigabytes of binary data inside the SQL database, and we had to move it somewhere else.

The perfect storage for binary information is, of course, S3 buckets. We created the buckets and configured the permissions accordingly, using IAM.

At first we wanted to use signed URLs, but that could cause problems with caching: the URLs all had a querystring parameter, and when an image URL contains a querystring parameter, some browsers (not naming names, but you know WHO YOU ARE) will ignore the cache headers and just re-request the image every time. That could be problematic for the transfer rate. So we decided to go with regular URLs, and we’re considering putting CloudFront in front of them; for now, we went without the CDN. We are using an internal handler (an .ashx), so if we want to change something, we can do it “behind the curtains” without re-parsing or re-creating all the HTML in the pages.
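Just to give an idea of what that kind of handler looks like in C#, here is a minimal sketch (not our actual code): it takes a file id from the querystring, works out the S3 key, and streams the object back, so the page HTML can keep pointing at the same URL while the storage behind it changes. The bucket name and the key scheme are placeholders.

```csharp
using System.Web;
using Amazon.S3;
using Amazon.S3.Model;

// Sketch of an .ashx-style handler that serves Library files from S3.
// Bucket name and key layout are made up for illustration.
public class LibraryHandler : IHttpHandler
{
    // Region and credentials come from web.config / the instance role.
    private static readonly IAmazonS3 S3Client = new AmazonS3Client();

    public bool IsReusable { get { return true; } }

    public void ProcessRequest(HttpContext context)
    {
        int fileId = int.Parse(context.Request.QueryString["id"]);

        // In the real app this comes from a mapping table (file id -> S3 key);
        // here we just derive it from the id.
        string key = "library/" + fileId;

        using (GetObjectResponse response = S3Client.GetObject("my-library-bucket", key))
        {
            context.Response.ContentType = response.Headers.ContentType;
            context.Response.CacheControl = "public"; // keep the cache headers browser-friendly
            response.ResponseStream.CopyTo(context.Response.OutputStream);
        }
    }
}
```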

Now that we know we want to put things in an S3 bucket, how do we do it? We can’t migrate everything on the same day as the migration: it’s too much info, and it takes several hours to upload. If you are located in the USA, you can contact Amazon to use the Snowball service: they send you a hard drive with a network plug, you plug it into your network, and you export your information to it. After you’re done, you contact them again, they pick it up, and they put it in an S3 bucket for you. This is a GREAT service if you have to move several terabytes of information. In our case, the libraries were not THAT big (less than 1 terabyte combined), so we created a custom migrator, which read the binary data from the field and uploaded it to S3. We ran this inside the AWS network, so the upload was pretty fast.
Our custom code allowed us to perform the migration in batches: we first imported 10 files, checked that everything was working as expected, then we imported 100, checked again, and when we were sure, we imported the rest. The program was prepared to run with lots of parameters (e.g. “just import this customer’s library”, “import just the files that were edited before X date” or “import just X files”). This gave us control over the QA, and also control over the final migration.
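For the curious, here is a rough sketch of what a batch migrator like that can look like. Table and column names (LibraryFile, FileId, Content, MigratedKey), the bucket name and the connection string are placeholders, not our real schema; the real tool also took the per-customer and per-date filters mentioned above.

```csharp
using System;
using System.Data.SqlClient;
using System.IO;
using Amazon.S3;
using Amazon.S3.Transfer;

// Batch migrator sketch: read binary fields from SQL, upload them to S3.
class LibraryMigrator
{
    static void Main(string[] args)
    {
        // First run with 10 files, then 100, then everything.
        int maxFiles = args.Length > 0 ? int.Parse(args[0]) : 10;

        var transfer = new TransferUtility(new AmazonS3Client());

        using (var connection = new SqlConnection("<connection string>"))
        {
            connection.Open();

            var command = new SqlCommand(
                "SELECT TOP (@max) FileId, Content FROM LibraryFile WHERE MigratedKey IS NULL",
                connection);
            command.Parameters.AddWithValue("@max", maxFiles);

            using (SqlDataReader reader = command.ExecuteReader())
            {
                while (reader.Read())
                {
                    int fileId = reader.GetInt32(0);
                    byte[] content = (byte[])reader[1];
                    string key = "library/" + fileId;

                    using (var stream = new MemoryStream(content))
                    {
                        transfer.Upload(stream, "my-library-bucket", key);
                    }

                    // The real tool also recorded the fileId -> key relation in a
                    // mapping table, so the app could resolve files while still live.
                    Console.WriteLine("Uploaded " + fileId + " -> " + key);
                }
            }
        }
    }
}
```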

One thing you need to be aware of when you are migrating is that you should never delete the source data until you are ABSOLUTELY SURE that everything was migrated correctly.

Deleting is the last step that you perform in a migration.

In our case, we created a temporary SQL table, which stored the relation between the file id and the S3 key it was uploaded to. This small change in the database allowed us to perform the migration while the application was still live, without interrupting normal usage, and without having to do a release with a patch or a code change.

Right before “making the final switch” to the AWS servers, we ran the migration again to pick up anything that had changed, and then we removed the binaries from SQL.

Search engine

In our application, we were using local Lucene indexes, which were located in the file system of each server and automatically synchronized. This worked fine, until we started to read about scaling in the cloud: if you want to perform auto-scaling (adding servers dynamically under certain parameters, for example “if CPU is above 80% for 5 minutes, add 3 servers to the pool”), each server needs to be pristine. This means that you can’t write ANYTHING to the file system, because you might be browsing a site on server1, and then after clicking a link on a page, you are on server65. You can’t assume that the file system is the same on server1 as on server65.

At first, we wanted to use Elastic File System (EFS), which is like a shared hard drive that is accessible from all the servers. But we had a rough time with that: EFS is only available for Linux servers. We run Windows, so we had to find another solution.

Amazon offers some applications as a service. What does this mean? It means that they run the service itself, and you just pay for what you use. You don’t have to have a virtual machine running something: you can just use what AWS provides (usually an endpoint). I could talk all day about the offered services, but in the case of search, we picked Elasticsearch as a service. We don’t want to run Elasticsearch on our own servers in a virtual machine; we don’t need ANOTHER server to administer, patch and handle. So we just leave the heavy lifting in Amazon’s hands, and we use the API to index the documents and perform the searches!
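To give a flavor of what “just use the API” looks like from C#, here is a minimal sketch using NEST (a common .NET Elasticsearch client) pointed at the managed endpoint. The endpoint URL, index name and Page type are placeholders, and it assumes the domain’s access policy allows the request (request signing is left out):

```csharp
using System;
using Nest;

// Hypothetical document type; the real index has more fields.
public class Page
{
    public int Id { get; set; }
    public string Title { get; set; }
    public string Body { get; set; }
}

class SearchExample
{
    static void Main()
    {
        // The AWS Elasticsearch Service domain exposes an HTTPS endpoint (placeholder URL).
        var settings = new ConnectionSettings(new Uri("https://search-mydomain.us-east-1.es.amazonaws.com"))
            .DefaultIndex("site1-pages");
        var client = new ElasticClient(settings);

        // Index a document.
        client.IndexDocument(new Page { Id = 1, Title = "Home", Body = "Welcome to our site" });

        // Run a simple full-text query against the Body field.
        var result = client.Search<Page>(s => s
            .Query(q => q.Match(m => m.Field(p => p.Body).Query("welcome"))));

        Console.WriteLine("Found " + result.Total + " page(s)");
    }
}
```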

The first thing we had to do was build a custom tool to re-index all the sites. This was slow, because it meant initializing the index for every site. We took a similar approach to the library: we loaded everything initially, and we updated the different indexes as the days passed. This allowed us to have all the sites in the ES index before the final migration. When we did the cutover, we changed the source of data (instead of local Lucene, we went to ES), and it worked perfectly.

We did, however, have a small issue with the initial index: while search is not something that is widely used, the initial Elasticsearch domain was running on a t2.micro, and those instances use CPU credits. It means that they are very, very cheap (and small), but if a process uses a lot of CPU, you’ll run out of credits and it will start to fail. We got lots of 503 errors, and we had to investigate what happened. Once we discovered it was related to this, we quickly upgraded to a t2.medium, and the problem was fixed. We had to reindex everything, though, because changing the instance type lost the index. Something to keep in mind!

I’ll continue the series in Part II, with Reports, CSS and, as a bonus track, some serverless candy using AWS Lambda and Simple Queue Service :) Stay tuned!
