Transitioning Legacy Applications with Storage Requirements into Containers

Rakesh Pillai
Walmart Global Tech Blog
6 min read · May 20, 2020


photo credit: https://pixabay.com/images/id-449784/

Background
To give some background, our present infrastructure and application setup is on a traditional platform: Java apps deployed in Tomcat or Jetty and running on virtual machines. We also have config files, or model files as we call them, which reside on an NFS server.

The initial step was to create a Docker image. We already had the base Java and Jetty images created, so it was just a matter of adding the instructions to deploy our app, which included steps to copy the code artifact and the data models into the Jetty "webapps" directory.

One of the important objectives here was to leverage the auto-scaling functionality. The primary focus was to keep container start-up time to a minimum, to reduce the time a container takes to transition into the Ready state.
Though not recommended, we still decided to copy the code and data models at build time and have the container just load them into memory at start-up.
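That first version of the Dockerfile looked roughly like this (a sketch; the base image name, artifact name, and paths are hypothetical):

```
FROM our-registry/jetty-java-base:latest

# Bake both the code artifact and the data models into the image at build time
COPY app.war /opt/jetty/webapps/
COPY models/ /opt/jetty/webapps/models/

# Mirror the legacy VM setup: untar the compressed model archives during the build
RUN for f in /opt/jetty/webapps/models/*.tar.gz; do \
      tar -xzf "$f" -C /opt/jetty/webapps/models/; \
    done
```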

Outcome

A Docker image of size 9.8GB was created.
Boot-up time was 8 minutes.

The enormous size of the image wasn't acceptable by any standards, and it directly put our application out of contention for being containerised and deployed into k8s.

Analysis

We took a step back to analyse whether we could still make this happen.
One thing we knew: it was the data models that were occupying the space.

Let's start by first analysing how big the data files were.
We had around 8 of them, adding up to 7.8GB, and then there was the code artifact of around 180MB.
It was obvious that if we could do something about the data inside the image (both the code and the data files), we would be able to save some space.
Another thing to note: since we had copied the traditional design, we were also untarring the compressed configs during the build stage.

Optimisation

The 180MB code artifact was moved out to the container boot-up stage, and we also moved the decompression of the models to runtime (see the sketch below).
The image size was now 8.63GB, and start-up time was now 9 minutes. Almost 1.4GB of space was freed up.
This wasn't going to be enough though, and we were already being pulled up by the storage team who manage the Docker images in our Nexus repo.
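A minimal sketch of what the start-up script looked like after this change (the artifact URL, paths, and Jetty invocation are hypothetical):

```
#!/bin/sh
# entrypoint.sh: work that used to happen at image build time now runs here

# Pull the ~180MB code artifact at boot instead of baking it into the image
curl -fSo /opt/jetty/webapps/app.war "$ARTIFACT_URL"

# Decompress the model archives that are still shipped inside the image
for f in /opt/jetty/webapps/models/*.tar.gz; do
  tar -xzf "$f" -C /opt/jetty/webapps/models/
done

# Hand the process over to Jetty
cd /opt/jetty
exec java -jar start.jar
```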

The goal was to cut the size of the image to at most 5GB, the maximum threshold set by the storage team.

The Next Target

As mentioned earlier, the data models were around 7.8GB, so we had to do something about them.
Having already moved the code artifact to runtime, an idea that struck us was to sort the data models by size and try moving the smaller ones to runtime without adversely impacting the start-up time.
With this exercise, we learnt that two models were 3.8GB and 2.2GB, and the remaining six were around 200MB at most. Much delighted, we moved all six smaller models to runtime. This helped, and the image was now 7.43GB.
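Sizing the models was a one-liner, assuming they sit on the NFS store (path hypothetical):

```
# List the model archives from smallest to largest
du -h /mnt/nfs/models/*.tar.gz | sort -h
```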

What Next

Our focus was always the data models, so let's take a look at what was happening with the models being pulled:
1. The compressed tars were pulled from our local NFS config store
2. They were uncompressed at the start-up stage

The data files were gzip-compressed, and after moving the smaller files out, we only had two of them left in the build stage.

Change the Compression Format

XZ, though newer, is great at compression if you use it to its maximum capabilities.
So we untarred the gz archives and re-compressed them using xz with the maximum compression setting, trading compression time for size.
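The recompression itself was a one-time step per model archive, roughly as follows (file name hypothetical; -9e is xz's strongest and slowest preset):

```
# Unpack the original gzip archive back to a plain tar
gunzip model.tar.gz

# Re-compress at xz's maximum preset; slow, but much smaller (produces model.tar.xz)
xz -9e model.tar
```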

The image size saw a drastic reduction, from 7.4GB to 5.8GB! The build time, however, increased to over an hour. XZ is slow at compressing but does the job well.
This also affected the start-up time, since we were doing the untar at deploy time. Start-up time was now 11 minutes, which was a bit on the higher side considering current deployments onto VMs happen in around 9 minutes.

A Look at the Docker Image

Inspecting our Dockerfile and image, one thing we noticed was that, in our goal to make the image secure, we were switching to a non-root user and then doing a chown of the Jetty home directory to that user, because the base image ran as root.
This step was adding an extra 418MB, because changing ownership rewrites the affected files into a new layer.

So we went into the base image and changed the user there to a non-root user. This helped bring the image size down to 5.38GB.
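Conceptually, the change looked something like this (user name, paths, and the install step are hypothetical):

```
# Before (application Dockerfile): chown rewrites the whole Jetty home
# into a new image layer, which is where the extra ~418MB came from
RUN chown -R jetty:jetty /opt/jetty
USER jetty

# After (base image): fix ownership in the same layer that creates the
# files, then switch users, so downstream images never need a chown
RUN tar -xzf /tmp/jetty.tar.gz -C /opt && chown -R jetty:jetty /opt/jetty
USER jetty
```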

Continued efforts had brought the image size down from a whopping 9.8GB to 5.3GB. Though still above the threshold by 340MB, at this stage it looked like the best we could do.

The Game Changer

In our pursuit to improve things further, we tried to find the bottleneck that kept us from moving the data models out of the image completely. The main blocker was accessing the files stored in our private data center: a cross-DC transfer was involved, which added latency. So the next question was, if we could have something similar on the cloud side, how would things improve?

As mentioned before, this was a legacy system, so most of the latest cloud capabilities weren't always there. The cluster was also managed by another team, and there wasn't much we could do to have any kind of volume support added within the stipulated time we had to move our apps to production.
The idea was to solve this problem from a legacy point of view, in a way that could later be transitioned to a cloud-based approach.
Coming back to the solution under consideration: we wanted a file-server kind of setup, emulating our current NFS-based storage. For administrative reasons, we could not have our NFS storage mounted on the cloud-side machines.

Node.js to the Rescue

Node.js lets you spin up an HTTP server that hosts static files with just a couple of commands.

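A minimal sketch, assuming the widely used http-server npm package (the serve path is hypothetical):

```
# Install a static file server globally (requires Node.js/npm)
npm install -g http-server

# Serve the models directory over HTTP on port 5050
http-server /data/models -p 5050
```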

That's it, you are all set. The files should now be available over HTTP at http://<server_hostname>:5050/<filename>.

Now, we moved the bigger models into this storage and added a step to download them at runtime. All the models now reside on the file server and get pulled only when needed, at container start-up.
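The runtime download itself is a plain HTTP fetch in the start-up script, along these lines (hostname and file names hypothetical):

```
# Fetch a big model from the file server and unpack it before starting Jetty
curl -fSo /tmp/model-a.tar.xz http://fileserver.internal:5050/model-a.tar.xz
tar -xJf /tmp/model-a.tar.xz -C /opt/jetty/webapps/models/
```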

Impact

  1. Image size is now 670MB; we started with 9.8GB
  2. Start-up time is now under 5 minutes, much better than the more than 8 minutes it took on virtual machines

I have presented the thought process around optimising container image size for legacy applications. I hope this helps those of you who are migrating legacy applications to the cloud and are looking for some kind of volume support.

We now have volume support available in our k8s clusters and have raised a request for access to a volume. With that, the models will be moved onto the volume and won't have to be downloaded at runtime, further bringing down the start-up time.
