One of the big engineering projects we have at Kudos is an Epic called Cattle not Pets. The goal of this Epic is to move from our Pet servers, which are hosted on Amazon EC2, to Cattle services in Google Kubernetes Engine (GKE).
This blog post is going to cover some of the tasks that we have performed in order to get closer to our goal.
We are going to cover how we:
- Moved our Ruby application from Amazon EC2 instances to GKE Kubernetes Pods.
- Moved from AWS Application Load Balancer (ALB) to Istio.
- Moved our AWS Aurora MySQL database to Google Cloud SQL (covered in the next post).
First, a little background.
Our platform is split between AWS and Google Cloud with a VPN tunnel between the two sites. This enables us to route traffic to our Kubernetes cluster in Google from our ALB in Amazon.
We have a monolithic Ruby application running on EC2 instances. The instances sit in an Auto Scaling Group, which we use for deployment, with an ALB in front of them for SSL termination and for routing traffic between services.
The first thing we needed to do was come up with a plan for migrating our Ruby application and MySQL database over to Google Cloud.
We had three major components to move over to Google Cloud:
- Load Balancers.
- Ruby application servers.
- MySQL Database.
Moving only some of these to Google while the others remained in AWS would add latency, because requests would need to traverse the VPN or the public internet. We therefore wanted to minimise the amount of time we were running with the application split across the two providers.
Our database is one of the biggest costs in our infrastructure, so we also wanted to make sure that we were not running large replicas of it side by side for too long.
It looked like we had two options:
- Move the Load Balancer and Ruby application servers first and connect to the database across the VPN.
- Move the database first and reconfigure the Ruby application servers to use the new Cloud SQL instance.
We decided to go with the first option, as migrating the Ruby application servers to Kubernetes would be cheap, would simplify deployment and would be easily reproducible. It would also allow us to perform a blue/green switchover when the time came, giving us a viable rollback mechanism.
This would, however, increase the latency of calls to the MySQL database. We weren’t sure by how much, but we could start work on moving the Ruby application servers without disrupting any production traffic, and it would keep costs down as we were already running a Kubernetes cluster.
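As a rough way to reason about this cost before measuring it, the extra time per request is approximately the number of sequential database queries multiplied by the extra round-trip time across the VPN. The sketch below is in Python rather than Ruby, the function name is made up for illustration, and the figures are hypothetical rather than measurements from our platform.

```python
def added_latency_ms(sequential_queries: int, extra_rtt_ms: float) -> float:
    """Estimate the extra time per request when every query crosses the VPN.

    Assumes the queries run one after another, so each one pays the
    extra round trip in full.
    """
    return sequential_queries * extra_rtt_ms


# Hypothetical figures for illustration only.
estimate = added_latency_ms(sequential_queries=20, extra_rtt_ms=10.0)
print(f"Estimated extra latency per request: {estimate:.0f} ms")
```

The point of the estimate is that chatty requests suffer most: a page that issues many small queries pays the VPN round trip many times over.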
Moving from EC2 to Kubernetes
So we started to work on converting our Ruby application into a Kubernetes deployment. We had already done some work to run a development version of the Ruby application in a Docker container. However, this was running with plaintext HTTP and using the Padrino web server, so it would not be suitable for a production deployment.
The first thing we had to do was to put a fully fledged web server in the Docker container and use that to serve the application. We chose the Passenger Docker image as our base image and used the Nginx server that is bundled with it to serve our application. Then we added our SSL certificates to our child image and customised the Nginx configuration to match our production environment.
As we were moving from a VM to a Docker container, we needed to reassess our logging for the application. On the EC2 instances we were using the Google Stackdriver Logging agent to ship our logs to Stackdriver from files scattered around the local file system. We wanted to do the same with the new setup. However, by default Kubernetes only captures the stdout and stderr streams of the container and uses those for logging, whereas the monolith application writes multiple logs in different locations.
We therefore decided to add a logging sidecar alongside the Ruby application in the Kubernetes pod to read the logs. We put the logs in a Kubernetes volume and mounted that volume into both of the containers. This allowed the logging sidecar to ship those logs to the Stackdriver API.
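To illustrate the sidecar idea, here is a minimal sketch of a process that follows every log file in a shared directory. It is not the sidecar we run: it is written in Python, it only prints lines to stdout instead of calling the Stackdriver API, and the function names and the `.log` suffix are assumptions made for the example.

```python
import os
import time


def read_new_lines(path, offset):
    """Return (complete lines appended since offset, new offset)."""
    if not os.path.exists(path):
        return [], offset
    if os.path.getsize(path) < offset:
        offset = 0  # file was truncated or rotated, so start again
    with open(path, "rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Only consume up to the last newline so that a partially written
    # line is picked up on a later poll.
    end = data.rfind(b"\n") + 1
    lines = data[:end].decode("utf-8", errors="replace").splitlines()
    return lines, offset + end


def follow(log_dir, interval=1.0):
    """Poll every *.log file in log_dir forever, printing new lines."""
    offsets = {}
    while True:
        for name in sorted(os.listdir(log_dir)):
            if not name.endswith(".log"):
                continue
            path = os.path.join(log_dir, name)
            lines, offsets[path] = read_new_lines(path, offsets.get(path, 0))
            for line in lines:
                print(f"{name}: {line}", flush=True)
        time.sleep(interval)
```

In a pod, the application container would write into the shared volume and the sidecar would run something like `follow("/var/log/app")`, where `/var/log/app` is a hypothetical mount path for that volume.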
Once we had the Docker container sorted we were quickly able to create some Kubernetes configuration for the application and mount all the secrets we needed into the container.
With the application running in Kubernetes we were able to connect the new Kubernetes version of the application servers to the production database in AWS via the VPN and test to make sure that everything was working as expected. This also gave us a good indication of what the latency would be like during the switchover.
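A simple way to get that kind of indication is to time a trivial operation many times and look at the median and the worst case. The sketch below shows the idea in Python under stated assumptions: `fake_query` is a stand-in that just sleeps, where a real test would run a cheap query such as `SELECT 1` against the database over the VPN.

```python
import statistics
import time


def measure_latency_ms(operation, samples=50):
    """Call operation repeatedly and return per-call timings in milliseconds."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        operation()
        timings.append((time.perf_counter() - start) * 1000.0)
    return timings


def summarise(timings):
    """Return the median and maximum of a list of timings."""
    return {"median_ms": statistics.median(timings), "max_ms": max(timings)}


def fake_query():
    # Stand-in for a real round trip to the database over the VPN.
    time.sleep(0.002)


print(summarise(measure_latency_ms(fake_query, samples=10)))
```

Running the same harness from an EC2 instance and from a pod in the cluster gives two sets of numbers that can be compared directly.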
Application Load Balancer to Istio
We started by looking at replacing our ALB with a Google Load Balancer for our Kubernetes cluster and then using Istio to route traffic to the pods. This would have given us the benefit of a global load balancer and CDN in front of all the services in our Kubernetes cluster. Unfortunately, one of our requirements was to redirect HTTP traffic to HTTPS at the load balancer, and at the time of writing the Google HTTP Load Balancer does not support this.
We therefore decided to use the TCP load balancer that is created with Istio and to use Istio for SSL termination, the HTTP to HTTPS redirect and request routing. This turned out to be pretty trivial as we were already using Istio for request routing in other parts of the platform. We just needed to add some more VirtualService objects and update the Gateway, and we were able to use a single Istio Ingress Gateway to do all the routing.
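For illustration, the shape of such a Gateway and VirtualService pair can be sketched as follows. The manifests are built as Python dictionaries and printed as a JSON list, which `kubectl apply -f -` accepts as well as YAML. This is not our actual configuration: the host, resource names, service name and TLS secret are all hypothetical, and `credentialName` assumes the certificate is stored as a Kubernetes secret.

```python
import json


def build_gateway(name, hosts, tls_secret):
    """Build an Istio Gateway that redirects HTTP to HTTPS and terminates TLS."""
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "Gateway",
        "metadata": {"name": name},
        "spec": {
            # Default label on the Istio ingress gateway deployment.
            "selector": {"istio": "ingressgateway"},
            "servers": [
                {
                    "port": {"number": 80, "name": "http", "protocol": "HTTP"},
                    "hosts": hosts,
                    "tls": {"httpsRedirect": True},
                },
                {
                    "port": {"number": 443, "name": "https", "protocol": "HTTPS"},
                    "hosts": hosts,
                    "tls": {"mode": "SIMPLE", "credentialName": tls_secret},
                },
            ],
        },
    }


def build_virtual_service(name, hosts, gateway, service_host, service_port):
    """Build a VirtualService that routes all traffic for hosts to one service."""
    destination = {"host": service_host, "port": {"number": service_port}}
    return {
        "apiVersion": "networking.istio.io/v1alpha3",
        "kind": "VirtualService",
        "metadata": {"name": name},
        "spec": {
            "hosts": hosts,
            "gateways": [gateway],
            "http": [{"route": [{"destination": destination}]}],
        },
    }


# Hypothetical names throughout.
hosts = ["app.example.com"]
items = [
    build_gateway("main-gateway", hosts, "example-tls"),
    build_virtual_service("ruby-app", hosts, "main-gateway", "ruby-app", 80),
]
print(json.dumps({"apiVersion": "v1", "kind": "List", "items": items}, indent=2))
```

The `httpsRedirect` flag on the port 80 server is what makes the gateway answer plain HTTP requests with a redirect to HTTPS, which is the behaviour we could not get from the Google HTTP Load Balancer.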
Testing the setup
Now that we had the Ruby application in Kubernetes and were able to route traffic to the service using Istio, we could test the application as if it were live. However, we did not switch production traffic over yet.
This was down to the latency issue. We validated that we were able to connect to the database and that the application was running as intended by creating a test domain and testing the application in isolation from our production traffic.
We left the Kubernetes Ruby application and the load balancer alone until we were ready to switch, something we planned to do only once we were confident that we were able to move the database.
After this work, we were able to run our monolithic Ruby application in our Kubernetes cluster.
We did not want to switch to this new version until we were sure we could move the database over to Google Cloud SQL without any issues.
We will continue with this migration journey in our next blog post, where we will show how we managed to move our database to Google Cloud.
Kudos is an award-winning, cloud-based platform, through which researchers can accelerate and broaden the positive impact of their research in the world.