ETPA’s migration to Kubernetes
Introduction
I’ve been working at Energy Trading Platform Amsterdam (ETPA) for some years, and the idea of moving to the cloud has come up in many discussions. Our trading application is based on event sourcing. Certain changes that we wanted to make to improve event flow and performance required major deployments that would take hours or days on our traditional server setup.
Apart from the improved deployment strategies, we also started looking at our application’s performance, which was hindered by the many auxiliary processes running alongside the application’s main business goal. Our main product is trading, so our main business goal is matching orders. One of the processes slowing down order matching was reporting to third parties. With this observation in mind, we started looking at which third-party reporting could be split out of the main application. This started to sound more and more like a micro-service approach; I’ll write more about that in another article.
At this point, many roads started leading to Kubernetes clusters for our infrastructure, as they would address many of our concerns about the future of our application.
The VPS Infrastructure
At ETPA we use the CQRS and event-sourcing patterns. These patterns require some infrastructure, namely an event store and a view model to store the projection of the events. Our old infrastructure was made up of the monolith application (the ETPA app), an event store (implemented with AxonServer), a projection of the events (implemented with MongoDB), and a cache (implemented with Redis).
The VPS infrastructure consisted of multiple VPSs hosted on servers in Amsterdam, running the databases and applications mentioned above. A load balancer directed traffic from the internet to the application nodes.
The Kubernetes Cluster
An overview of what we implemented can be seen in the diagram above. We used publicly available Helm charts for Redis and MongoDB; for our application, AxonServer, and our trading bot, we designed our own Helm charts. Apart from what is drawn in the diagram, we used many other Kubernetes objects and features, such as load balancers, Secrets, and ConfigMaps. The full Kubernetes design will be discussed in detail in another article.
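To give an idea of what using a public chart looks like in practice, here is an illustrative values override file. The keys shown follow the layout of the popular Bitnami Redis chart; the Secret name is a hypothetical placeholder, not our actual configuration.

```yaml
# values-redis.yaml -- illustrative overrides for a public Redis Helm chart
# (keys follow the Bitnami chart layout; adjust to the chart you actually use)
architecture: replication        # primary + replicas instead of standalone
auth:
  enabled: true
  existingSecret: redis-credentials   # hypothetical pre-created Secret
replica:
  replicaCount: 2                # number of read replicas
```

The release can then be installed with something like `helm install redis bitnami/redis -f values-redis.yaml`, keeping all environment-specific settings in a version-controlled file rather than in ad-hoc command-line flags.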
One concept that we adopted recently is IaC (Infrastructure as Code), which means that all of our infrastructure is provisioned and configured using code. This code can be version-controlled, kept idempotent, and, most importantly, tested. We currently use CloudFormation to provision infrastructure on AWS and Ansible to configure it.
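As a rough illustration of the idempotency idea, here is a minimal Ansible playbook sketch. The host group, packages, and tasks are hypothetical examples, not our actual code; the point is that every task describes a desired state, so the playbook can be re-run safely and kept in version control.

```yaml
# playbook.yml -- a minimal, hypothetical sketch of configuration as code
- hosts: k8s_nodes          # hypothetical inventory group
  become: true
  tasks:
    - name: Ensure required packages are present
      ansible.builtin.package:
        name: [containerd, kubelet]
        state: present      # idempotent: a no-op if already installed

    - name: Ensure kubelet is running and enabled on boot
      ansible.builtin.service:
        name: kubelet
        state: started
        enabled: true
```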
The Data Migration
Since energy trading is possible 24/7, it was essential for us to perform the migration with minimal downtime for clients, while also ensuring the integrity and consistency of the data. Apart from implementing the Kubernetes cluster using the IaC approach, we used GitLab CI pipelines to execute the steps that migrated the data from the VPS infrastructure to Kubernetes.
The data migration involved 6 major steps. First, we spun up the Kubernetes cluster using the IaC and Ansible scripts defined beforehand. Once the cluster was up and running, we started migrating the events from our event store. Migrating the events took around 3 hours, but we were able to keep the application running because appending new events was still possible. Once the events were migrated, the trading application was switched off and three parallel processes started: migrating the view model, the application state files, and the delta events. The delta events were those events that had been persisted but not transferred during the initial event migration. Once all data was migrated, we spun up our application on the Kubernetes cluster and opened up the load balancer. The full data migration took around 8 hours to complete, but downtime for clients was kept to around 3 hours.
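The migration steps above can be sketched as a GitLab CI pipeline. The stage layout below is illustrative; the job names and scripts are hypothetical placeholders, not our actual pipeline.

```yaml
# .gitlab-ci.yml -- hypothetical sketch of the migration pipeline stages
stages:
  - provision        # spin up the cluster (CloudFormation + Ansible)
  - migrate-events   # bulk-copy events while the app keeps running
  - cutover          # app off: view model, state files, delta events
  - go-live          # start the app on Kubernetes, open the load balancer

migrate_view_model:
  stage: cutover
  script: ./migrate-view-model.sh     # hypothetical script

migrate_state_files:
  stage: cutover
  script: ./migrate-state-files.sh    # hypothetical script

migrate_delta_events:
  stage: cutover
  script: ./migrate-delta-events.sh   # events appended after the bulk copy
```

One useful property here is that GitLab CI runs jobs within the same stage in parallel, which maps naturally onto the three parallel cutover processes while still guaranteeing that go-live only starts once every cutover job has finished.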
Moving Forward
It’s already been 3 weeks since we migrated to our new infrastructure. Overall, the performance of our system has increased slightly, probably thanks to the faster private networking in AWS, and we think the migration has been a success.
In the short term, we will focus on improving our deployment strategies to minimise downtime for our clients, scaling up our application instances to improve performance and reliability, and improving our event-sourcing capabilities in the cloud.
With this migration, we have also given our developers more freedom to start splitting our monolith into separate micro-services and to release them with minimal effort.
I will continue to publish different articles about the different parts of our journey to the cloud. If you would like to hear about a specific topic please feel free to reach out! ETPA is also hiring! If you would like to be part of our journey please apply for our positions through LinkedIn or by visiting our website.