How we built our hybrid Kubernetes platform
Dailymotion’s Kubernetes journey: a timeline of our geo-distributed infrastructure around the world mixing Cloud and on-premises clusters
When we decided to rebuild Dailymotion's core API 3 years ago, we wanted to offer a more efficient way to host applications and to simplify our development and production workflow. We decided to use a container orchestration platform to achieve this goal, and we naturally chose Kubernetes.
Why is it worth building your own Kubernetes platform?
An API put into production in no time, boosted by Google Cloud
Three years back, right after the acquisition of Dailymotion by Vivendi, all the engineering teams were focused on one huge goal: delivering a brand new Dailymotion product.
Based on our analysis of container orchestration solutions and our previous experience, we were convinced that Kubernetes was the right choice for us. Several developers in the company already understood its concepts and knew how to use it, which was a huge advantage for our infrastructure transformation. On the infrastructure side, we needed a strong and resilient platform to host these new kinds of cloud-native applications. We preferred to start our journey in the Cloud so that we could take the time to build a rock-solid on-premises platform. We decided to deploy our applications on Google Kubernetes Engine, even though we knew that we would use our own data centres sooner or later and adopt a hybrid strategy.
Why did we choose GKE?
We made this choice mainly for technical reasons but also because we needed to provide an infrastructure rapidly to answer Dailymotion's business needs. We had some requirements for the hosting of our applications like geo-distribution, scalability, and resiliency.
As Dailymotion is a video platform available worldwide, we needed to improve user experience by reducing latency. Previously, our API was only available in Paris, which was not optimal. We wanted to be able to host our applications in Europe as well as in Asia and the US.
This latency constraint meant that we faced a big challenge regarding the networking design of our platform. While most cloud providers required us to create a separate network in each region and interconnect all of them with a VPN or a managed service, Google Cloud allowed us to create a single, fully routed network across all our Google regions, which is a clear win in terms of operations and efficiency.
Furthermore, Google Cloud's network and load balancing service are impressive. They allow us to use a single anycast public IP from every region and let BGP do its job, i.e. route our users to the closest cluster. In case of failure, the traffic is automatically routed to another region without any human intervention.
Our platform also requires the use of GPUs. Google Cloud allows us to use them in a very efficient way and directly in our Kubernetes clusters.
At that time, the infrastructure team was mostly focused on the legacy stack, which is deployed on physical servers. That's why using a managed service (including the Kubernetes master components) helped us meet our deadline and gave us time to train our teams to operate our on-premises clusters.
All of this made it possible for us to start accepting production traffic on our Google Cloud infrastructure just 6 months after kickoff.
However, despite the overall benefits, using a cloud provider comes at a cost that can escalate depending on your usage. That's why we evaluated each managed service we adopted with a view to bringing it in-house later. In fact, we started to implement our on-premises clusters at the end of 2016 and initiated our hybrid strategy.
Implementing Dailymotion's on-premises container orchestration platform
Seeing as the whole stack was ready for production and the API still in development, we had time to focus on our on-premises clusters.
With more than 3 billion videos viewed each month, Dailymotion has had its own Content Delivery Network across the world for years now. Clearly, we wanted to take advantage of this presence and deploy our Kubernetes clusters in our existing data centres.
The entire infrastructure currently represents over 2500 servers across 6 data centres. All of them are configured with SaltStack. We started by preparing all the formulas needed to create a master node, a worker node or an etcd cluster.
The networking part
Our network is fully routed. Each server announces its own IP over the network using ExaBGP. We compared several network plugins, and the only one that fit our needs, thanks to its Layer 3 approach, was Calico. It's perfectly aligned with the current network design of the infrastructure.
As we wanted to reuse all the existing tools within the infrastructure, the first thing to tackle was integrating our Kubernetes nodes with the homemade network tool that all our servers use to announce IP ranges over the network. We let Calico assign IPs to pods, but we didn't and still don't use it for the BGP sessions with our network equipment. The routing is actually handled by ExaBGP, which announces the subnets used by Calico. This allows us to reach any pod from our internal network, especially from our load balancers.
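To make the division of labour concrete, here is a minimal sketch of a helper that turns pod subnets into announcements for ExaBGP. It is not our homemade tool: the function name and the example subnets are hypothetical, and it only assumes ExaBGP's text API, in which a helper process writes commands such as `announce route <prefix> next-hop self` to its standard output.

```python
import ipaddress
import sys


def announce_commands(pod_cidrs, next_hop="self"):
    """Build one ExaBGP text-API command per pod subnet.

    `pod_cidrs` is a hypothetical input: in a real setup the list would
    come from the blocks that Calico's IPAM assigned to the node.
    """
    commands = []
    for cidr in pod_cidrs:
        # ip_network() rejects malformed prefixes and prefixes with host
        # bits set, so a typo never reaches the routers.
        network = ipaddress.ip_network(cidr)
        commands.append(f"announce route {network} next-hop {next_hop}")
    return commands


def write_commands(commands, stream=sys.stdout):
    """Write the commands where ExaBGP reads them (stdout by default)."""
    for command in commands:
        stream.write(command + "\n")
    stream.flush()
```

Calling `write_commands(announce_commands(["10.42.0.0/26"]))` would emit a single announcement for that example subnet. Calico keeps ownership of address assignment, while the announcement path stays identical to the one used by every other server.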
How we manage our ingress traffic
To route incoming requests to the correct service, we wanted to use Ingress Controllers for their integration with Kubernetes' ingress resources.
3 years ago, nginx-ingress-controller was the most mature controller. Nginx itself had been in use for years and is well known for its stability and performance.
In our design, we decided to host our controllers on dedicated 10Gbps blade servers. Each controller is plugged into the kube-apiserver endpoint of the cluster it belongs to. On these servers, we also use ExaBGP to announce public or private IPs. Our network topology allows us to use BGP from these controllers to route all the traffic directly to our pods without using a NodePort service type. This avoids horizontal traffic between our nodes as much as possible and increases efficiency.
Now that we've seen how we built our hybrid platform, we can dig into the traffic migration itself.
Migrating our traffic from Google Cloud to Dailymotion’s infrastructure
After almost 2 years of building, benchmarking and fine-tuning, we found ourselves with a complete Kubernetes stack ready to receive a part of our traffic.
Currently, our routing strategy is quite simple but sufficient for our use case. On top of our public IPs (both Google Cloud and Dailymotion), we use AWS Route 53 to define policies and direct our end users to the cluster of our choice.
On Google Cloud, it's simple since we use a single IP for all our clusters and users are routed to their closest GKE cluster. For our own clusters, we don't use the same technology, so we have a distinct IP per cluster.
During this migration, we targeted countries to put them progressively on our clusters and analyze the benefits.
As our GKE clusters are configured with custom metrics autoscaling, they scale up/down depending on the incoming traffic.
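As an illustration of how metric-driven scaling reacts to traffic, the sketch below applies the replica formula documented for the Kubernetes Horizontal Pod Autoscaler: the desired count is the current count scaled by the ratio of the observed metric to its target, rounded up. Whether our clusters use exactly this mechanism is an assumption here; the function name, the bounds and the numbers are hypothetical, and the autoscaler's tolerance and stabilization behaviour are left out.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=100):
    """Replica count suggested by the HPA formula, clamped to bounds.

    desired = ceil(current_replicas * current_metric / target_metric)
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    raw = math.ceil(current_replicas * current_metric / target_metric)
    # Never go below the floor or above the ceiling configured for the workload.
    return max(min_replicas, min(max_replicas, raw))
```

For example, 4 replicas each handling 150 requests per second against a target of 100 would be scaled to 6, and scaled back down once the per-replica load drops below the target.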
In nominal mode, all the traffic of a region is routed to our on-premises cluster, while the GKE cluster acts as a failover based on the health checks provided by Route 53.
Our next step is to fully automate our routing policies so that we have an autonomous hybrid strategy that continually enhances our user experience. In terms of benefits, we've considerably reduced our Cloud costs and even improved our API response time. We trust our Cloud platform enough to offload more traffic to it if needed.
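The routing decision described above can be modelled in a few lines. This is a simplified sketch of the logic only, not Route 53 configuration or API calls: the `Cluster` type, the `resolve` function, the country codes and the IP addresses (taken from documentation ranges) are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cluster:
    name: str
    ip: str
    healthy: bool  # stands in for the result of a DNS health check


def resolve(country, migrated_countries, on_prem, gke):
    """Return the cluster that should answer a user from `country`.

    Countries already migrated go to the on-premises cluster while it is
    healthy; everything else, and any failover, lands on GKE.
    """
    if country in migrated_countries and on_prem.healthy:
        return on_prem
    return gke


# Hypothetical example values.
on_prem = Cluster("on-prem-paris", "192.0.2.10", healthy=True)
gke = Cluster("gke-europe", "198.51.100.10", healthy=True)
migrated = {"FR", "BE"}
```

With these values, `resolve("FR", migrated, on_prem, gke)` returns the on-premises cluster, while a user from a country outside `migrated`, or any user while the on-premises health check fails, is sent to GKE. Growing the `migrated` set one country at a time mirrors the progressive migration.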