Infrastructure as a Service at leboncoin

leboncoin tech Blog
Oct 25, 2018

by Nicolas Béguier (SRE) & Iyed Bennour (backend developer)

Operating an infrastructure

Operating large data centers that support modern software architectures and development workflows can be a daunting task without the proper tools. Until now at leboncoin, we relied mainly on vSphere for infrastructure virtualization. At some point, though, we found ourselves actively working against the tool’s limitations: the lack of IPAM (IP Address Management), no DHCP support, and the need to combine it with other tools for PXE. Provisioning a machine using our vSphere-based workflows (or lack thereof) was tedious compared to opening the AWS console and creating an EC2 instance in just a few clicks. The convenience and flexibility of AWS, however, come at a cost.

We decided vSphere was not the right tool to achieve the level of efficiency, automation, flexibility and scalability we needed to support another transformation underway at leboncoin: the adoption of DevOps practices that goes with the migration to a microservice architecture and, more generally, the evolution of our IT systems.

In this post we give an overview of the infrastructure underpinning leboncoin.fr and how we used the OpenNebula platform to achieve our infrastructure operational goals and build our IaaS.

An Overview of leboncoin Infrastructure

At leboncoin, the infrastructure is composed of web servers, database servers (mainly PostgreSQL), our own DNS servers, Kafka clusters, load balancers, caches and our API gateway. The list goes on. All these pieces of infrastructure are critical to the smooth operation of our services.

For application workloads, we recently started to use Kubernetes to run and scale our microservices. But we still have a sizable proportion of legacy, business-critical applications that are not containerized and run on Debian or CentOS virtual machines.

Moreover, all these pieces of infrastructure are distributed over at least three data centers. This provides redundancy, but High Availability cannot reasonably be achieved without an efficient provisioning workflow.

Overall, it is quite a complex environment, and a failure in any of these components could cause a service outage that affects our users and our internal SLAs.

Our mission as engineers at leboncoin is to manage this complexity and ensure High Availability of our services to keep our users happy. In the remainder of this post, we describe how we are doing that — at the infrastructure level — with the help of OpenNebula.

Enter OpenNebula

“OpenNebula is a cloud computing platform for managing heterogeneous distributed data center infrastructures.”

Wikipedia

There are plenty of open source and proprietary data center virtualization solutions. We have been using vSphere for a long time and we tried some alternatives like OpenStack, CloudStack, Mist.io, Ganeti, etc.

We could write an entire post on the pros and cons of each of those solutions, but in the end we selected OpenNebula.

The OpenNebula platform comes with integration, management, scalability, security and accounting features for distributed data center virtualization.

One of the nicest features of OpenNebula is KVM-based virtualization, a stable and well-supported open source technology.
There are also wrappers and integrations for the OpenNebula command line interface and API in various languages and tools (Python, Golang, Ansible, Terraform). It is therefore very easy to automate actions such as VM migrations or automatic resizing of VM resources.

The OpenNebula community is also particularly active and new features are coming out regularly.

This script checks the scheduling of a service and helps migrate it to other physical hosts.

Output of the previous script
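The original script and its output were published as images and are not reproduced here. As a rough illustration of the kind of check described, here is a minimal Python sketch. It is not the authors' script: the data layout (dictionaries with `name`, `service` and `host` keys), the host names and both function names are assumptions made for this example, and it does not call OpenNebula at all. In practice the VM list would come from the OpenNebula API or CLI, and the suggested moves would be applied with a migration command.

```python
from collections import defaultdict


def check_service_placement(vms, service):
    """Return the hosts that run more than one VM of `service`.

    `vms` is a list of dicts with the hypothetical keys
    "name", "service" and "host".
    """
    by_host = defaultdict(list)
    for vm in vms:
        if vm["service"] == service:
            by_host[vm["host"]].append(vm["name"])
    # A host carrying several VMs of one service is a single point of failure.
    return {host: names for host, names in by_host.items() if len(names) > 1}


def suggest_migrations(vms, service, hosts):
    """Propose (vm_name, target_host) moves towards hosts the service does not use yet."""
    used = {vm["host"] for vm in vms if vm["service"] == service}
    free = [host for host in hosts if host not in used]
    moves = []
    for _host, names in sorted(check_service_placement(vms, service).items()):
        # Keep the first VM where it is and move the others away.
        for name in names[1:]:
            if not free:
                return moves  # no spare host left, stop suggesting
            moves.append((name, free.pop(0)))
    return moves


if __name__ == "__main__":
    # Hypothetical inventory, for illustration only.
    inventory = [
        {"name": "my_service-1", "service": "my_service", "host": "kvm-dc1-01"},
        {"name": "my_service-2", "service": "my_service", "host": "kvm-dc1-01"},
    ]
    print(suggest_migrations(inventory, "my_service", ["kvm-dc1-01", "kvm-dc2-01"]))
```

Keeping the placement check separate from the migration suggestion means the check can be run on its own as a read-only report.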

We also appreciate the stability of OpenNebula, which is a must-have for production use. Moreover, the OpenNebula WebUI offers a great user experience which makes its adoption by the team easier. More importantly, it saves us time on support.

OpenNebula WebUI Dashboard

High-Availability with the help of OpenNebula

We wanted to organize our virtual infrastructure into two environments, production and staging.

OpenNebula allows us to manage geographically distributed data centers and abstract them into virtual clusters, which we separated into environments, called Zones in OpenNebula: production and staging.

Using Sunstone, the OpenNebula web UI, we can create virtual machines on demand from VM templates.

For example, this is how the “my_service” microservice from the example above is provisioned:

  • We select two virtual machine templates, infra_dc1_prod and infra_dc2_prod: the infrastructure service in data center 1 and in data center 2, both in the production zone.
  • We choose the number of instances: for each instance, a pair of virtual machines (defined by the templates selected in the first step) will be created and split over the two clusters.
  • We set “my_service” as the service name and allocate resources: CPU, memory and disk.

The virtual machines are created and deployed on both data centers transparently. This is as convenient and easy as using AWS EC2. In addition, we get one instance in each virtual cluster, a concept similar to an Availability Zone in AWS.
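The provisioning steps above can be pictured as a small planning function. This is only a sketch of the logic, not how Sunstone or OpenNebula implement it: the `VMRequest` type, the `plan_service` function, the naming scheme and the resource units (MB, GB) are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VMRequest:
    """One virtual machine to create (hypothetical structure)."""
    name: str
    template: str
    cpu: int
    memory_mb: int
    disk_gb: int


def plan_service(service, templates, instances, cpu, memory_mb, disk_gb):
    """Return one VMRequest per (instance, template) pair.

    With two templates, each instance becomes a pair of VMs,
    one per data center.
    """
    if instances < 1:
        raise ValueError("instances must be at least 1")
    requests = []
    for index in range(1, instances + 1):
        for template in templates:
            requests.append(
                VMRequest(
                    name=f"{service}-{index}-{template}",
                    template=template,
                    cpu=cpu,
                    memory_mb=memory_mb,
                    disk_gb=disk_gb,
                )
            )
    return requests


if __name__ == "__main__":
    plan = plan_service(
        "my_service", ["infra_dc1_prod", "infra_dc2_prod"],
        instances=2, cpu=2, memory_mb=4096, disk_gb=20,
    )
    for request in plan:
        print(request.name, request.template)
```

With two instances and two templates, the plan contains four virtual machines, two per data center.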

Once the virtual machines for our “my_service” service are provisioned, we use a configuration management tool, Puppet in our case, to declare these virtual machines as “my_service”. At startup, each machine registers itself automatically in PuppetDB via puppet-agent.

HAProxy, a load balancer and High Availability reverse proxy, reads the load-balancing configuration defined in PuppetDB. When the service starts, HAProxy adds it to the “my_service” pool and starts proxying requests to it.
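To make the last step more concrete, here is a minimal sketch of how a list of registered nodes could be turned into an HAProxy backend section. The post does not describe how the configuration is actually generated, so this is an assumption: the `render_backend` function, the `(hostname, ip)` input format, the node names, the addresses and port 8080 are hypothetical. Only the `backend`, `balance roundrobin` and `server ... check` directives are standard HAProxy configuration syntax.

```python
def render_backend(service, nodes, port=8080):
    """Render an HAProxy backend section for `service`.

    `nodes` is a list of (hostname, ip) tuples, standing in for
    whatever a query against PuppetDB would return.
    """
    lines = [f"backend {service}", "    balance roundrobin"]
    for hostname, ip in sorted(nodes):
        # "check" enables health checks, so a server only receives
        # traffic while it responds.
        lines.append(f"    server {hostname} {ip}:{port} check")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Hypothetical nodes, one per data center.
    print(render_backend("my_service", [
        ("my-service-dc2", "10.0.2.10"),
        ("my-service-dc1", "10.0.1.10"),
    ]))
```

Sorting the nodes keeps the rendered file stable between runs, so HAProxy would only need a reload when the set of servers really changes.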

Conclusion

In this post, we gave an overview of the infrastructure at leboncoin and the challenges of operating an infrastructure at our relatively large scale. We also described OpenNebula, the solution that allowed us to build and effectively manage our own IaaS. We finally described how simple it is to run Highly Available services with this kind of infrastructure.

If you enjoyed this story, please recommend and share to help others find it! Feel free to leave a comment below.
