Lympo app development update from Head of DevOps and SRE Roman Gorodeckij
Greetings! Roman here, Head of DevOps and Site Reliability Engineer at Lympo. I’m glad to present some recent achievements in Lympo’s technical infrastructure.
During the last couple of months, we have finally developed pipelines that allow our developers to deliver their code a lot faster and with fewer risks and struggles than before. We are also heavily invested in preparing our infrastructure for production use, and we already have some insights to share with you.
As our CTO Gintautas Kisonas mentioned in the last development update, we want our infrastructure to be decoupled, scalable and secure: these are the main principles. For this reason, we have decided to use a microservices architecture. To host and maintain the microservices, we have chosen Docker containers, as we consider them the most robust web service hosting solution on the IT market today.
To build and host Docker images, we need a Docker registry and a continuous integration service. A Docker registry stores the Docker images we build. Continuous integration (CI), meanwhile, automates many routine tasks in the background once a developer pushes their code to a Git repository. CI is most commonly used for testing newly published code and for building and pushing Docker images to a Docker registry. If any pipeline job fails, the developer is instantly notified via email and Slack, and the pipeline is aborted.
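To make the "fail fast, notify, abort" behaviour concrete, here is a minimal Python sketch of it. This is not how GitLab CI is implemented, and the names `Job`, `run_pipeline` and `notify` are made up for illustration; a real pipeline would run shell commands and call the email and Slack integrations instead of appending to a list.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Job:
    """One pipeline job (hypothetical model, e.g. test, build, push)."""
    name: str
    run: Callable[[], bool]  # returns True on success, False on failure


def run_pipeline(jobs: List[Job], notify: Callable[[str], None]) -> List[str]:
    """Run jobs in order; on the first failure, notify and abort.

    Returns the names of the jobs that completed successfully.
    """
    completed: List[str] = []
    for job in jobs:
        if not job.run():
            # Stand-in for the email/Slack notification described above.
            notify(f"Job '{job.name}' failed; pipeline aborted")
            break  # later jobs are never started
        completed.append(job.name)
    return completed


if __name__ == "__main__":
    messages: List[str] = []
    pipeline = [
        Job("test", lambda: True),
        Job("build-image", lambda: False),  # simulated failure
        Job("push-image", lambda: True),
    ]
    print(run_pipeline(pipeline, messages.append))
    print(messages)
```

In this sketch the simulated failure in `build-image` means `push-image` never runs, so a broken image is not pushed to the registry.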
We have chosen GitLab for both CI and the Docker registry, which lets us host our code and Docker images in one place and use GitLab’s CI seamlessly.
When Docker containers first became popular, people quickly realized that they needed an automated way to manage the hundreds or even thousands of containers inside a Docker cluster. That’s why we need a Docker orchestrator.
Why not Kubernetes?
The two most popular Docker orchestrators today are Kubernetes and Docker Swarm. While most of the IT industry is using Kubernetes as the de facto solution for orchestrating containers, we have decided to go with Docker Swarm. Why? Docker Swarm’s simplicity allows the whole team to understand the deployment process more easily, even if a developer does not have any knowledge about containers whatsoever.
Luckily, this year Docker introduced a feature that allows a Docker Swarm configuration to be used for Kubernetes deployment, so once Lympo grows and we become more experienced with Docker, we might consider moving to Kubernetes.
About cloud providers
Currently we are hosting Docker Swarm on Digital Ocean (production environment) and Hetzner (staging environment). This allows us to remain cost-conscious.
Digital Ocean has a very lightweight dashboard with great features, like a load balancer, Spaces (object storage), floating IPs, backups, snapshots, etc. Hetzner is a less expensive option than Digital Ocean, and the only cloud feature it offers besides compute instances is floating IPs. This makes Hetzner the perfect candidate for a staging environment. Both Digital Ocean and Hetzner have a Terraform provider. Below, I will explain what Terraform is and why it’s important.
Currently we are using 3 approaches for infrastructure automation:
• Terraform enables us to safely and predictably create, change and improve the configuration of our cloud units.
• Ansible is a very robust and self-explanatory automation engine, mostly used for provisioning cloud instances.
• Docker Compose + Makefile. Thanks to this combination, we can deploy our container configuration into Docker Swarm. Docker Compose lets us store the configuration for each stack of containers in YAML files, while a Makefile lets us automate the deployment process even further.
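As a rough illustration of the last item, the sketch below shows the kind of step such automation wraps: one `docker stack deploy` call per Compose file. It is written in Python rather than as a Makefile, and the `stacks/` directory layout, the file names and the function names are assumptions made for the example, not Lympo's actual setup.

```python
import subprocess
from pathlib import Path
from typing import List


def deploy_command(stack: str, compose_file: Path) -> List[str]:
    """Build the `docker stack deploy` command for one stack."""
    return ["docker", "stack", "deploy",
            "--compose-file", str(compose_file), stack]


def deploy_all(stacks_dir: Path, dry_run: bool = True) -> List[List[str]]:
    """Deploy every *.yml Compose file in stacks_dir as its own stack.

    The stack name is taken from the file name (hypothetical convention).
    With dry_run=True the commands are only collected, not executed.
    """
    commands: List[List[str]] = []
    for compose_file in sorted(stacks_dir.glob("*.yml")):
        cmd = deploy_command(compose_file.stem, compose_file)
        commands.append(cmd)
        if not dry_run:
            # Requires a Docker CLI connected to a Swarm manager node.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    for cmd in deploy_all(Path("stacks")):
        print(" ".join(cmd))
```

Keeping one Compose file per stack means a single stack can be redeployed without touching the others.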
We are currently putting a lot of effort into implementing Continuous Deployment, which will give us fully automated deployment in the near future.
Logging & Monitoring
For monitoring Docker Swarm nodes and running containers, we have decided to use Dynatrace. Dynatrace has a beautiful dashboard interface, and its host map is on a whole other level compared to similar services.
Our Docker Stack
As you may have already noticed from the diagram, we are using the Traefik reverse proxy. Traefik allows us to specify virtual hosts for each service directly in the Docker Compose configuration file, which is a very robust way to configure virtual hosts for our Docker services. BoltDB is used for Traefik’s highly available setup. Each endpoint uses Let’s Encrypt SSL certificates, so all our incoming connections are encrypted. Traefik requests the SSL certificates automatically once we add a new Docker service to the stack.
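To give an idea of what "virtual hosts in the Compose file" means, the sketch below generates the labels a service would carry. It assumes the Traefik 1.x label names (`traefik.enable`, `traefik.port`, `traefik.frontend.rule`); the post does not state which Traefik version is in use, and the hostname, port and function name are purely illustrative.

```python
from typing import Dict


def traefik_labels(hostname: str, port: int) -> Dict[str, str]:
    """Return routing labels for one service.

    Label names follow the Traefik 1.x Docker backend (an assumption);
    in Swarm mode they would sit under the service's deploy labels.
    """
    return {
        "traefik.enable": "true",                  # expose this service
        "traefik.port": str(port),                 # container port to route to
        "traefik.frontend.rule": f"Host:{hostname}",  # virtual host match
    }


if __name__ == "__main__":
    # Hypothetical service reachable at api.example.com on port 8080.
    for key, value in traefik_labels("api.example.com", 8080).items():
        print(f"{key}={value}")
```

Because the routing rule lives next to the service definition, adding a new service with its labels is what triggers the proxy to pick it up.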
GitLab CI runners are hosted in the Staging environment, because with Hetzner, higher resource compute instances cost less than on Digital Ocean.
Portainer allows all our maintainers to deploy new Docker images with just one click. This saves time, as developers can be fully self-organized and roll out their updates to production without bothering the DevOps guys.
It’s all about SaaS
Our diagram contains many SaaS services. SaaS stands for “Software as a Service”. Using it allows us to focus on delivery instead of hosting and maintaining those services ourselves.
• For example, setting up our own MySQL cluster would be very time-consuming and maintenance-heavy, so we have decided to use AWS RDS instead.
• For the MongoDB cluster, there’s MongoDB Atlas, a SaaS solution from the creators of MongoDB themselves.
• Confluent Cloud gives us a whole Apache Kafka cluster out of the box.
• Sentry is used to find all errors that occur in production, in real time, and it also notifies developers immediately via Slack and email. It even opens new tickets in Jira and GitLab, which saves even more time for our team.
I believe that for such a short period of time, the development results are really great, as the infrastructure becomes more stable and automated each day. We plan to introduce Continuous Deployment in the near future, which will fully automate our deployment to production. For example, all of our Docker images will be scanned for vulnerabilities with a SonarQube service before deployment. The pipeline will also run end-to-end tests once a service is updated and roll back if the tests fail. There is also a long journey ahead for implementing a hybrid cloud solution for storing a blockchain database locally and with very tight security.
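The planned flow (scan, deploy, test, roll back on failure) can be sketched in a few lines of Python. This is only an outline of the idea under stated assumptions: `scan`, `deploy` and `e2e_tests` are hypothetical callables standing in for the vulnerability scan, the Swarm deployment and the end-to-end test run.

```python
from typing import Callable


def deploy_with_rollback(
    new_image: str,
    previous_image: str,
    scan: Callable[[str], bool],    # True if the image passes the scan
    deploy: Callable[[str], None],  # rolls the given image out
    e2e_tests: Callable[[], bool],  # True if end-to-end tests pass
) -> str:
    """Return the image that is running once the flow has finished."""
    if not scan(new_image):
        # A failed scan blocks the rollout; the old image keeps running.
        return previous_image
    deploy(new_image)
    if e2e_tests():
        return new_image
    # Tests failed after the update: go back to the last known-good image.
    deploy(previous_image)
    return previous_image


if __name__ == "__main__":
    rollouts = []
    running = deploy_with_rollback(
        "app:2", "app:1",
        scan=lambda image: True,
        deploy=rollouts.append,
        e2e_tests=lambda: False,  # simulated failing end-to-end tests
    )
    print(running, rollouts)
```

The point of the design is that the scan gates the rollout before anything changes, while the end-to-end tests gate it afterwards, so a bad update is either never deployed or undone automatically.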
For now, everything presented here is the foundation for the fastest possible deployment, for progress towards a self-healing production environment, and for keeping both maintenance costs and maintenance hours as low as possible.
Rapid deployment is a major key to success for any IT company out there. If, by any chance, you are a DevOps person yourself, I invite you to check out this talk about IT performance from SRECon, by Nicole Forsgren and Jez Humble.