Fueling the Rocket for 500 Deploys per Week

George Javakhidze
Pipedrive R&D Blog
Feb 3, 2020

Written by George Javakhidze & Jevgeni Demidov

Intro

Pipedrive provides SaaS CRM software for 90k+ companies around the world. On average, we deploy up to 350 times per week, which includes about 50 deployments to production environments located in 3 different regions. Our developers modify up to 1/3 of our features every week.

Given this velocity, we need to be able to deploy and roll back quickly, in a fully automated way, without worrying about downtime.

Want to know how we manage this? Continue reading below:

Pipedrive tech stack in a nutshell

Some key characteristics of our architecture:

  • We use the microservices architectural pattern. Our codebase is split across 600+ repositories, and our product consists of 300+ services.
  • Our services are almost fully containerized; only stateful services run outside of containers.
  • We rely heavily on Docker to build and ship our services. We use Docker images everywhere, from developers’ local machines all the way to our production environment.

As for our CI / CD stack, we use GitHub as our SCM and Jenkins (yes, still using it in 2020) as our CI / CD tool. We use Codeship to run commit checks.

We have two in-house solutions powering our CI / CD: a deployment orchestrator called Rakett (“rocket” in Estonian) and a custom framework for running pipelines inside Jenkins. It’s fair to say that we use Jenkins more as a job orchestration engine than as a full-blown CI / CD tool.

How do we deploy at Pipedrive?

The code delivery process is the same across all teams at Pipedrive:

[Diagram: code delivery process in Pipedrive]

Developers make code changes on their machines, changes are committed and pushed to GitHub, and each commit is validated by executing unit tests and verifying code style.

Once the change is ready, the development branch is deployed to an isolated on-demand sandbox environment and all integration tests are executed there.

Finally, developers trigger a production deploy by adding a label to an open pull request on GitHub. Once the label is attached, our deployment orchestrator (Rakett) takes over.

Rakett builds a Docker image, deploys it to our test environment, and executes tests to validate it. Once the image is validated, Rakett merges the PR and proceeds to deploy the image to our production environment.

The whole chain, from untested changes to working code in production, takes two clicks and about 15 minutes on average.
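As an illustration, this label-triggered chain can be thought of as an ordered set of gated stages, where each stage must pass before the next one runs. The following is a minimal sketch; Rakett's real implementation, and the stage and service names below, are assumptions:

```python
# Minimal sketch of a label-triggered deploy flow. Illustrative only:
# Rakett's actual implementation, stage names, and APIs are assumptions.

def run_pipeline(pr_number: int, commit_sha: str) -> list[str]:
    """Run the deploy stages in order; stop at the first failure."""
    log = []

    def stage(name: str, ok: bool = True) -> bool:
        log.append(f"{name}: {'ok' if ok else 'FAILED'}")
        return ok

    # 1. Build a Docker image from the PR's head commit.
    if not stage(f"build image myservice:{commit_sha}"):
        return log
    # 2. Deploy to the test environment and validate it there.
    if not stage("deploy to test") or not stage("run tests in test"):
        return log
    # 3. Only a validated image gets the PR merged ...
    if not stage(f"merge PR #{pr_number}"):
        return log
    # 4. ... and promoted to production.
    stage("deploy to live")
    stage("run live smoke tests")
    return log

for line in run_pipeline(pr_number=42, commit_sha="abc1234"):
    print(line)
```

The key property the sketch captures is ordering: the merge happens only after the image passes validation in the test environment, so the mainline branch always reflects a deployable state.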

Key Principles

The key to our deployment experience lies in a set of organizational principles that we follow. These principles are the foundation of our engineering culture:

  • Infrastructure and DevOps Tooling engineers provide various solutions to developers. Infrastructure engineers are responsible for the lower levels of the platform, whereas DevOps Tooling engineers work on the higher levels.
  • Both of these teams act as service providers, whereas the developers are their users.
  • Our services are fully owned by the developer teams, including their deployment and operational aspects.
  • Developers are responsible for testing and deploying their changes.
  • Each team at Pipedrive is responsible for maintaining and operating its services.

A good example is our container clusters. Infrastructure engineers bootstrap and maintain the clusters (the platform), DevOps Tooling engineers provide the means to deploy, manage, and troubleshoot services in these clusters, and developers deploy their services to them. Each team is responsible for maintaining its set of services independently.

Accepting failure

There is a famous motto from Facebook founder Mark Zuckerberg: “Move fast and break things.” At Pipedrive, we word it a bit differently.

From our point of view, it’s essential to keep in mind that things will break; the only remaining question is when and how much. So it’s vital to move fast, to regularly do things that may make you uncomfortable, and not to be afraid of change, because change is what helps us create something new and move forward.

DevOps Engineer? Isn’t DevOps a field?

The classic definition of DevOps, as described by Amazon, is a combination of development (Dev) and operations (Ops): one person can both develop and deploy, which means every developer has the skills to support the whole development cycle.

In practice, this classic or traditional description of DevOps is rarely applied as-is, and Pipedrive is no exception. In our case, we have an implementation of DevOps that is applied in the way that is most comfortable for us.

We have dedicated Infrastructure Engineers who provide a platform for the rest of the organization. Using this platform, our DevOps Tooling team provides services and support to the rest of the engineering departments by implementing and maintaining various frameworks, tools, and processes, such as:

  • full-blown CI/CD (testing, development, deployment)
  • on-demand sandbox environment
  • microservices platform
  • testing frameworks
  • on-call related tooling
  • monitoring and alerting solution

All of these give developers everything they need to do their job without worrying about how to test and deliver changes to production.

Benefits of microservices architecture

At Pipedrive, we believe that the key to successfully building and maintaining software is freedom combined with the use of suitable technologies. Our microservices architecture enables rapid, frequent, and reliable delivery of complex applications at a comfortable pace.

These are the main points that help us to keep this pace:

  • Small deliverables: developers produce small changes, which are pushed frequently. Developing and deploying microservices gives us fast feedback and allows us to iterate rapidly.
  • Increased velocity: teams can work independently without affecting each other, which helps scale the working process considerably.
  • Failure resistance: the application is failure resistant because components are isolated; if one service goes down, it won’t break the whole application.
  • Granular scalability: components can be scaled independently, which costs far less than scaling up an entire monolith.

Service Ownership

By moving service ownership to the teams who developed the service itself, we managed to improve the release time of new services built from scratch.

We strongly believe that when a team doesn’t have the option to delegate their problems to somebody else, they will become more educated and responsible for what they do. So at Pipedrive, developers are setting up new microservices from scratch, configuring the deployment pipelines, and preparing configurations for different environments.

Developers are responsible for deploying their changes as well. They have the means to deploy and rollback changes without extra help. In case something goes wrong, the developer is responsible for troubleshooting the issue and rolling back the change, when necessary.
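In this model, a rollback can be as simple as redeploying the previously validated image tag. The sketch below is hypothetical; how Pipedrive actually stores deploy history, and the service and tag names, are assumptions:

```python
# Hypothetical rollback sketch: track deployed image tags per service and
# "roll back" by redeploying the previous one. How Pipedrive actually
# records deploy history is an assumption.

deploy_history: dict[str, list[str]] = {}

def deploy(service: str, tag: str) -> str:
    """Record and announce a deploy of service:tag."""
    deploy_history.setdefault(service, []).append(tag)
    return f"deployed {service}:{tag}"

def rollback(service: str) -> str:
    """Drop the latest release and redeploy the previous one."""
    history = deploy_history.get(service, [])
    if len(history) < 2:
        raise RuntimeError(f"no previous version of {service} to roll back to")
    history.pop()  # discard the bad release
    return f"rolled back {service} to {history[-1]}"

deploy("deals-api", "v1")
deploy("deals-api", "v2")     # suppose this release turns out to be broken
print(rollback("deals-api"))  # redeploy v1
```

Because every deploy goes through the same orchestrated path, the rollback is just another deploy of a known-good artifact, which is what makes it safe to hand to the developer on call.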

As the tools provided are shared across all teams and centrally managed by the DevOps Tooling team, developers don’t need to reinvent the wheel every time they need to integrate another service.

Who does the testing of changes?

We believe that developers are the best people to test their changes; therefore, the responsibility for testing a change lies with the developers themselves.

Our tests are fully automated, and we don’t have dedicated testers.

As for the test chain:

  • We run unit tests on each commit, as the first step of our CI / CD pipeline.
  • Developers have access to on-demand sandbox environments, where they can bootstrap customized, live-like environments in a couple of minutes. These environments are used for sharing the changes and running integration tests.
  • Once the change is ready, our orchestrator builds the docker image and executes functional tests.
  • Validated images are automatically deployed to the test environment, where we run smoke tests as part of our regression testing.
  • Finally, the orchestrator deploys the image to our live environment and runs a set of live smoke tests.
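A live smoke test at the end of this chain can be as small as asserting that a freshly deployed service reports a sane health payload. The sketch below is an assumption: the payload shape and field names are made up, not Pipedrive's actual contract:

```python
# Hypothetical live smoke test: validate a service health payload.
# The payload shape and field names ("status", "version") are assumptions,
# not Pipedrive's actual health-check contract.

def check_health(payload: dict) -> None:
    """Raise AssertionError if the health payload looks unhealthy."""
    assert payload.get("status") == "ok", f"unhealthy: {payload}"
    assert payload.get("version"), "deployed version missing from payload"

# A passing payload:
check_health({"status": "ok", "version": "abc1234"})

# A failing one is caught and reported instead of crashing the pipeline:
try:
    check_health({"status": "degraded"})
except AssertionError as err:
    print(f"smoke test failed: {err}")
```

Checking the reported version against the tag that was just deployed is a cheap way to confirm the rollout actually took effect before declaring the deploy done.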

What if something goes wrong?

Every engineering team in our organization has a rotating on-call shift. Developers participate in on-call, just like ops engineers, and maintain the availability of their services. Developers have access to monitoring and log-aggregation tools for easier troubleshooting.

All alerts and issues with a service are forwarded to the owning team’s on-call person. During the shift, the on-call person is available 24/7 and is responsible for solving the issue or delegating it to someone else when needed.

As each team maintains its services independently, it’s vital to facilitate knowledge sharing between them regarding all incidents. We practice blameless post-mortems, which include a high-level summary, root cause analysis, and prevention steps for solving the specific problem.

Developer Environment

While working on something new, everything starts from the developer’s local machine, a.k.a. the developer environment. This environment has a significant influence on the developer’s velocity.

While Docker makes it incredibly easy to run containers on the local machine, this comes with caveats:

  • Developers need to push their changes rapidly. Therefore, bootstrapping the developer environment should be an automatic, reasonably fast process.
  • As our services change frequently, the developer environment should always contain the latest version of our services.
  • The number of services required to run our product, and their resource requirements, are growing steadily, while our developer machines have fixed, limited resources.
  • Maintaining a developer environment that meets these requirements is quite challenging.
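Bootstrapping such an environment typically means generating and starting a Compose-style stack from a list of services, always tracking the latest published images. The sketch below is hypothetical; the service names and registry path are made up for illustration:

```python
# Hypothetical sketch of developer-environment bootstrapping: generate a
# docker-compose style file from a service list, pinning the "latest" tag
# so the environment tracks current service versions. Service names and
# the registry path are made up for illustration.

def render_compose(services: list[str],
                   registry: str = "registry.example.com") -> str:
    """Render a minimal Compose-style services section as text."""
    lines = ["services:"]
    for name in services:
        lines += [
            f"  {name}:",
            f"    image: {registry}/{name}:latest",  # track the newest build
            "    pull_policy: always",
        ]
    return "\n".join(lines)

print(render_compose(["deals-api", "pipeline-ui"]))
```

Generating the file instead of hand-maintaining it keeps the bootstrap automatic and fast, and guarantees the environment picks up the latest service versions on every start.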

But this is already a discussion for next time.

George Javakhidze, Lead SRE @ Pipedrive. I like containers, microservices, and DevOps in general.