Automated Observability in Pipedrive — the What, the Why, the How?

Published in

Pipedrive R&D Blog

Feb 8, 2022

Growth in the commercial context is usually positive: growth in revenue, growth in engineering power, growth in mindset, etc. However, sometimes growth can be challenging. In this article, I would like to take you on a journey and show how we are working to bridge the gap between observability automation and developer experience in observability tooling.

Previously, I explained how Pipedrive goes “from deployment to actionable insights (in under a minute).” This time, I’ll explain how Pipedrive moves “from actionable insights to automation in observability.”

Currently (beginning of 2022), Pipedrive’s engineering force consists of 400+ employees, 20+ teams/tribes and countless initiatives carried out daily.

A while back, when our engineering department wasn’t all that big, we supported our engineers with training sessions. We’d help them set up dashboards in Grafana, improve their NRQL skills in New Relic and master PromQL. These engineers could then spread the knowledge to their respective teams: a “train the trainer” model, if you will. Today, however, we face new challenges in training new employees, most of whom are juniors who have never seen or heard of NRQL or PromQL. Given the current level of complexity, our new team members’ learning is largely limited to the tools, processes and training of product engineering itself.

For quite some time, we’ve also been running onboarding sessions to, at the very least, familiarize newcomers with the tools we use at Pipedrive. In our current setup, we use New Relic for application performance monitoring and tracing, Zabbix for infrastructure, Prometheus for microservice monitoring, Graylog for log management and Grafana for consolidating all these sources into a single view. As you can probably imagine, this list can be overwhelming for any newcomer, and that’s only one onboarding session.

My colleagues have written a few articles about our onboarding process, and I encourage you to read them, in particular the one on how we manage to turn juniors into mission-ready Pipedrivers in a month.

Back to our challenge: our goal was to bring order to the chaos. The idea that took root was that no developer (or product engineer) should spend excessive time creating graphs, dashboards or even metrics for their own services.

So, what did we do?

Pipedrive adopted Fastify as its de facto Node.js framework, which allowed us to build Fastify plugins that instrument and collect metrics for any service using it. With that in mind, we created a set of basic metrics for each service and for the dependencies it uses, such as Kafka, RabbitMQ and MySQL. Since we, the observability team, monitor all services, we know exactly which metrics are being collected and could arrange them into a set of dashboards or panels for engineers who need insights into their services.
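The post doesn’t show Pipedrive’s Fastify plugins, so the following is only a rough, dependency-free sketch of the kind of standardized HTTP metric such a plugin could record. In a real plugin the `inc` call would most likely sit in Fastify’s `onResponse` hook and be backed by a library such as prom-client; the metric name `http_requests_total`, its label set and the `LabeledCounter` class are assumptions for illustration.

```typescript
// Label set for a hypothetical standardized HTTP request counter.
type HttpLabels = {
  service: string;
  method: string;
  route: string;
  status_code: string;
};

// Escape a label value as required by the Prometheus text format.
function escapeLabelValue(value: string): string {
  return value.replace(/\\/g, "\\\\").replace(/"/g, '\\"').replace(/\n/g, "\\n");
}

// Minimal in-memory counter with labels (illustration only; a real
// service would typically use a metrics library such as prom-client).
class LabeledCounter<L extends Record<string, string>> {
  private readonly series = new Map<string, { labels: L; value: number }>();

  constructor(readonly name: string, readonly help: string) {}

  inc(labels: L, by = 1): void {
    // Sort label pairs so the same label set always maps to the same series.
    const key = JSON.stringify(Object.entries(labels).sort(([a], [b]) => a.localeCompare(b)));
    const current = this.series.get(key) ?? { labels, value: 0 };
    current.value += by;
    this.series.set(key, current);
  }

  // Render all series in the Prometheus text exposition format.
  render(): string {
    const lines = [`# HELP ${this.name} ${this.help}`, `# TYPE ${this.name} counter`];
    this.series.forEach(({ labels, value }) => {
      const pairs = Object.entries(labels)
        .map(([k, v]) => `${k}="${escapeLabelValue(v)}"`)
        .join(",");
      lines.push(`${this.name}{${pairs}} ${value}`);
    });
    return lines.join("\n");
  }
}

// Roughly what a response hook could record for each finished request.
const requests = new LabeledCounter<HttpLabels>(
  "http_requests_total",
  "Total HTTP requests handled, by route and status code",
);
requests.inc({ service: "www-api", method: "GET", route: "/deals", status_code: "200" });
requests.inc({ service: "www-api", method: "GET", route: "/deals", status_code: "200" });
requests.inc({ service: "www-api", method: "GET", route: "/deals", status_code: "404" });
```

Because every instrumented service emits the same series with the same labels, one set of queries works for all of them, which is what makes generating dashboards automatically feasible in the first place.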

On its own, this approach of collecting basic metrics for everything and packaging them into dashboards would only have been partially functional. It would also have filled our Prometheus TSDB with large amounts of metrics that would never have been used to their full potential. Another idea we had was to build a solution from scratch.

Introducing GraDaTor (or, to put it in human terms, the Grafana dashboard creator), a tool that lets anyone create dashboards from a simple UI. For example, assume you have a service called WWW-API built with the Fastify framework. All of its HTTP calls are auto-instrumented by Fastify plugins, and its data is collected as soon as the service is live and accepting requests. Now, as an engineer, you’d like to know how many requests are healthy, how many reply with a 4xx status, and so on. All you need to do is head over to GraDaTor, select your service from a dropdown menu, confirm that it uses an HTTP server and push a button to generate a dashboard directly in Grafana. You can do all this without writing a single line of observability code, creating a dashboard by hand or writing any queries.
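Behind that button, a tool like GraDaTor has to turn “show me healthy and 4xx requests for WWW-API” into PromQL. A minimal sketch, assuming a hypothetical `http_requests_total` counter with `service` and `status_code` labels, and treating 2xx/3xx responses as healthy (our assumption, not a rule from the post). The `refId`, `expr` and `legendFormat` fields follow Grafana’s Prometheus query target format.

```typescript
type Target = { refId: string; expr: string; legendFormat: string };
type TimeseriesPanel = { type: "timeseries"; title: string; targets: Target[] };

// Hypothetical metric name emitted by the HTTP instrumentation.
const HTTP_METRIC = "http_requests_total";

// Build a Grafana time series panel with one query per status class.
function httpStatusPanel(service: string, window = "5m"): TimeseriesPanel {
  // Reject names that could break out of the PromQL label matcher.
  if (!/^[A-Za-z0-9_-]+$/.test(service)) {
    throw new Error(`Unexpected service name: ${service}`);
  }
  const rate = (statusRegex: string) =>
    `sum(rate(${HTTP_METRIC}{service="${service}",status_code=~"${statusRegex}"}[${window}]))`;
  return {
    type: "timeseries",
    title: `${service}: requests per second by status`,
    targets: [
      { refId: "A", expr: rate("[23].."), legendFormat: "healthy (2xx/3xx)" },
      { refId: "B", expr: rate("4.."), legendFormat: "client errors (4xx)" },
      { refId: "C", expr: rate("5.."), legendFormat: "server errors (5xx)" },
    ],
  };
}
```

For `httpStatusPanel("www-api")`, the 4xx target becomes `sum(rate(http_requests_total{service="www-api",status_code=~"4.."}[5m]))`.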

What does the workflow look like?

First, we select a service from our predefined list, which is automatically rendered based on Pipedrive’s services.
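The post doesn’t say where the list of services comes from. One possible approach, assuming every service’s metrics carry a `service` label, is to ask Prometheus for that label’s values through its HTTP API (`GET /api/v1/label/<label_name>/values`, which answers with `{"status": "success", "data": [...]}`). The helper names below are hypothetical.

```typescript
// Shape of a Prometheus HTTP API response for label values.
type LabelValuesResponse = {
  status: "success" | "error";
  data?: string[];
  error?: string;
};

// Build the Prometheus URL listing all values of a label
// ("service" is an assumed label name).
function labelValuesUrl(prometheusUrl: string, label = "service"): string {
  const base = prometheusUrl.replace(/\/+$/, "");
  return `${base}/api/v1/label/${encodeURIComponent(label)}/values`;
}

// Turn the API response into a deduplicated, sorted dropdown list.
function serviceOptions(response: LabelValuesResponse): string[] {
  if (response.status !== "success" || response.data === undefined) {
    throw new Error(`Prometheus query failed: ${response.error ?? "unknown error"}`);
  }
  return response.data
    .filter((value, index, all) => all.indexOf(value) === index)
    .sort();
}
```

Deriving the list from the metrics themselves means a new service appears in the dropdown as soon as it starts reporting, with no manual registration step.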

Next, we can select a template for our service. We have a predefined set of templates so that developers don’t need to worry about PromQL, metrics or the standard way of monitoring a service.

Add a blank dashboard with basic variables, or a more detailed one with service health, KPIs and deployment annotations.
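A template can be thought of as a function from a service name to a Grafana dashboard model. The sketch below is hypothetical: the two template names, the `service` and `interval` variables, the datasource UID and the use of `process_start_time_seconds` (exposed by prom-client’s default metrics) as a restart-based proxy for deployments are all assumptions. The `templating`, `annotations` and `panels` fields follow Grafana’s dashboard JSON model.

```typescript
type Variable = { name: string; type: "constant" | "interval"; query: string };
type Annotation = {
  name: string;
  enable: boolean;
  iconColor: string;
  datasource: { type: string; uid: string };
  expr: string;
};
type Panel = { type: string; title: string; targets: { refId: string; expr: string }[] };
type Dashboard = {
  title: string;
  templating: { list: Variable[] };
  annotations: { list: Annotation[] };
  panels: Panel[];
};
type TemplateName = "blank" | "service-health";

// Hypothetical Prometheus datasource reference.
const PROMETHEUS = { type: "prometheus", uid: "prometheus" };

function fromTemplate(template: TemplateName, service: string): Dashboard {
  const dashboard: Dashboard = {
    title: service,
    templating: {
      list: [
        // Every template starts with the same basic variables.
        { name: "service", type: "constant", query: service },
        { name: "interval", type: "interval", query: "1m,5m,15m" },
      ],
    },
    annotations: { list: [] },
    panels: [],
  };
  if (template === "service-health") {
    // Mark process restarts as an approximation of deployments.
    dashboard.annotations.list.push({
      name: "Deployments",
      enable: true,
      iconColor: "blue",
      datasource: PROMETHEUS,
      expr: 'changes(process_start_time_seconds{service="$service"}[1m]) > 0',
    });
    // KPI: share of requests answered with a 5xx status.
    dashboard.panels.push({
      type: "stat",
      title: "Error ratio (5xx)",
      targets: [
        {
          refId: "A",
          expr:
            'sum(rate(http_requests_total{service="$service",status_code=~"5.."}[$interval]))' +
            ' / sum(rate(http_requests_total{service="$service"}[$interval]))',
        },
      ],
    });
  }
  return dashboard;
}
```

Queries refer to `$service` and `$interval` rather than hard-coded values, so the same panels keep working when a user changes a variable in Grafana.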

Once we have selected a service and a template, we can add more metrics to the dashboard to gather more context about the service’s well-being. We have a collection of out-of-the-box, standardized Fastify metrics, and since every Fastify service exposes the same set, they make it easy to understand any service’s health status. Crucially, we wanted users to be able to control assets such as variables and panel groups, so they can customize them on the fly.

Select additional metrics based on what your service uses: HTTP, Kafka, MySQL, etc.
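Additional metrics map naturally onto Grafana row panels, one panel group per dependency. A hypothetical catalog might look like this; the metric names for HTTP, Kafka and MySQL are placeholders, not the actual metrics produced by Pipedrive’s plugins.

```typescript
type Dependency = "http" | "kafka" | "mysql";
type Target = { refId: string; expr: string };
type Panel = {
  type: string;
  title: string;
  collapsed?: boolean;
  panels?: Panel[];
  targets?: Target[];
};
type PanelSpec = { title: string; expr: string };

// Hypothetical catalog: one panel group per dependency a service can use.
const CATALOG: Record<Dependency, { title: string; panels: PanelSpec[] }> = {
  http: {
    title: "HTTP",
    panels: [
      { title: "Requests per second", expr: 'sum(rate(http_requests_total{service="$service"}[$interval]))' },
      {
        title: "p95 latency",
        expr: 'histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket{service="$service"}[$interval])))',
      },
    ],
  },
  kafka: {
    title: "Kafka",
    panels: [
      { title: "Messages consumed per second", expr: 'sum(rate(kafka_messages_consumed_total{service="$service"}[$interval]))' },
    ],
  },
  mysql: {
    title: "MySQL",
    panels: [
      { title: "Query errors per second", expr: 'sum(rate(mysql_query_errors_total{service="$service"}[$interval]))' },
    ],
  },
};

// Emit a row panel followed by its panels for each selected dependency.
function panelGroups(selected: Dependency[]): Panel[] {
  const unique = selected.filter((dep, i) => selected.indexOf(dep) === i);
  const result: Panel[] = [];
  for (const dep of unique) {
    const group = CATALOG[dep];
    // An expanded row: its panels follow it in the top-level panel list.
    result.push({ type: "row", title: group.title, collapsed: false, panels: [] });
    for (const spec of group.panels) {
      result.push({ type: "timeseries", title: spec.title, targets: [{ refId: "A", expr: spec.expr }] });
    }
  }
  return result;
}
```

Keeping the catalog as data rather than code means adding support for a new dependency, say RabbitMQ, is a matter of adding one entry.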

Next, users can customize their dashboards by editing panels and rearranging them.
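Moving panels around ultimately means changing each panel’s `gridPos` on Grafana’s 24-column grid. A small, hypothetical layout helper that places panels two per line, starts every row panel on a fresh line at full width, and re-lays everything out after a move:

```typescript
type GridPos = { x: number; y: number; w: number; h: number };
type LayoutPanel = { type: string; title: string; gridPos?: GridPos };

// Grafana's dashboard grid is 24 columns wide.
const GRID_WIDTH = 24;

// Assign gridPos to each panel: `perLine` panels side by side,
// with every row panel starting on a fresh line at full width.
function layout(panels: LayoutPanel[], perLine = 2, height = 8): LayoutPanel[] {
  const width = Math.floor(GRID_WIDTH / perLine);
  let x = 0;
  let y = 0;
  return panels.map((panel) => {
    if (panel.type === "row") {
      if (x > 0) {
        // Close the partially filled line above the row.
        x = 0;
        y += height;
      }
      const placed = { ...panel, gridPos: { x: 0, y, w: GRID_WIDTH, h: 1 } };
      y += 1;
      return placed;
    }
    if (x + width > GRID_WIDTH) {
      // No room left on this line: wrap to the next one.
      x = 0;
      y += height;
    }
    const placed = { ...panel, gridPos: { x, y, w: width, h: height } };
    x += width;
    return placed;
  });
}

// Return a copy of `items` with the element at `from` moved to `to`.
function movePanel<T>(items: T[], from: number, to: number): T[] {
  const copy = items.slice();
  if (from < 0 || from >= copy.length) return copy;
  const [moved] = copy.splice(from, 1);
  copy.splice(to, 0, moved);
  return copy;
}
```

Dragging the third panel in front of the first would then be `layout(movePanel(panels, 3, 1))`: reorder first, then recompute positions, so panels never overlap.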

Finally, it is time to give the dashboard a meaningful name and push it into Grafana so that everyone can see and use it.
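Pushing the dashboard can go through Grafana’s HTTP API, where `POST /api/dashboards/db` creates or updates a dashboard from its JSON model. The sketch below only builds the request; the Grafana URL, API token, folder UID and commit message are placeholders you’d supply. Setting `id` to `null` asks Grafana to create a new dashboard, and `overwrite: false` makes Grafana reject the save instead of silently replacing an existing dashboard with the same name or UID.

```typescript
// Minimal dashboard model: Grafana needs at least a title.
type DashboardJson = { title: string; [key: string]: unknown };

type SaveRequest = {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
};

// Build (but do not send) a request to Grafana's create/update dashboard API.
function buildSaveDashboardRequest(
  grafanaUrl: string,
  apiToken: string,
  dashboard: DashboardJson,
  folderUid?: string,
): SaveRequest {
  if (dashboard.title.trim() === "") {
    throw new Error("Give the dashboard a meaningful name before pushing it");
  }
  const payload = {
    dashboard: { ...dashboard, id: null }, // null id: create a new dashboard
    folderUid, // omitted from the JSON when undefined
    message: "Generated by GraDaTor", // hypothetical version-history message
    overwrite: false, // fail instead of replacing an existing dashboard
  };
  return {
    url: `${grafanaUrl.replace(/\/+$/, "")}/api/dashboards/db`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiToken}`,
    },
    body: JSON.stringify(payload),
  };
}
```

With Node.js 18 or later, the result can be sent with the built-in `fetch`, e.g. `fetch(req.url, { method: req.method, headers: req.headers, body: req.body })`.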

What’s next?

Our plans for GraDaTor are to define additional sets of key metrics and to standardize how services are monitored with clean, sleek dashboards. This way, we can serve a 400+N-strong engineering organization with a standardized set of metrics and a unified understanding of observability.

Interested in working in Pipedrive?

We’re currently hiring for several positions in different countries and cities.

Take a look and see if something suits you.

Positions include:

Infrastructure Engineer
Quality Engineer
Software Engineer in DevOps Tooling
Junior Developer
Lead Engineer
And several more