Are you familiar with monitoring as code, and how eDreams ODIGEO applies it?
Over the last few months, the monitoring team has been working on a strategic change: bringing automation and versioning to all of our infrastructure so that developers can easily see their applications’ metrics. As a result, we can now manage monitoring the same way we manage our apps, servers and any other component, using code to describe the monitoring tools and frameworks, what to monitor, when to alert on it, and much more. This means that monitoring settings, processes and rules can now be versioned, shared and reused.
Being able to integrate every part of the monitoring infrastructure, and to define the main GKPIs (golden KPIs, the KPIs set as essential by default the first time a new app is deployed) within an application’s deploy process, is a game changer for the monitoring team. Now, when our developers deploy a new application, they automatically get their own brand-new Prometheus and VictoriaMetrics instances up and running, ready to collect metrics. On top of that, we can deploy the golden KPIs (errors and exceptions, response time, number of calls…) to our APM and, in no time, generate dynamic alerts based on each application’s trend.
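To make the idea of alerts defined as code a little more concrete, here is a minimal sketch that generates golden-KPI alert rules for one application in the Prometheus rule-file format. It is an illustration under stated assumptions, not our actual pipeline: our golden-KPI alerts live in the APM, while this sketch uses Prometheus rules instead. The metric names (`http_requests_total`, `http_request_duration_seconds_bucket`), the `app` and `team` labels, the thresholds and the function `build_gkpi_rules` are all hypothetical. "Trend" here simply means comparing the current rate with the same window one week earlier via PromQL's `offset` modifier.

```python
import json

# Hypothetical metric names: replace them with whatever your applications export.
REQUESTS = "http_requests_total"
LATENCY_BUCKETS = "http_request_duration_seconds_bucket"


def build_gkpi_rules(app: str, team: str, p95_seconds: float = 0.5,
                     trend_factor: float = 2.0) -> dict:
    """Return a Prometheus rule group with golden-KPI alerts for one app (sketch)."""
    sel = f'app="{app}"'
    # Errors: rate of 5xx responses now vs. the same 5m window one week ago.
    errors = f'sum(rate({REQUESTS}{{{sel},status=~"5.."}}[5m]))'
    errors_last_week = f'sum(rate({REQUESTS}{{{sel},status=~"5.."}}[5m] offset 1w))'
    # Number of calls: total request rate now vs. one week ago.
    calls = f"sum(rate({REQUESTS}{{{sel}}}[5m]))"
    calls_last_week = f"sum(rate({REQUESTS}{{{sel}}}[5m] offset 1w))"
    # Response time: 95th percentile computed from a latency histogram.
    p95 = (f"histogram_quantile(0.95, sum by (le) "
           f"(rate({LATENCY_BUCKETS}{{{sel}}}[5m])))")

    def rule(name: str, expr: str, summary: str) -> dict:
        return {
            "alert": name,
            "expr": expr,
            "for": "10m",
            # A 'team' label can be matched by an Alertmanager route
            # that sends the alert to that team's Slack channel.
            "labels": {"severity": "warning", "app": app, "team": team},
            "annotations": {"summary": summary},
        }

    return {"groups": [{
        "name": f"{app}-gkpi",
        "rules": [
            rule("ErrorsAboveTrend",
                 f"{errors} > {trend_factor} * {errors_last_week}",
                 f"{app}: error rate is well above last week's trend"),
            rule("HighLatencyP95",
                 f"{p95} > {p95_seconds}",
                 f"{app}: p95 response time above {p95_seconds}s"),
            rule("CallsBelowTrend",
                 f"{calls} < {1 / trend_factor} * {calls_last_week}",
                 f"{app}: number of calls well below last week's trend"),
        ],
    }]}


if __name__ == "__main__":
    # JSON output; YAML parsers generally accept JSON, so this can be
    # committed and versioned next to the application's code.
    print(json.dumps(build_gkpi_rules("checkout", team="payments"), indent=2))
```

Because the rules are produced by a function rather than written by hand, every new application gets the same baseline alerts, and a change to a threshold is a reviewable, versioned diff. Note that a trend comparison against a week with no traffic at all will not behave sensibly, so a real setup would need extra guards.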
We could have stopped here, but we’ve gone one step further and integrated two new features: automatic dashboards for our GKPIs (dashboards as code) and alerting to the teams’ Slack channels. We haven’t forgotten to document it all, either: a Confluence page lets you find both the procedures to follow in case of an incident and a list of dependencies for each application.
Stay tuned, because we have more cool things coming up in this area, such as automatic weekly reports, more auto-generated Confluence content, integration with automatic InfluxDB/VictoriaMetrics stack deployment… and much, much more.
And you? What would you expect from monitoring as code?