Tips when working with monitoring tools

Jorge Alvaro Bartolome
TUI MM Engineering Center
4 min read · Mar 16, 2020

We’ve all had that moment during our daily work when we run into a problem and think:

I would bet someone has had the same problem before

I’m here to collect those moments when I didn’t find that someone and I had to figure it out on my own. I hope these tips will be helpful to you in the future.

Take full advantage of tags

I have created thousands of metrics with different names and now I can’t manage them effectively

We should avoid creating a separate metric name for each application and service. Whenever metrics can be grouped under a single name and distinguished by tags, we should do so.

Instead of:

- application_A.method_A.http_response_code
- application_B.method_C.http_response_code

We should use:

- http_response_code (Tags => Application: A, Method: A)
- http_response_code (Tags => Application: B, Method: C)

This gives us more flexibility when creating alert monitors and dashboards. For example, we can create a single monitor based on a tag instead of creating 30 different monitors.
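As a minimal sketch of the idea, assuming a hypothetical in-memory list of samples rather than any real monitoring client, the Python below stores every response code under one metric name and lets a single check slice it by tag. The names `Sample`, `select` and `errors_by_tag` are invented for illustration.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One reported value of a metric, plus the tags that describe it."""
    name: str
    value: int
    tags: dict = field(default_factory=dict)


def select(samples, name, **tag_filter):
    """Return the samples of metric `name` whose tags match every filter."""
    return [
        s for s in samples
        if s.name == name
        and all(s.tags.get(key) == wanted for key, wanted in tag_filter.items())
    ]


def errors_by_tag(samples, tag):
    """Count 5xx response codes grouped by the value of one tag."""
    counts = Counter(
        s.tags.get(tag)
        for s in select(samples, "http_response_code")
        if s.value >= 500
    )
    return dict(counts)


# Hypothetical data: one metric name, separated only by tags.
samples = [
    Sample("http_response_code", 200, {"application": "A", "method": "A"}),
    Sample("http_response_code", 500, {"application": "A", "method": "A"}),
    Sample("http_response_code", 503, {"application": "B", "method": "C"}),
    Sample("http_response_code", 200, {"application": "B", "method": "C"}),
]
```

With this layout, one rule such as "alert when any application has server errors" can be written once against `errors_by_tag(samples, "application")`, and a new application is covered as soon as it starts reporting the same metric with its own tag value.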

Before creating a new integration, check whether one already exists that meets your needs

I’ve been working on a metric integration for over four weeks and I can’t get it done

A very common mistake is to build your own integration to monitor a technology or service without first checking whether one already exists that does the job. That doesn’t mean your idea is bad or won’t work for you. You will probably do a great job, but it will take extra time and effort. If an integration already exists and it suits your needs, just use it: you will save yourself a lot of work, effort and tears.

Avoid the hype

I’m using an incredible new feature that my monitoring tool offers but I just realized that nobody pays attention to it and also the costs have increased a lot

New and better features appear all the time in the monitoring tools on the market. Application performance monitoring (APM) is one example.

APM tools allow you to see the messages that travel between services, add additional information to these messages, track the performance of your applications and much more. People’s first thought is:

We are going to use APM in all our applications! Let’s go!

Before you do, ask yourself:

  • Do you really need that level of granularity in all your services? For example, collecting thousands of APM traces from a maintenance CRUD usually doesn’t make much sense.
  • Have you considered the extra cost this can generate? Features of this kind usually come with an additional charge in monitoring tools.
  • Is anyone checking all this information? It is very nice to have all the detail of the requests/responses, their performance and the level of errors… but if nobody takes advantage of this information, it is totally useless.

Keep it simple

A month ago I made a beautiful dashboard with 20 widgets and 30 graphs about average times per minute. But I stopped using it because I spent too much time looking for what I wanted

We should create monitors and dashboards in the simplest possible way, with as few visual elements as possible. Too much unnecessary information makes us lose focus and interest in the things that really matter.

No data, no party

I have one of the best monitoring tools but I have no idea if the logic of my services is working properly

There is no point in having the best monitoring tool in the world if you don’t have metrics to work with. We must coordinate with the other teams to get the most useful information out of the applications we work with. You don’t have to understand what every service in the system does, but you can teach the teams who do know those services how to monitor them.

Priority vs Severity

I have so many alert messages in my mailbox that I no longer know which ones are important and which ones are not

Every alert message should be important. If you are getting alerts that are not really important, ask yourself the following questions:

  • Should that alert exist?
  • Should I categorize my alerts by the level of priority and severity?
  • If the alert is not important to me, is it important to other people?

It is very important to be clear about the following points when setting up alerting:

  • Define your priority and severity levels. We must not mix up the two terms: severity describes how badly the system is affected, while priority describes how urgently the problem must be handled. I leave you a link to an article that explains the difference quite clearly.
  • Define who should receive notifications and who should not. Basically, I think there are two types of people who should receive error messages: those who can act to fix them, and those who must be informed because they are responsible for making sure the service works properly.

At the end of the day, being subscribed to alert messages that do not affect us makes us lose focus on the ones that really affect us.
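The points above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the service names, recipient names and the routing rule (responders always, owners only for high priority) are hypothetical and not taken from any particular monitoring tool.

```python
from dataclasses import dataclass
from enum import Enum


class Severity(Enum):
    """How badly the system is affected."""
    MINOR = "minor"
    MAJOR = "major"
    CRITICAL = "critical"


class Priority(Enum):
    """How urgently someone has to act."""
    LOW = "low"
    HIGH = "high"


@dataclass(frozen=True)
class Alert:
    service: str
    severity: Severity
    priority: Priority


# Hypothetical subscriptions: people who can fix a service,
# and people who are responsible for it and must be informed.
RESPONDERS = {"booking-api": ["oncall-booking"]}
OWNERS = {"booking-api": ["booking-lead"]}


def recipients(alert):
    """Return who is notified about an alert.

    Assumed rule: responders get every alert for their service;
    owners are only informed about high-priority alerts.
    Nobody else receives anything.
    """
    notified = list(RESPONDERS.get(alert.service, []))
    if alert.priority is Priority.HIGH:
        notified += OWNERS.get(alert.service, [])
    return notified
```

Keeping severity and priority as separate fields makes it possible to record, for example, a critical failure in a rarely used service as low priority, and the explicit subscription tables make it easy to see who is subscribed to what.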

Conclusion

Some of these examples may seem very simple and obvious, but trust me, sooner or later you will run into one of them. So remember: these tips, applied at the right time, can make your life much easier.
