How monitoring applications can affect engineering decisions

Camila Brendel
Published in Inloco Tech Blog
7 min read · Jan 31, 2019

A practical overview of the benefits of monitoring your applications

In the world of software development, we are in a constant dilemma between delivering fast and delivering perfectly. With or without deadlines, we should always be concerned with value delivery. At In Loco, we live by principles that allow us to move fast, having the necessary autonomy to take risks and ownership over everything we do.

During development, we must make technical decisions by weighing the tradeoffs of every possible approach, and often there is not enough time to build the “perfect” solution, given our need to ship fast. Either way, we need some way to follow up on our solutions and to feed our decision process as the code evolves.

Monitoring

Monitoring is nothing new for developers, but it is often left out when it doesn’t make the cut in the priorities. Even when metrics exist, they may never be used, whether for alerting or just to be looked at from time to time. However, building a monitoring infrastructure and spreading a monitoring culture pay for themselves, as I’ll show later in this article.

There are many ways to build a monitoring infrastructure with well-established tools and databases, such as Grafana and Prometheus. Cassandra and Postgres, for instance, expose a good number of performance metrics that help flag when something is not working. Managed cloud services offer their own monitoring tools, like AWS CloudWatch, that are tightly integrated with their other services.

All this tooling makes it simple to start your own monitoring infrastructure, set up alerts, and structure a basic on-call process. However, it also gives us the opportunity to go beyond that.

When it comes to our own applications, we don’t have metrics out of the box, although they can be easily added with well-established APM (Application Performance Monitoring) tools, like Dripstat and New Relic. Just a few annotations here, some configuration there, and you’re all set. They’re easy to use and offer a great deal of information in a nice interface.

Although APMs offer easy integration, fast setup, and great return, they might not be affordable for every part of the system. In that situation, you can take a different approach and create the metrics yourself. That might seem like a lot of work, but once you have an infrastructure for collecting and visualizing metrics, making your application export the ones you create is quite simple.

Creating your own metrics is an interesting experience. You will be free to define what you want to know, since you have absolute control of what will be exported. As you know what the application is about, you can think of interesting metrics that maybe even an APM wouldn’t show.

On the other hand, it comes with a lot of responsibility! You need to know which critical parts of the application need to be instrumented. Missing or misdefining a metric can lead to misleading conclusions and cause even more confusion. Besides, you should check whether your solution can tolerate the performance overhead that these tools introduce.

At In Loco we chose Prometheus and Grafana as part of our telemetry infrastructure. When it comes to exporting metrics from within your application though, each team is free to choose the framework that fits best. We’ve had experiences with Netflix’s Servo (now deprecated) and Spectator, Micrometer and Prometheus’ own client libraries.

Now, imagine the following scenario:

We have a simple server that handles requests through the method foo, which receives two parameters and only processes the request if neither of them is null. Here’s a simple example in Java:
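A minimal sketch of such a server could look like the following. The class name `SimpleServer`, the `String` parameter types, the boolean return value, and the `process` helper are assumptions made for illustration; only the method name foo and the null check come from the description above.

```java
// Hypothetical stand-in for the server described above; names are illustrative.
class SimpleServer {

    // Processes a request only when both parameters are non-null.
    // Returns true if the request was processed, false if it was skipped.
    boolean foo(String first, String second) {
        if (first == null || second == null) {
            return false;
        }
        process(first, second);
        return true;
    }

    // Placeholder for the real work done on a valid request.
    private void process(String first, String second) {
        System.out.println("Processing " + first + " and " + second);
    }

    public static void main(String[] args) {
        SimpleServer server = new SimpleServer();
        server.foo("a", "b");  // both parameters present: processed
        server.foo("a", null); // one parameter missing: skipped
    }
}
```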

You can see that at first we don’t know much about the data going through this application once it starts running. Maybe it would be interesting to know how many requests it is receiving… Let’s do that! We’ll be using Prometheus’ client library for Java.
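As a dependency-free sketch of that idea, the example below counts every call to foo and serves the total on port 1234 in the Prometheus text format. It is a stand-in, not the client library itself: a `LongAdder` plays the role of the library's counter, the JDK's built-in `HttpServer` plays the role of its metrics endpoint, and no JVM metrics are exported. The class name `CountingServer` and the metric name `requests_total` are hypothetical.

```java
import com.sun.net.httpserver.HttpServer;
import java.io.IOException;
import java.io.OutputStream;
import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.concurrent.atomic.LongAdder;

// Stand-in sketch: LongAdder replaces the client library's counter and the
// JDK HttpServer replaces its metrics endpoint. All names are hypothetical.
class CountingServer {

    // Total number of calls to foo, whether or not they end up being processed.
    static final LongAdder REQUESTS = new LongAdder();

    boolean foo(String first, String second) {
        REQUESTS.increment(); // count every incoming request
        if (first == null || second == null) {
            return false;
        }
        return true; // the real processing would happen here
    }

    // Renders the counter in the Prometheus text exposition format.
    static String render() {
        return "# HELP requests_total Total requests received.\n"
             + "# TYPE requests_total counter\n"
             + "requests_total " + REQUESTS.sum() + "\n";
    }

    public static void main(String[] args) throws IOException {
        HttpServer metrics = HttpServer.create(new InetSocketAddress(1234), 0);
        metrics.createContext("/", exchange -> {
            byte[] body = render().getBytes(StandardCharsets.UTF_8);
            exchange.getResponseHeaders().add("Content-Type", "text/plain; version=0.0.4");
            exchange.sendResponseHeaders(200, body.length);
            try (OutputStream out = exchange.getResponseBody()) {
                out.write(body);
            }
        });
        metrics.start(); // serves http://localhost:1234/ until the JVM is stopped

        CountingServer server = new CountingServer();
        server.foo("a", "b");
        server.foo("a", null);
    }
}
```

The counter is incremented before the null check on purpose, so it reflects everything the server receives rather than only what it processes.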

Once you run your application you can access http://localhost:1234/ and you will see the metric we created and JVM metrics that were exported with the DefaultExports.

That gives us a general view of the usage of this server, but we still don’t know how many requests are actually being processed. Since we already have the metrics server running, we just need to add the new metrics:
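One way to sketch this is a second counter that is only incremented after the null check passes. As before, this is a dependency-free illustration and not the Prometheus client library: `LongAdder` stands in for the library's counter type, and the class name `ProcessedCountingServer` and the metric names are hypothetical.

```java
import java.util.concurrent.atomic.LongAdder;

// Stand-in sketch with two counters; all names are hypothetical.
class ProcessedCountingServer {

    // Every call to foo.
    static final LongAdder REQUESTS = new LongAdder();
    // Only the calls that passed the null check.
    static final LongAdder PROCESSED = new LongAdder();

    boolean foo(String first, String second) {
        REQUESTS.increment();
        if (first == null || second == null) {
            return false; // received but not processed
        }
        PROCESSED.increment();
        return true; // the real processing would happen here
    }

    // Renders both counters in the Prometheus text exposition format.
    static String render() {
        return "# TYPE requests_total counter\n"
             + "requests_total " + REQUESTS.sum() + "\n"
             + "# TYPE requests_processed_total counter\n"
             + "requests_processed_total " + PROCESSED.sum() + "\n";
    }

    public static void main(String[] args) {
        ProcessedCountingServer server = new ProcessedCountingServer();
        server.foo("a", "b");
        server.foo("a", null);
        server.foo(null, null);
        // Three requests received, one processed.
        System.out.print(render());
    }
}
```

The difference between the two counters gives the number of requests that were dropped because of a null parameter, which is the kind of question the single request counter could not answer.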

As expected, once we run this code, all of our metrics will be available for us if we access http://localhost:1234/

Now we’ll be able to know how this endpoint is being used and even think of other insights we could have with these and other metrics. You can check out a full example here!

But how can it improve the development of my services?

Once you have meaningful metrics about your applications, the impact that they can have on the next steps of your work is huge.

There are a handful of advantages that you can achieve with a proper metrics infrastructure:

  • Detect bottlenecks and technical debt. If your application performs badly overall, without metrics you are left without visibility. You can guess where the problem is and start tackling it, but what if it ends up having no effect? You have just wasted time on something that won’t make a difference.
  • Prioritize tasks based on impact. Once you know what (and where) the actual problems are, you will have enough evidence to decide whether they need to be solved or improved right away. Besides, you will already know how the fix is going to benefit the application once it’s done.
  • Learn more about your application. These types of metrics give you an inside look into your application, showing where it usually takes longer and which parts of the code can have, for example, degraded performance during peaks of requests. This can be incredibly helpful during outages, because you need to contain the problem as fast as you can: the more you know about how your application works, the better. As mentioned in Site Reliability Engineering, building observability is one of the main ways to increase the speed of troubleshooting.
  • Gain insights. Metrics will also provide insights about the data that is going through your application, making you notice things that you wouldn’t otherwise.

These aspects can make application development smoother and faster, especially in the early stages, and help the application become more and more stable.

Adding metrics to your applications can bring many benefits far beyond what has been shown in this post. Still, if you’re adding the metrics yourself, be careful not to get carried away and add too many of them. As the expected behaviour of the application has already been mapped out, you should use this knowledge to add meaningful metrics only. You don’t want to pollute your code and make it harder to understand with a bunch of metrics lying around. Moreover, you don’t want metrics that will take the focus away from the real problems.

Besides, metrics aren’t the only tool you can use to improve the understanding of your application. As Daniel Berman said in his post about tracing: Tracing, logs, and metrics form the ultimate telemetry solution. You should use the tools that will satisfy your needs.


At In Loco, we’ve had lots of situations where our metrics saved us from spending energy on tasks that wouldn’t make a difference, and that is critical for a company that needs to move fast. Our monitoring culture is widespread across the teams, as we make it a part of our daily life and decision-making process.

Are you interested?

If you are interested in building context-aware products through location, check out our opportunities. Also, we’d love to hear from you! Leave a comment and let us know what you would like us to talk about in the upcoming posts.
