Thinking about Observability when Coding

Rohat Şahin
4 min read · Mar 20, 2021

Making the modern systems we develop consistent, scalable, and reliable is usually one of the main priorities of our software development processes.

In this context, we follow certain development practices while coding our applications. With test-driven development, for example, if the tests pass, we assume that the application works as expected.

If so, can observability be part of our software development process as well? Should the questions we spend most of our time trying to answer, such as the ones below, and their solutions be treated as part of that process?

Where does the latency occur?
Which hop takes the most processing time?

In this article, we will go through the observability issues we encountered while developing software.

  • observability to detect latency

Asynchronous programming is very popular in today's software development practices, but it brings some difficulties with it. For example, if we switch context at some points in our code, it can cause logging and tracing issues, and the flow can be more difficult to manage than in synchronous programming.

when context switching is not taken into account
when context switching is taken into account

In the example charts above, we could not see which part caused the latency after development. So we first worked on making the problem visible instead of trying to solve it. This saved us a lot of effort in identifying the problem, and it also showed that we could solve it in a short time.
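As a minimal sketch of the idea, and not the setup behind the charts above, the Python snippet below keeps a request context alive across asynchronous context switches and records how long each hop takes. The hop names (`load_stock`, `load_price`), the sleep durations, and the helpers `timed` and `slowest_hop` are hypothetical and exist only for illustration.

```python
import asyncio
import contextvars
import time
from contextlib import contextmanager

# Hypothetical per-request context: an id plus the timings collected so far.
request_id = contextvars.ContextVar("request_id", default="-")
spans = contextvars.ContextVar("spans")


@contextmanager
def timed(hop):
    """Append (request id, hop name, duration) to the current request's span list."""
    start = time.perf_counter()
    try:
        yield
    finally:
        spans.get().append((request_id.get(), hop, time.perf_counter() - start))


async def load_stock():
    with timed("load_stock"):
        await asyncio.sleep(0.05)  # stands in for a slow downstream call


async def load_price():
    with timed("load_price"):
        await asyncio.sleep(0.01)  # stands in for a fast downstream call


async def handle(rid):
    request_id.set(rid)
    collected = []
    spans.set(collected)
    # gather() wraps each coroutine in a task that copies the current context,
    # so both hops still see this request's id and the same span list.
    await asyncio.gather(load_stock(), load_price())
    return collected


def slowest_hop(collected):
    """Return the name of the hop with the longest recorded duration."""
    return max(collected, key=lambda span: span[2])[1]
```

Calling `asyncio.run(handle("req-1"))` returns one span per hop, and `slowest_hop` answers the "which hop takes the most processing time?" question for that request.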

  • observability for scalability

We can define scalability as the ability of our software systems to remain healthy under unexpected traffic. Modern deployment platforms such as Kubernetes offer autoscaling configurations for these unexpected moments.

However, for autoscaling to work, our applications need to cooperate with it. For example, they should have a short startup time, but sometimes we do not have such a feature.

web transaction time chart when unexpected traffic is received

When we first encountered the chart above, we thought that a part of the system was not working as expected. Our detailed examination showed, however, that we had received unexpected traffic: too many transactions in a short period of time.

We could follow the CPU and memory measurements and create an autoscaling configuration based on them, but the system's warm-up time may not be short enough for that to help.

Grafana dashboard for future operations

Could we have predicted this situation using the data? For us, the answer was yes. We created a metric table and monitored our data, determined thresholds, created alarms for data that exceeded them, and planned our scaling accordingly.
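The threshold idea can be sketched in a few lines of Python. This is an illustration under assumed values, not our actual dashboard or alarm configuration: the `RateAlarm` class, its threshold, and its window length are hypothetical.

```python
from collections import deque


class RateAlarm:
    """Fires when more than `threshold` transactions fall inside `window_s` seconds."""

    def __init__(self, threshold, window_s):
        self.threshold = threshold
        self.window_s = window_s
        self.timestamps = deque()

    def record(self, now):
        """Register one transaction at time `now` (seconds); return True if the alarm fires."""
        self.timestamps.append(now)
        # Drop transactions that have left the sliding window.
        while self.timestamps and self.timestamps[0] <= now - self.window_s:
            self.timestamps.popleft()
        return len(self.timestamps) > self.threshold
```

With `RateAlarm(threshold=3, window_s=10.0)`, the fourth transaction inside ten seconds fires the alarm. Alerting on the transaction rate itself, rather than only on CPU and memory, is what gives slow-starting instances a chance to warm up before the traffic peaks.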

  • observability for availability

Availability means that a system is operational at a given time. Although we prioritize availability when designing our systems and can prevent system-wide crashes, availability problems can still critically affect the state of our data.

For example, a problem in our inventory system adversely affected users and increased the error rate in other systems, creating unexpected problems.

transaction count chart

After this problem, we analyzed the critical code paths in our system, created trackers for their transactions, generated system alerts in case of errors, and increased our availability percentage by minimizing the impact of such problems.
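A tracker of this kind could look roughly like the Python sketch below. It is an assumption-laden illustration, not the tooling we used: the `TransactionTracker` class, the transaction names, and the error-rate limit are all hypothetical.

```python
from collections import defaultdict


class TransactionTracker:
    """Counts calls and errors per named transaction and flags high error rates."""

    def __init__(self, max_error_rate):
        self.max_error_rate = max_error_rate
        self.calls = defaultdict(int)
        self.errors = defaultdict(int)

    def track(self, name, func, *args, **kwargs):
        """Run `func` as the transaction `name`, counting the call and any error."""
        self.calls[name] += 1
        try:
            return func(*args, **kwargs)
        except Exception:
            self.errors[name] += 1
            raise  # the caller still sees the original failure

    def error_rate(self, name):
        calls = self.calls.get(name, 0)
        return self.errors.get(name, 0) / calls if calls else 0.0

    def warnings(self):
        """Names of transactions whose error rate exceeds the configured limit."""
        return sorted(n for n in self.calls if self.error_rate(n) > self.max_error_rate)
```

Wrapping only the critical transactions keeps the overhead small, and `warnings()` is the point where a real system would raise its alert.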

  • observability for debugging

In modern systems, a request can pass through many subroutines in application code or through other software systems, and the common approach to debugging is figuring out where the unpredictable code lives in your system.

In this unpredictability, we aim to treat each request as a context and to examine the subroutines within that context and their visibility. For example, the MDC (Mapped Diagnostic Context) we use in our Java development, and the tracing tools used in modern distributed systems such as Jaeger and Zipkin, are based on the concept of creating a context for observability.
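To make the "request as a context" idea concrete, here is a rough Python analogue of what an MDC does. It is not the Java MDC API itself: the logger name `orders`, the log format, and the functions `build_logger`, `reserve_stock`, and `handle` are hypothetical. A context variable holds the request id, and a logging filter stamps it on every log line.

```python
import contextvars
import io
import logging

# Hypothetical request context; "-" means "no request in progress".
request_id = contextvars.ContextVar("request_id", default="-")


class RequestContextFilter(logging.Filter):
    """Copy the current request id onto every log record."""

    def filter(self, record):
        record.request_id = request_id.get()
        return True


def build_logger(stream):
    logger = logging.getLogger("orders")
    logger.handlers.clear()
    logger.propagate = False
    logger.setLevel(logging.INFO)
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter("[%(request_id)s] %(message)s"))
    handler.addFilter(RequestContextFilter())
    logger.addHandler(handler)
    return logger


def reserve_stock(logger):
    # The subroutine never receives the request id, yet its log line carries it.
    logger.info("reserving stock")


def handle(logger, rid):
    token = request_id.set(rid)
    try:
        reserve_stock(logger)
    finally:
        request_id.reset(token)  # do not leak the id into the next request
```

Every subroutine that logs inside `handle` is then visible as part of the same request, which is the property that makes a single request traceable through the code.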

Conclusion

When we use programming techniques or design models in our application development processes, we aim to make our code more readable for humans and ensure maintainability in our systems.

Observability is the readability of our systems at runtime. With it, we make it easier to detect malfunctions in our systems and increase their resilience in unexpected situations.

Thank you for reading
