Observability 101: What you see is what you get | Tetrate
One of the most repeated pieces of advice for anyone getting started with microservices is to make sure you can see everything that’s going on inside your services. Leverage the power of observability. However, observability is a loaded term — so it’s valuable to understand what that terms mean, and what’s involved.
This blog is good for beginners, intermediates or anyone who’s had a bit of a brain fart and needs a refresher course. It will tell you what an event is, the difference between observability tools and what you can do with them in an environment that’s supported by an Istio and Envoy service mesh.
Metrics, logs, and tracing
In its most basic sense, observability is best explained as the ability to understand your system using external outputs of metrics, logs and tracing that have been generated based on events from within your system.
In a microservices-based environment, you’re generating a lot of events, and an event is defined as anything that happens from the moment that a request reaches the outer perimeter of your network, the action that generates observable data.
Metrics, logs and tracing are all ways to understand those events and by extension, the inner workings of your services.
Observability in Istio service mesh
If you’re using microservices to any great extent, or you have a hybrid environment, you are more than likely considering or using a service mesh.
Istio and Envoy produce a significant amount of data for service behavior, in metrics, logs and tracing — which is great! You’ll need to make some judgement calls to decide which data are critical or important and which really aren’t so that you don’t drown in the data you collect or pay to store data you don’t need.
Metrics
Generally speaking, there are two types of metrics. Application/Business metrics and Operational metrics.
The sole concern of a service mesh, such as Istio, is gathering those operational metrics to help you ascertain how our services are performing, and getting a general understanding of service health.
Proxy-level metrics
Envoy generates its own statistics as well as access logs and tracing, more information on which can be found. Envoy observability will tell you about its configuration and health at a very granular level, giving you a picture of the relationship between service instances and Envoy resources.
Many Envoy observability insights are gathered automatically, so Istio takes the immense capabilities of Envoy observability, and adds in a further control to give you the ability to select which metrics are generated and gathered at each proxy. Also, if you need to expand the reach, you can scale up and back as needed!
Service-level metrics
Covering the four basic service observability needs for service-to-service communication, service-level metrics give you insight into your latency, traffic, errors and saturation (saturation is the total number of requests per second that your service can handle, and how close you are to meeting that threshold).
Control plane metrics
These are the metrics that Istio gathers on itself! (Yes, the description really is this short.)
Logs
Access logs give users the ability to see and understand behavior from a single instance’s point of view. Istio’s logging capabilities are enabled by Envoy and they’re customizable, which means that there is complete control over what is gathered, and how.
Traces
Istio’s tracing capabilities are, again, enabled by the Envoy proxies automatically, and give users the ability to see a request flow through the mesh in real-time, to understand any sources of latency that may be causing downstream dependency failures or other unintended outcomes. The traces are sent to a distributed tracing backend, such as Zipkin, that can act as a collection and lookup to provide an all-round easier experience for reviewing distributed traces. Istio’s role in this is again to extend greater control over the tracing capabilities by giving users the ability to dictate the volume of tracing carried out, based on what they deem necessary.
The value of observability
The further you move away from a monolith the more costly and confusing your observability can become if it’s not set up properly. Good observability means:
- You spend less on storage (genuinely, it will impact your financial outgoings).
- Engineering can actually see what is going on in your systems! Ambiguity helps no one in this instance.
- If you are going through a migration process, you’ll find a lot of issues you may not have known existed, giving you ample opportunity to implement solid processes to prevent future problems.
- You’ll find the badly written code. It won’t rewrite it for you, but it will highlight where changes need to be made.
- It will embed in your engineering teams a culture of care.
- You’ll be able to see very easily the features that are getting the most use, and focus attention on improvements to where customers are spending the most time, before they even realise it.
The list could go on, but the point is that Observability tools do far more than help debug, they are your eyes and ears into a distributed system that needs more oversight to be successful.
Additional resources:
- Observability three ways: Adrian Cole
- Metrics, Tracing and Logging: Peter Bourgon
- Video: Observability ‘Ask Me Anything’ Learning Bytes
Tia Louden is a content writer for Tetrate. Tetrate is solving observability problems for enterprises with products that are powered by the open source projects Istio, Envoy, Zipkin, and Apache SkyWalking.
Originally published at https://www.tetrate.io on August 19, 2020.