How To Become a DevOps Engineer In Six Months or Less, Part 6: Observe

Igor Kantor
Oct 21, 2022


Observability engineer turning on the Elastic stack.

Quick Recap

This is the last entry in our journey: Observe.

For the previous entry, please see Run

Last entry: observability

Believe it or not, the field of “Observability” has evolved into its own discipline, oftentimes staffed by a team separate from the main “Cloud Engineering / DevOps Enablement” team.

The reason is that this is a complex, multidimensional area that requires in-depth expertise in operating NoSQL databases, Application Performance Monitoring (APM), code instrumentation, OpenTelemetry, various vendor products, and more.

So, what is Observability?

It is typically depicted as an umbrella term for three separate pillars: metrics, traces, and logs.

Source: https://www.dynatrace.com/news/blog/how-to-get-the-answers-you-deserve-using-the-three-pillars-of-observability/

In essence, Observability fuses the metrics collected (or published) by your applications, the traces emitted by your distributed microservices, and the logs generated by everything in your stack into a single, coherent view.

For metrics, a good place to start is Google’s Four Golden Signals (latency, traffic, errors, and saturation), plus whatever business metrics make sense for your app (number of users accessing the app, number of items abandoned in shopping carts, etc.).
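To make the Four Golden Signals (latency, traffic, errors, saturation) a bit more concrete, here is a minimal sketch that summarizes them for one time window. The `Request` record, the `golden_signals` function, and the choice of p95 latency and in-flight requests as the saturation measure are all my assumptions for illustration. In a real setup, a metrics library and your monitoring backend would do this work for you.

```python
import math
from dataclasses import dataclass


@dataclass
class Request:
    """One handled request (hypothetical record shape)."""
    duration_ms: float
    status: int


def percentile(sorted_values, pct):
    """Nearest-rank percentile of an already sorted list."""
    if not sorted_values:
        return 0.0
    rank = math.ceil(pct / 100 * len(sorted_values))
    return sorted_values[max(rank, 1) - 1]


def golden_signals(requests, window_s, in_flight, capacity):
    """Summarize latency, traffic, errors, and saturation for one window."""
    durations = sorted(r.duration_ms for r in requests)
    server_errors = sum(1 for r in requests if r.status >= 500)
    return {
        "latency_p95_ms": percentile(durations, 95),
        "traffic_rps": len(requests) / window_s,
        "error_rate": server_errors / len(requests) if requests else 0.0,
        # Saturation here is "how full is the worker pool" (an assumption).
        "saturation": in_flight / capacity,
    }


if __name__ == "__main__":
    sample = [Request(12.0, 200), Request(48.0, 200), Request(950.0, 503)]
    print(golden_signals(sample, window_s=60, in_flight=3, capacity=12))
```

Business metrics (users, abandoned carts) would simply be more keys in the same summary.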

Traces are typically enabled by an auto-instrumentation agent (well established for JVM-based languages and also available for several other runtimes) or by explicit instrumentation in your code. They can be extremely helpful when it comes to troubleshooting distributed systems.

For logs, just make sure you follow a standard logging template. The Elastic Common Schema (ECS) is a good place to start.
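As a rough illustration of what a standard logging template buys you, here is a sketch of a formatter for Python's built-in `logging` module. It emits one JSON object per line using a small, hand-picked subset of ECS field names. The `EcsStyleFormatter` class and the `shop-api` service name are my own inventions, and this is not a complete ECS implementation. Elastic maintains ECS logging libraries that are the better choice for real use.

```python
import json
import logging
from datetime import datetime, timezone


class EcsStyleFormatter(logging.Formatter):
    """Render log records as one JSON object per line with ECS-style keys."""

    def __init__(self, service_name):
        super().__init__()
        self.service_name = service_name

    def format(self, record):
        timestamp = datetime.fromtimestamp(record.created, tz=timezone.utc)
        doc = {
            "@timestamp": timestamp.isoformat(),
            "log.level": record.levelname.lower(),
            "log.logger": record.name,
            "message": record.getMessage(),
            "service.name": self.service_name,
        }
        if record.exc_info:
            doc["error.stack_trace"] = self.formatException(record.exc_info)
        return json.dumps(doc)


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.setFormatter(EcsStyleFormatter(service_name="shop-api"))
    log = logging.getLogger("shop")
    log.addHandler(handler)
    log.setLevel(logging.INFO)
    log.info("cart %s abandoned", "42")
```

Because every service emits the same keys, whoever is searching the logs can filter on `service.name` or `log.level` without first learning each team's private format.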

In theory, all this should allow your application developers, SRE teams, DevOps engineers(!), and support teams to quickly localize and identify an issue.

So, why do I say in theory?

Because in practice, this is quite a difficult feat to pull off.

Ever tried to untangle one of these?

One immediate complication is that a modern observability platform is a complex system of many different parts.

You need to instrument your app to emit the proper metrics and traces. You need to ensure your logs are in a structured, consistent format. You need to add agents everywhere to collect, enrich and ship all this data to a centralized location. If you are self-hosting the solution, you need to take care of the databases that house all this data.
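To give a feel for the "collect, enrich and ship" part, here is a deliberately tiny sketch of what an agent does. It assumes events arrive as dictionaries and that `send` is whatever function delivers a payload to your central store. The `enrich`, `batches`, and `ship` names, the `environment` key, and the newline-delimited JSON payload are my assumptions for illustration. Real agents add buffering, retries, and backpressure on top of this.

```python
import json
import socket


def enrich(event, environment):
    """Return a copy of the event with host and environment metadata added."""
    return {**event, "host.name": socket.gethostname(), "environment": environment}


def batches(events, size):
    """Yield lists of at most `size` events."""
    batch = []
    for event in events:
        batch.append(event)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


def ship(events, send, environment="dev", batch_size=2):
    """Enrich events and hand them to `send` as newline-delimited JSON."""
    enriched = (enrich(e, environment) for e in events)
    shipped = 0
    for batch in batches(enriched, batch_size):
        send("\n".join(json.dumps(e) for e in batch))
        shipped += len(batch)
    return shipped


if __name__ == "__main__":
    collected = [{"message": "login ok"}, {"message": "cart updated"}, {"message": "payment failed"}]
    # print stands in for an HTTP call to the central location.
    ship(collected, send=print, environment="staging")
```

Multiply this by every host, container, and language runtime in your stack and you can see where the complexity comes from.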

In short, observability projects are never quick, always complex, and the ROI is oftentimes not easy to articulate.

Vendors

As you can imagine, whenever there is something new and complicated, there is usually a plethora of vendor offerings trying to simplify the entire process.

If you are trying to learn all this from home, Grafana Labs is a great place to start. The company has a stellar reputation in the industry, has a very large install base in Kubernetes land, and is generally regarded as a solid, not terribly expensive Observability option.

There are not many options in the “open source learning space,” but Elastic is also a good choice. The ELK Stack has been around for a very long time and is a solid foundation to build upon.

Beyond these two, you get into pricey vendor choices. The ones I’ve enjoyed working with are Honeycomb, Datadog, and LightStep.

But note, there are many, MANY others, way too numerous to mention. If you are just starting out, Grafana Labs is an excellent vehicle to learn this pillar.

Summary

This concludes our series.

Whew… This was a long journey!

As you navigate each pillar, remember that the path of learning is not straightforward. You may learn a thing, move on to the next thing, and promptly forget the first thing.

That’s OK!

There’s a lot to learn here. You are building a solid foundation for a long-term career and that takes time.

Good luck!
