How to achieve observability? [You need a plan]

Eliot
3 min read · Jul 24, 2021


Observability, monitoring and APM: you are likely to hear these words a lot if you work in DevOps or, more generally, care about how your products and services serve your customers and clients.

Few organisations, large or small, would pass up the chance to raise their maturity in this space. Having a plan (knowing it can evolve) is essential to achieving the outcome you want, yet creating one can be daunting. I am going to share one structured way to approach this, along with a sample plan.

Let’s quickly revisit what observability means:

Observability is a measure of how well internal states of a system can be inferred from knowledge of its external outputs. In control theory, the observability and controllability of a linear system are mathematical duals.

Loosely speaking, the more observable your systems are, the more controllable they are.
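For readers curious about the control-theory definition quoted above, here is a brief sketch of what it says for a linear time-invariant system. The symbols (A, B, C, x, u, y, n) are the standard textbook ones and are assumptions on my part; they do not come from the quote itself.

```latex
% State-space model: x is the internal state, u the input, y the external output.
\dot{x}(t) = A\,x(t) + B\,u(t), \qquad y(t) = C\,x(t), \qquad x(t) \in \mathbb{R}^{n}

% Observable: the state can be reconstructed from the outputs.
\operatorname{rank}
\begin{bmatrix} C \\ CA \\ \vdots \\ CA^{\,n-1} \end{bmatrix} = n

% Controllable: the inputs can steer the state anywhere.
\operatorname{rank}
\begin{bmatrix} B & AB & \cdots & A^{\,n-1}B \end{bmatrix} = n

% Duality: the pair (A, C) is observable exactly when (A^T, C^T) is controllable.
(A, C)\ \text{observable} \iff (A^{\top}, C^{\top})\ \text{controllable}
```

In day-to-day operations nobody computes these ranks for a fleet of services, but the intuition carries over: your outputs (logs, metrics, traces) need to be rich enough to reconstruct what is going on inside.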

One way to build that plan in a structured manner is the 5W2H method. It asks seven questions, plus a bonus one:

  • Why? Why is it important? What do you stand to lose if you do nothing?
  • What? What does end-to-end observability look like? What layers are involved? What type of data do you need to collect? What are the enablers?
  • Who? Who are the stakeholders? Who will consume your output? Who will help you produce it?
  • Where? Where do you collect/store/process/analyze/visualize data (on prem, in the cloud or hybrid)? Are there other systems that can benefit from the data?
  • When? In what phases can you roll this out? How long is each phase likely to take? How can you prioritise everything that needs to be done? Time to revisit the Pareto principle?
  • How? How do you collect/store/process/analyze/visualize data? How much of that can be automated? Can you do it at scale? What are the existing tools? What else do you need? Can you leverage out-of-the-box features, or does it need to be custom engineered?
  • How much? How much is it going to cost (compute/storage/license/engineering/testing)? Are you prepared to shoulder the cost now? If not, go back to When to see if you can build it incrementally.
  • Bonus question: What else? What else can you do in this space to cater for the future your business is heading towards (for example, predictive analytics)? How does it impact what you are planning?
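To make the "When?" question a little more concrete, here is a minimal Python sketch of a Pareto-style cut over a backlog. Everything in it is an assumption for illustration: the `BacklogItem` type, the `pareto_cut` helper, the 80% threshold and the sample values are all hypothetical, and a real prioritisation would also weigh effort, risk and dependencies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BacklogItem:
    name: str
    value: int  # estimated benefit in arbitrary units (hypothetical numbers)


def pareto_cut(items, threshold=0.8):
    """Return the highest-value items that together reach `threshold` of total value."""
    ranked = sorted(items, key=lambda item: item.value, reverse=True)
    total = sum(item.value for item in ranked)
    if total <= 0:
        return []
    selected, running = [], 0
    for item in ranked:
        if running >= threshold * total:
            break  # enough value covered; the rest can wait for a later phase
        selected.append(item)
        running += item.value
    return selected


# Hypothetical backlog of observability work items.
backlog = [
    BacklogItem("checkout latency dashboards", 50),
    BacklogItem("payment API error alerts", 35),
    BacklogItem("batch job log parsing", 8),
    BacklogItem("intranet uptime checks", 4),
    BacklogItem("legacy report tracing", 3),
]

phase_one = pareto_cut(backlog)
print([item.name for item in phase_one])
```

With these made-up numbers the first two items cover 85 of the 100 value points, so they would form the first phase and the rest would move to later ones.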

Now let’s look at one example output using the 5W2H method.

Context

Arvinton is a fictional medium-sized company that:

  • Runs digital channels on top of systems running in the cloud and on prem.
  • Uses Azure as a cloud provider (it can be any public cloud provider for this exercise).
  • Has an existing APM tool.
  • Is starting to build out more cloud-native systems, which rely on SaaS providers and vendor-managed systems to function.
  • Has a cloud cost profile that is starting to appear on the finance team's radar.
  • Has a lean security team that is nervous about the growing cloud footprint.

The diagram illustrates the 5W2H of end-to-end observability from the perspective of six layers, following a typical lifecycle:

  • Business outcome
  • End user experience
  • Application (and database)
  • Infra/Network
  • Security
  • Financials (FinOps)

The solution needs to integrate with external systems such as ITSM, the Security Operations Center (Rapid7), the existing APM tool and, last but not least, the health APIs of the systems it depends on.
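As a rough illustration of the last integration point, here is a minimal Python sketch that polls dependency health APIs and rolls them up into one overall status. It rests on loud assumptions: that each dependency exposes an HTTP endpoint returning JSON with a `status` field of `up`, `degraded` or `down`, and that the URLs shown exist. Neither comes from a real vendor contract, and the function names are hypothetical.

```python
import json
from urllib.error import URLError
from urllib.request import urlopen

# Assumed status vocabulary, ordered from best to worst.
SEVERITY = {"up": 0, "degraded": 1, "down": 2}


def fetch_status(url, timeout=3.0):
    """Call one health API; any failure or unexpected payload counts as 'down'."""
    try:
        with urlopen(url, timeout=timeout) as resp:
            payload = json.load(resp)
    except (URLError, ValueError, OSError):
        return "down"
    if not isinstance(payload, dict):
        return "down"
    status = str(payload.get("status", "down")).lower()
    return status if status in SEVERITY else "down"


def worst_status(statuses):
    """Overall health is the worst individual status ('up' if there is nothing to check)."""
    return max(statuses.values(), key=SEVERITY.__getitem__, default="up")


def check_dependencies(endpoints, fetch=fetch_status):
    """Return per-dependency statuses and the overall roll-up."""
    statuses = {name: fetch(url) for name, url in endpoints.items()}
    return statuses, worst_status(statuses)


# Usage with a stubbed fetch so the sketch runs without network access.
# The endpoint URLs are placeholders, not real services.
endpoints = {
    "itsm": "https://itsm.example/health",
    "apm": "https://apm.example/health",
}
stub = {
    "https://itsm.example/health": "up",
    "https://apm.example/health": "degraded",
}
statuses, overall = check_dependencies(endpoints, fetch=stub.get)
print(statuses, overall)
```

Passing `fetch` as a parameter keeps the roll-up logic testable without network access; in a real setup you would drop the stub and let `fetch_status` call the actual endpoints on a schedule.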

I will expand on How and How Much in another post, as there are quite a few topics to cover.


Eliot

A technologist, father of two girls, home gym enthusiast, realistic dreamer.