Navigating turbulence: Implementing observability for Qantas Loyalty

Like many others, our teams had to become more streamlined and focused in the wake of COVID, and so we started exploring tools that could give us an advantage in our IT operations. This is the story of our journey towards implementing observability for Qantas Loyalty, where we carefully studied and defined our approach before ultimately adopting it.

Problem

As our teams became leaner, we needed effective tools to aid team members in their work. Our logging and infrastructure metrics were in good shape. However, it became clear that the collection and analysis of application metrics and traces required improvement. Some teams shared tools, while others used their own, which resulted in a fragmented approach. To reduce costs and, more importantly, to facilitate information sharing between teams and expedite problem-solving, we needed to unify the tools. Streamlining them would enable teams to zero in on issues more quickly and efficiently.

Approach

Our team’s top priorities were to share operational information, reduce costs, and future-proof our choice to avoid facing similar challenges in the coming years. With these objectives in mind, we began our journey. Initially, a small group of individuals informally compared APM offerings against our requirements. As we progressed, we formed a working group consisting of representatives from each team and formalised our approach.

To begin with, we created two lists — must-have and nice-to-have — to define our priorities. The must-have list comprised features that were critical for us. On the other hand, the nice-to-have list contained features that would be beneficial but were not essential, and we could do without them if necessary. This approach enabled us to focus on the most critical features while keeping in mind the additional features that we could consider in the future.
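As a rough illustration of how two such lists can drive an evaluation, the sketch below treats must-have features as a hard filter and nice-to-have features as a tie-breaker. The feature names, tool names, and scoring rule are all hypothetical and chosen only for the example; they are not the actual lists or candidates from our evaluation.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Candidate:
    """A tool under evaluation. Names and feature sets are hypothetical."""
    name: str
    features: set[str] = field(default_factory=set)


def evaluate(candidate: Candidate, must_have: set[str], nice_to_have: set[str]) -> dict:
    """Check one candidate against both lists."""
    missing = must_have - candidate.features
    return {
        "name": candidate.name,
        # A candidate qualifies only if no must-have feature is missing.
        "qualifies": not missing,
        "missing_must_haves": sorted(missing),
        # Nice-to-haves never disqualify; they only rank the qualifiers.
        "nice_to_have_score": len(nice_to_have & candidate.features),
    }


def shortlist(candidates: list[Candidate], must_have: set[str], nice_to_have: set[str]) -> list[dict]:
    """Drop candidates missing a must-have, then rank by nice-to-have count."""
    results = [evaluate(c, must_have, nice_to_have) for c in candidates]
    qualified = [r for r in results if r["qualifies"]]
    return sorted(qualified, key=lambda r: r["nice_to_have_score"], reverse=True)


# Illustrative lists only (assumed for this example).
MUST_HAVE = {"distributed tracing", "application metrics"}
NICE_TO_HAVE = {"dashboards", "anomaly detection"}

CANDIDATES = [
    Candidate("tool-a", {"application metrics", "dashboards"}),
    Candidate("tool-b", {"distributed tracing", "application metrics", "dashboards"}),
    Candidate("tool-c", {"distributed tracing", "application metrics",
                         "dashboards", "anomaly detection"}),
]

if __name__ == "__main__":
    for result in shortlist(CANDIDATES, MUST_HAVE, NICE_TO_HAVE):
        print(result["name"], result["nice_to_have_score"])
```

Keeping the must-have check as a strict filter, rather than folding everything into one weighted score, stops a long list of optional extras from masking a missing critical capability.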

With the specification lists in place, we began evaluating various APM solutions to determine which ones met our requirements. We discovered that traditional APM tools were not suitable for our needs. Although these tools were useful for monitoring infrastructure and application metrics, they lacked the reliable tracing capability that we needed. As a result, we decided to explore alternative solutions.

Light bulb moment

We eventually settled on using OpenTelemetry, a modern open-source observability framework that enables instrumenting, collecting, and exporting telemetry data from applications, systems, and infrastructure. We chose to focus on providers that support OpenTelemetry as first-class citizens, which allowed us to address our future-proofing requirements. This approach ensured that if we decided to change a provider, we could do so with minimal effort, as all our systems were developed against open standards rather than a specific provider.
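The sketch below illustrates why coding against a neutral interface keeps a provider switch cheap. It is deliberately not the OpenTelemetry API: it is a standard-library-only toy in which `SpanExporter`, `Tracer`, and `redeem_points` are hypothetical names invented for the example. In OpenTelemetry itself, the equivalent seam is the exporter configured on the SDK, while application code talks only to the vendor-neutral API.

```python
import time
from contextlib import contextmanager
from typing import Protocol


class SpanExporter(Protocol):
    """The neutral seam: anything that can accept a finished span."""

    def export(self, span: dict) -> None: ...


class InMemoryExporter:
    """Stand-in for one provider's backend; keeps spans in a list."""

    def __init__(self) -> None:
        self.spans: list[dict] = []

    def export(self, span: dict) -> None:
        self.spans.append(span)


class ConsoleExporter:
    """Stand-in for a different provider; prints spans instead."""

    def export(self, span: dict) -> None:
        print(span)


class Tracer:
    """Toy tracer that times a block of code and hands the span to an exporter."""

    def __init__(self, exporter: SpanExporter) -> None:
        self._exporter = exporter

    @contextmanager
    def span(self, name: str, **attributes):
        start = time.perf_counter()
        try:
            yield
        finally:
            self._exporter.export({
                "name": name,
                "attributes": attributes,
                "duration_s": time.perf_counter() - start,
            })


def redeem_points(tracer: Tracer, member_id: str, points: int) -> int:
    """Hypothetical business function; it knows nothing about the backend."""
    with tracer.span("redeem_points", member_id=member_id, points=points):
        return points


if __name__ == "__main__":
    # Switching provider means changing this one line, not redeem_points.
    redeem_points(Tracer(ConsoleExporter()), "member-1", 100)
```

The instrumented function never names a provider, so the choice of backend is confined to the single place where the tracer is constructed.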

Moreover, this effectively unified the mechanism through which operational data was gathered and shared. In principle, that removed the need for a single shared tool and gave individual teams the freedom to pick the providers they felt most comfortable with. To keep costs down, however, we decided to stick with one provider, as this helped in cost negotiations while still providing us with essential observability capabilities.

Takeaways

  1. Get as many teams involved as possible, especially early on when you create your specs. At this stage, the quantity of ideas is more valuable than their quality.
  2. Define clear must-have and nice-to-have feature lists, and diligently measure success against them.
  3. Trust that others are doing their job as effectively as you are doing yours. The technology leadership trusted us to come up with good recommendations, and we trusted that they would take our recommendations seriously.
  4. Don’t hesitate to include junior team members in the evaluation and decision-making process. While they may lack experience, they can make up for it with effort and provide a fresh perspective. Encouraging them to share their thoughts and insights can lead to a well-rounded evaluation. Moreover, this offers an opportunity for them to gain experience and confidence.

Information has been prepared for information purposes only and does not constitute advice.