15 Observability-driven Development Experts and Blogs to Follow

By Saiona Stoian
Published in ETEAM
10 min read · Dec 19, 2023

At ETEAM, our passion for learning about cutting-edge practices and technologies is never-ending. We have closely followed the evolution of observability from the early days of monitoring and firmly believe in integrating it across all stages of the development lifecycle.

Not only have we written extensively about Observability-Driven Development (ODD), but we have also applied it to enhance performance, enable more frequent deployments, and cut change failure rates by a factor of three. Our experience has shown that ODD can significantly reduce incidents and downtime by building resilience into applications from the beginning, leading to improved business outcomes.

Our mission is to help engineering teams harness the power of this approach. We’ve curated a collection of top-notch resources from around the web, featuring experts who are reshaping observability and using it to address challenges in innovative ways.

Whether you are new to observability in custom software development or a seasoned professional, exploring these insights from experts and blogs can provide valuable perspectives you can bring back to your own team and company.


Observability experts you need to follow

1. Charity Majors

Charity Majors, the Co-founder and CTO of Honeycomb, is a famous figure in the tech industry — author, podcaster, and Ops engineer — dedicated to enhancing development and delivery cycles through observability.

Why follow: As a pioneering voice in modern observability, Charity Majors draws from her extensive career background at Parse and Facebook to articulate and address the challenges inherent in managing and sustaining complex distributed systems at scale. Her status as a published co-author at O’Reilly, with books like “Observability Engineering: Achieving Production Excellence,” underscores her authority in the field.

Through talks, interviews, and blog content, Majors emphasizes the critical role of developers in taking ownership of their code once it reaches production. Observability, in her view, empowers developers to own the full lifecycle of their code and to take a proactive stance on the issues that complex applications encounter.

Talks about: observability-driven development, site reliability engineering (SRE), database reliability engineering (DBRE), monitoring complex distributed systems, building high-performing development teams

Channels: Personal blog | LinkedIn | X

2. Cindy Sridharan

Cindy Sridharan, renowned for her book “Distributed Systems Observability,” is an author and expert focusing on strategies for constructing resilient systems and maintainable services, irrespective of their size and load.

Why follow: Cindy Sridharan’s background in infrastructure and API development, coupled with her deep insights into distributed systems, positions her as a noteworthy figure in the observability domain. She actively manages the Prometheus user group and has contributed to the committees of several prominent industry conferences on systems engineering.

Sridharan’s expertise covers a range of topics, from the transformation of monitoring in the era of cloud-native architectures to achieving better visibility into system behavior. For those seeking insights into selecting the optimal observability strategy for their distributed systems, her blog and book serve as valuable starting points.

Talks about: observability for large-scale cloud services, distributed tracing, zero downtime deployments, testing microservices, testing in production, API development

Channels: Personal blog | X

3. Jaana Dogan

Jaana Dogan, currently holding the position of Distinguished Software Engineer at GitHub, has a background that includes roles at AWS Observability and Google, with a specific focus on the observability of Go production services.

Why follow: Drawing on her extensive experience in building developer platforms and tools, Jaana Dogan offers a comprehensive perspective on making systems observable and performant. Her insights cover optimization techniques, infrastructure considerations, and the evaluation of monitoring tools.

Dogan’s blog and Medium presence feature a wealth of resources, including step-by-step tutorials and opinion articles. Her coverage spans topics from working with Go to collecting metrics for observability, along with best practices for configuration and release management.

Additionally, she shares valuable insights into fostering effective collaboration between engineering teams and optimizing development and operations processes.

Talks about: developer tools, data observability, system performance, system health and debugging, microservices, Go, engineering culture and practices

Channels: Personal blog | Medium | X

4. Liz Fong-Jones

Liz Fong-Jones, currently serving as Developer Advocate and Field CTO at Honeycomb, brings a wealth of experience to the observability domain, having previously worked on products like the Google Cloud Load Balancer.

Why follow: As an active contributor to the observability community, Liz Fong-Jones engages with the audience through talks, publications, videos, and podcasts, primarily centered around site reliability engineering (SRE) and the management of complex, distributed systems.

For those interested in observability beyond tools and technology, Liz provides insights into how this paradigm is reshaping the landscape of production involvement, engineering collaboration, and success metrics. Notably, she advocates for ethical tech practices and emphasizes the significance of cultivating healthy engineering cultures.

Talks about: observability engineering, troubleshooting and monitoring, SRE, sustainable operations, teamwork, diversity and inclusion in tech, ethical considerations

Channels: Personal blog | LinkedIn

5. Yan Cui

Yan Cui, widely recognized as “The Burning Monk” based on his blog’s name, holds the title of AWS Serverless Hero and works as an independent consultant. His focus in videos, courses, and workshops revolves around the intersection of serverless computing and observability.

Why follow: For those who prefer concise technical content, Yan Cui’s YouTube channel offers a plethora of videos covering monitoring, tracing, and debugging techniques specific to serverless applications. Additionally, he hosts the “Real-world serverless” podcast featuring interviews with observability experts.

As an AWS professional, Cui frequently delivers in-depth tutorials, code samples, and guidance on maximizing the potential of AWS services and tools. His content explores strategies for gaining visibility into serverless systems to enhance understanding and facilitate effective troubleshooting.

Talks about: observability in serverless architectures, building serverless architectures in AWS, serverless case studies, improving system performance, AI in DevOps

Channels: Personal blog | YouTube | LinkedIn

Blogs and articles on observability you should bookmark

Beginner-friendly

Observability is about cutting through complexity to understand how your systems, services, and applications behave, and why they behave that way. Even so, the practice can seem complicated, especially to those new to the field.

These resources cut through the jargon and explain the related concepts clearly.

6. o11y.wiki

O11y is shorthand for observability and also the title of this wiki, a glossary of essential observability terms from A to Z. It gathers the definitions that anyone starting out in the field will need.

It ranges from fundamental concepts such as alerts and logs to more specialized use cases like tail sampling.

7. ETEAM Blog


At ETEAM, our enthusiasm for learning is complemented by a commitment to knowledge-sharing. To contribute to this effort, we started a series of accessible articles tailored for newcomers.

The series begins with broad concepts, explaining what observability is and comparing it with Application Performance Monitoring (APM). As the series unfolds, it moves into more specialized topics, including the intersection of observability-driven development and security.


Observability best practices

As engineers, our responsibility lies in ensuring that custom software development aligns with both business and user requirements, all while maintaining optimal performance. This is where the implementation of best practices becomes necessary.

The blogs presented here cater to a diverse range of skill levels, accommodating even experienced DevOps professionals. They contain tried-and-tested methods, advanced strategies, and in-depth analyses of observability to support the continuous enhancement of software development practices.

8. Honeycomb Blog

Honeycomb’s blog stands out as a respected voice in the industry, offering a comprehensive exploration of best practices. Topics covered include avoiding alert fatigue through proper alert configuration and instrumenting code to generate meaningful telemetry data.

In the “Ask Miss O11y” series, readers have the opportunity to submit questions about observability best practices and the challenges they encounter during implementation.


9. New Relic Blog


New Relic’s blog, centered on best practices for troubleshooting and enhancing software performance, serves as an excellent resource for professionals in the field.

A noteworthy addition is the recently launched “The Expert Observability series,” including articles where New Relic engineers share insights, tips, and real-world scenarios detailing how they refined their techniques.


Observability tools and technologies

Implementing observability involves not just selecting the appropriate tools but also configuring them effectively to seamlessly integrate with your tech stack.

Many monitoring platforms offer tutorials that guide you through the process of implementing observability using their solutions or complementary tools.

10. Dynatrace Blog

Beyond articles that delve into leveraging Dynatrace for observability, the content also provides valuable insights into the broader toolkit ecosystem.

Topics range from distinguishing between observability platforms and tools to gaining a comprehensive understanding of AIOps tools and the advancements in AI-driven observability — a rising trend in the industry.


11. ITNEXT on Medium

ITNEXT, a Medium publication, is dedicated to blog articles covering next-gen technologies. While its focus is not exclusively on observability, it does offer valuable guides and tutorials on working with tools and standards like OpenTelemetry, Prometheus, and Service-Level Objective (SLO) generators such as Sloth.


Industry-specific insights

Every industry presents its unique challenges, requiring distinct approaches in terms of what to monitor and optimize for.

For instance, the financial services sector may prioritize low latency, while healthcare places priority on data accuracy.

Countless valuable resources are available to provide an in-depth exploration of industry-specific use cases in observability.

12. Splunk Blog

In addition to covering related topics, Splunk’s blog extends its content to include discussions on observability in various sectors, encompassing finance, healthcare, the public sector, and more.

Their podcast, The Security Detail, further expands the conversation by exploring how improved system visibility can enhance understanding and defense against threats across different verticals.


13. Cisco AppDynamics Blog


If you’re curious about how observability practices are applied in different domains, the AppDynamics blog explores industry use cases in higher education and public services. It also looks at how observability can enhance the overall user experience.

The emerging field of Digital Experience Monitoring is also covered, focusing on analyzing the quality of user interactions to optimize the end-to-end experience.


Emerging trends and the future of observability

The field of observability has experienced exponential growth in recent years, and further changes are expected.

Industry experts contribute insightful and sometimes controversial predictions on the future evolution of observability and related technologies, exploring their potential impacts on businesses.

Here are two such examples.

14. APM Digest

APM Digest covers an extensive array of topics on the future of application performance management.

In addition to publishing articles on observability practices and emerging trends, they gather insights from analysts, consultants, and vendors to compile an annual list of predictions encompassing IT performance and observability topics.


15. Grafana Labs Blog


Over the last two years, Grafana Labs has conducted surveys on the state of observability, sharing the results on their blog.

Drawing insights from feedback provided by hundreds of industry practitioners, the report showcases trends in tools and data sources, market maturity, and future priorities within the observability field.


Conclusion

At ETEAM, we strive to stay up-to-date with the latest developments in application performance monitoring and observability. We’ve curated this list to offer a comprehensive overview, encompassing beginner-friendly articles, best practices, and future trends, along with insights from some of the most active voices in the industry.

As systems grow more complex and span multi-cloud environments and services, incident response becomes more challenging. That difficulty is exactly why achieving full-stack visibility is so important.

The ability to swiftly troubleshoot and pinpoint the cause of an incident is invaluable for ensuring smooth business operations, making observability a top priority.

Ready to dive deeper into the world of software development and stay up-to-date with the latest tech news? Visit our blog and unlock valuable insights, expert tips, and industry trends. Click here to explore our blog now!

Saiona Stoian, writer for ETEAM

Content Writer in the tech industry and a work in development (pun intended). Forever learning and growing as a human and storyteller.