IBM’s Observability and AI Operations Solutions

How They Fit Together to Resolve Incidents

Robert Barron
IBM Cloud
Jun 21, 2021

IBM has three synergistic solutions in the AIOps domain: IBM Observability by Instana APM, Turbonomic Application Resource Management for IBM Cloud Paks and IBM Cloud Pak® for Watson AIOps.

IBM’s Observability and AI Operations Solutions Working Together

While this article will describe some of their capabilities in incident management and incident resolution (i.e., detecting issues as early as possible and resolving them as quickly and as efficiently as possible), it’s important to remember that each of these three solutions has additional strengths and capabilities.

For example, Instana can perform deep dives into application performance during the development cycle, Turbonomic can deliver application-driven cloud optimization and IBM Cloud Pak for Watson AIOps can perform risk analysis for upcoming changes.

But for the purposes of this article, we will concentrate on the incident management toolchain — an abstraction of the way incidents are handled in real time. Observability tools monitor technologies and services and collect data. The data is then analyzed using advanced machine learning (AIOps) models and an incident is created. The relevant people are notified and runbooks are executed to remediate the issue while a ticket documents the issue. Dashboards and collaboration tools are used to troubleshoot the problem until it is solved.
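To make the abstraction concrete, the stages above can be sketched in a few lines of Python. This is only an illustration of the flow (collect, analyze, create an incident, notify, run a runbook, open a ticket), not how any of the IBM products implement it. Every name here (`Incident`, `detect_anomaly`, `handle`, the ticket ID format) is hypothetical, and a simple standard-deviation check stands in for the machine learning models. It expects at least two baseline samples.

```python
from dataclasses import dataclass, field
from statistics import mean, pstdev
from typing import Callable, Dict, List, Optional


@dataclass
class Incident:
    service: str
    summary: str
    notified: List[str] = field(default_factory=list)
    actions: List[str] = field(default_factory=list)
    ticket_id: Optional[str] = None


def detect_anomaly(samples: List[float], threshold: float = 3.0) -> bool:
    """Toy stand-in for the analysis step: flag the latest sample when it
    lies more than `threshold` standard deviations from the baseline mean."""
    *baseline, latest = samples
    sigma = pstdev(baseline)
    if sigma == 0:
        return latest != baseline[0]
    return abs(latest - mean(baseline)) / sigma > threshold


def handle(
    service: str,
    samples: List[float],
    on_call: Dict[str, str],
    runbooks: Dict[str, Callable[[str], str]],
) -> Optional[Incident]:
    """Walk one service's collected data through the toolchain stages."""
    if not detect_anomaly(samples):           # analyze the collected data
        return None
    incident = Incident(service, f"latency anomaly on {service}")
    incident.notified.append(on_call[service])        # notify the right person
    runbook = runbooks.get(service)
    if runbook is not None:                            # remediate, if a runbook exists
        incident.actions.append(runbook(service))
    incident.ticket_id = f"TICKET-{service}"           # hypothetical ticket reference
    return incident


if __name__ == "__main__":
    latencies_ms = [100, 102, 98, 101, 99, 400]        # made-up sample data
    result = handle(
        "checkout",
        latencies_ms,
        on_call={"checkout": "sre-on-call"},
        runbooks={"checkout": lambda svc: f"restarted {svc}"},
    )
    print(result)
```

In a real toolchain each stage is a separate tool; the point of the sketch is only that the output of one stage becomes the input of the next.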

The toolchain described here has elements of both the AI Ladder (Collect, Organize, Analyze and Infuse) and the IBM Garage Method Incident Management Reference Architecture.

IBM Observability by Instana APM

Here, IBM’s observability solution begins with IBM Observability by Instana APM, which implements collection and monitoring of the entire observability context — from topology and tracing, through monitoring and logging. Instana excels at monitoring cloud-native environments and, together with IBM’s strong legacy of traditional monitoring solutions, can monitor any and all technology platforms.

Instana also has a strong domain-specific artificial intelligence (AI) engine that enables it to organize and analyze the information it has collected to strengthen its event management and help resolve issues faster.

Turbonomic Application Resource Management

Turbonomic Application Resource Management for IBM Cloud Paks is a domain-specific AIOps solution that is dedicated to application resource management. It will modify the environment automatically to maximize the performance of the services and applications while minimizing cost and resource usage. Turbonomic does not collect the data itself, but gets it from observability solutions like Instana (though Turbonomic is not limited to Instana and can gather the information it needs from many solutions). Turbonomic has a vast array of tested and proven runbooks that it uses to resolve performance issues automatically.

IBM Cloud Pak for Watson AIOps

IBM Cloud Pak for Watson AIOps is a domain-agnostic AIOps solution. Similar to Turbonomic, it does not collect the data itself; it receives it from other solutions. It ingests both structured and unstructured data from hundreds of different monitoring and collection solutions.

Not only does IBM Cloud Pak for Watson AIOps ingest the monitoring and observability data, it also leverages past ticketing and runbook information to improve its recommendations. As described in my previous article, IBM Cloud Pak for Watson AIOps works throughout the incident management toolchain: it not only analyzes information and infuses insights into the tools, but also acts on these insights by triggering runbooks automatically or recommending manual runbooks to humans. It notifies the right people at the right time and, by using ChatOps, acts as a “digital SRE” working together with humans.
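The split between triggering a runbook and recommending one can be pictured with a small sketch. It is an assumption-laden illustration, not the product's API: the `Runbook` class, its `automated` flag, `act_on_insight` and the `post_to_chat` callback are all hypothetical names, and the chat channel is simulated with a plain list.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Runbook:
    name: str
    automated: bool            # hypothetical flag: safe to run without approval
    run: Callable[[], str]     # returns a short result message


def act_on_insight(
    insight: str,
    runbook: Runbook,
    post_to_chat: Callable[[str], None],
) -> str:
    """Either execute the runbook or suggest it to humans via chat."""
    if runbook.automated:
        result = runbook.run()
        post_to_chat(f"[auto] {insight}: ran '{runbook.name}' -> {result}")
        return "executed"
    post_to_chat(f"[suggestion] {insight}: consider running '{runbook.name}'")
    return "recommended"


if __name__ == "__main__":
    channel: List[str] = []    # stands in for a ChatOps channel
    restart = Runbook("restart-pod", automated=True, run=lambda: "pod restarted")
    failover = Runbook("db-failover", automated=False, run=lambda: "failed over")

    act_on_insight("memory leak in checkout", restart, channel.append)
    act_on_insight("replica lag on orders-db", failover, channel.append)
    for message in channel:
        print(message)
```

Either way a message lands in the shared channel, so the humans on the team see what was done or what is being proposed.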

As an added benefit, IBM Cloud Pak for Watson AIOps can also create predictive alerts well before any obvious signals can be seen.

Three solutions working together toward incident management

As you have seen, each of these solutions plugs into a different part of the incident management toolchain and each provides a unique benefit. They can function independently, integrate with third-party tools and, best of all, work together.

You can learn more about modern cloud service management and operations in general and AI Operations in particular with IBM’s two field guides: CSMO and AIOps.

Originally published at https://www.ibm.com on June 21, 2021.

Robert Barron
IBM Cloud

Lessons from the Lunar Landing, Shuttle to SRE | AIOps, ChatOps, DevOps and other Ops | IBMer, opinions are my own