Artificial intelligence takes APM from good to great

Steve Lamont
Oct 12, 2016

Traditional application performance management (APM) has gotten really good over the past decade. It’s gone from an expensive insurance policy to a business growth driver. We have more granular data on what’s going on inside our applications, down to the code level. Problems that impact customers get resolved 20X faster. More fundamentally, modern APM takes an outside-in, end-to-end approach. It now looks at things from the end users’ perspective, deciphering what all that application-performance data means in terms of customer experience — translating IT metrics into business value.

Now APM has gone from good to great.

The term “paradigm shift” has been so overused since it was introduced over 50 years ago that it’s degenerated into marketing-speak for any new technology. You would be rightfully wary of any Next New Thing making that claim. But Dynatrace’s innovative approach of powering its APM solution with an artificial intelligence (AI) engine could well render traditional APM solutions irrelevant — the way word processors made the typewriter obsolete. At the very least, AI-powered APM is a game-changer.

AI-driven APM attains the holy grail of any truly breakthrough technological innovation: it simplifies the complex, automates manual tasks, and allows us to make smarter decisions faster.

Artificial intelligence does the heavy lifting

Artificial intelligence has been characterized as being able to perceive its environment and apply cognitive functions such as learning and problem-solving. In the context of APM, Dynatrace applies artificial intelligence algorithms and context-rich diagnostics to

  • auto-discover all components of your full technology stack end to end, from the customers’ web browsers all the way down to the host infrastructure
  • map out the entire IT environment in an interactive visual display
  • identify the millions of dependencies among websites, applications, services, processes, hosts, networks, and cloud infrastructure
  • learn how it all works together and what constitutes normal behavior
  • automatically detect, analyze, and prioritize anomalies and performance problems
  • actually recommend solutions to the root cause of those problems
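The first steps above can be sketched as a toy dependency map (the component names and data model here are illustrative assumptions, not Dynatrace’s internal representation): once every host, process, and service is discovered and its dependencies recorded, finding everything impacted by a failing component is just a graph traversal.

```python
# Toy sketch of a full-stack dependency map (hypothetical data model,
# not Dynatrace's actual representation).
from collections import defaultdict

class DependencyMap:
    def __init__(self):
        # Edges point from a component to the components that depend on it.
        self.dependents = defaultdict(set)

    def add_dependency(self, component, depends_on):
        self.dependents[depends_on].add(component)

    def affected_by(self, failing_component):
        """Return every component directly or transitively impacted."""
        impacted, stack = set(), [failing_component]
        while stack:
            node = stack.pop()
            for dep in self.dependents[node]:
                if dep not in impacted:
                    impacted.add(dep)
                    stack.append(dep)
        return impacted

# A web app depends on a service, which depends on a host:
topo = DependencyMap()
topo.add_dependency("checkout-app", depends_on="cart-service")
topo.add_dependency("cart-service", depends_on="host-42")
print(topo.affected_by("host-42"))  # both the service and the app are impacted
```

At real-world scale the graph has millions of edges and changes continuously, which is exactly why this bookkeeping has to be automated rather than maintained by hand.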

And AI does all this at a speed, scale, and precision that no human could ever match.

These days, IT systems are getting so large and so complex that monitoring and managing applications manually in a traditional APM solution has become prohibitively time-consuming. Many IT departments find themselves spending more time troubleshooting existing web applications than developing innovative new ones. An IDG Research Services survey found that 45% of CIOs think that too much time spent on resolving urgent issues limits their ability to launch new services and meet changing business demands.

And no wonder. Gone are the days of the monolithic, straight-ahead application environments. Nowadays, how hosts, processes, services, and applications all work together is a labyrinth of complexity. And it’s this interplay that precipitates all the trouble, and what takes so much time for humans to untangle. The conventional approach of reactively sifting through mounds of data to figure out what’s gone wrong — finding the needle in a haystack — has rapidly become antiquated.

In today’s highly dynamic distributed, virtualized, and cloud landscape — where elastic, self-scaling server environments are changing from moment to moment — no sooner do you have a handle on how everything is interrelated than it all changes on the fly. It’s simply impossible for a human being to analyze the (literally) millions of application dependencies and pinpoint the root cause of a problem. In essence, AI enables “the computer” to do what it does best: absorb huge amounts of information and make sense of it faster and more thoroughly than is humanly possible.

For example, the screenshot above is a Dynatrace visualization of an online retailer’s fully containerized but relatively simple system of only 142 hosts (each red bubble indicates some sort of problem). You can see at a glance how difficult it would be to manage all the interdependent processes and services manually as the infrastructure scales up and down.

That’s where artificial intelligence comes in. The Dynatrace AI engine untangles this web of dependencies at a speed, scale, and precision that no human could ever match.

The AI advantage: Identifying the solution, not the problem

Often when applications experience degraded performance, the problem is obvious but the solution is not. Any monitoring tool can alert you that there’s a problem after the fact. The good ones can even tell you whether the problem is affecting customers. But a great one lets you know about a performance problem before it becomes a customer problem, identifies the root cause, and tells you how to actually fix it.

Traditional APM solutions are good at gathering data and throwing a bunch of charts at you. It’s left to the “data nerds” to parse through all that data manually to identify the underlying root cause of the problem and come up with a solution. In today’s ultra-complex, highly dynamic environments, that can take hours, even days. But when performance issues are affecting your customers (and therefore your business), you can’t wait for a war room to sift through a mountain of data. You need to fix things — fast.

In contrast, artificial intelligence–powered APM proactively pinpoints the underlying root causes of problems in seconds. It analyzes trillions of events per day, so it can tell you where and why applications are breaking down — and what to do about it.

Dynatrace “knows” your applications. Its AI engine continuously auto-discovers and monitors every aspect of every application. Sophisticated AI algorithms learn normal application performance patterns and proactively flag anomalies. The AI auto-adjusts baselines dynamically in real time to avoid false positives caused by preset static alerting thresholds.
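A minimal sketch of what dynamic baselining can look like (this is an illustrative exponentially weighted scheme of my own, not Dynatrace’s actual algorithm): the expected value and tolerance band adapt with every new measurement, so anomalies are deviations from learned behavior rather than breaches of a fixed threshold.

```python
# Illustrative dynamic baseline (a simple exponentially weighted sketch,
# not Dynatrace's actual algorithm). The baseline and its tolerance band
# adjust as measurements arrive, instead of using a static threshold.
class DynamicBaseline:
    def __init__(self, alpha=0.1, tolerance=3.0, warmup=5):
        self.alpha = alpha          # how quickly the baseline adapts
        self.tolerance = tolerance  # allowed deviations (in std-devs)
        self.warmup = warmup        # samples to learn before flagging
        self.mean = None
        self.var = 0.0
        self.count = 0

    def observe(self, value):
        """Update the baseline; return True if the value is anomalous."""
        self.count += 1
        if self.mean is None:
            self.mean = value
            return False
        deviation = value - self.mean
        std = self.var ** 0.5
        anomaly = (self.count > self.warmup and std > 0
                   and abs(deviation) > self.tolerance * std)
        # Adapt even on anomalies, but more slowly, so a genuine
        # shift in behavior eventually becomes the new normal.
        a = self.alpha * (0.1 if anomaly else 1.0)
        self.mean += a * deviation
        self.var = (1 - a) * (self.var + a * deviation * deviation)
        return anomaly

baseline = DynamicBaseline()
for ms in [120, 118, 125, 122, 119, 121]:
    baseline.observe(ms)          # learns that ~120 ms is normal
print(baseline.observe(480))      # → True: response-time spike flagged
```

The warm-up period and the self-adjusting band are what keep a scheme like this from producing the false positives that plague static thresholds.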

Performance problems are seldom isolated, one-time events, and they’re usually symptoms of a larger issue. Dynatrace looks at all other transactions that used the same components around the same time to see if they also experienced problems. Artificial intelligence correlates events throughout the full technology stack — client side, server side, infrastructure level — to identify analogous issues and detect causal relationships. AI diagnostics analyze the vast amount of data collected to direct you to the exact component that’s causing the problem.

The screenshot below shows how Dynatrace analyzed over 820 billion dependencies in order to automatically figure out that the root cause of a particular problem was a TCP networking issue within the Docker container environment.

We can see that over 3,400 services were affected. But instead of spamming you with thousands of individual alerts that only tell you the symptoms of the problem, Dynatrace AI is smart enough to figure out — automatically — that all of these service issues were due to the same TCP networking problem.
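The same idea can be sketched in a few lines (the component names and logic are hypothetical, not Dynatrace’s engine): rather than alerting on every failing service, trace each symptom up its dependency chain and collapse everything that shares an unhealthy upstream component into a single root-cause group.

```python
# Illustrative alert correlation (hypothetical names and logic, not
# Dynatrace's actual engine): walk each symptom back up its dependency
# chain and group everything that traces to the same failing upstream.
from collections import defaultdict

def correlate(failing_components, depends_on):
    """Group failing components by their deepest failing upstream dependency.

    Assumes an acyclic dependency chain (component -> upstream component).
    """
    failing = set(failing_components)
    groups = defaultdict(list)
    for component in failing:
        root = component
        # Climb the chain while the upstream component is also failing.
        while depends_on.get(root) in failing:
            root = depends_on[root]
        groups[root].append(component)
    return dict(groups)

# Three services all sit on the same container network:
depends_on = {
    "checkout-service": "docker-network",
    "cart-service": "docker-network",
    "search-service": "docker-network",
}
failing = ["docker-network", "checkout-service", "cart-service", "search-service"]
groups = correlate(failing, depends_on)
print(sorted(groups))   # ['docker-network'] -- one root cause, not four alerts
```

Four unhealthy components collapse into one actionable alert pointing at the shared network problem, which is the difference between a diagnosis and a flood of symptoms.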

What Dynatrace does in seconds would be virtually impossible in a traditional war room approach.

In today’s increasingly complex and fluid IT environments, interpreting data points with millions of dependencies in order to determine underlying causal relationships is more than humans can master by themselves. Only artificial intelligence can take a colossal volume of data and translate it into actionable solutions before problems hit customers.

See for yourself what Dynatrace can do. Dynatrace offers a free trial — the complete full-blown version, not a stripped-down demo — that auto-discovers and visually maps out your entire technology stack within minutes.


Steve Lamont

Steve Lamont has been writing about operational and information technologies for over 10 years. He is currently an in-house content editor at NetBrain.