Stoopid Network Management Products and other horror stories

Ronald Bartels
Dec 9, 2018 · 3 min read

Doing things in a manner that has never been optimal

In a previous article I wrote about where operations management tools come up short. Their fundamental analytical mistake is that they do not evaluate positive as well as negative inputs. By this I mean that we need to understand normal operations, the positive, as well as abnormal operations, the negative. A metric is pretty much meaningless without context.
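To make the point about context concrete, here is a minimal sketch that judges a reading against samples taken during normal operations. The function name, the three-standard-deviation tolerance and the utilisation figures are all assumptions made up for illustration, not something taken from a real product.

```python
from statistics import mean, stdev


def judge_against_baseline(history, reading, tolerance=3.0):
    """Classify a reading relative to samples from normal operations.

    `tolerance` is an assumed cut-off, measured in standard deviations.
    """
    centre = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat baseline: any change at all is out of the ordinary.
        return "normal" if reading == centre else "abnormal"
    deviation = abs(reading - centre) / spread
    return "normal" if deviation <= tolerance else "abnormal"


# Hypothetical link utilisation samples (percent) for two different links.
backup_link = [2, 3, 2, 4, 3, 2, 3]
busy_link = [68, 72, 70, 75, 69, 71, 73]

print(judge_against_baseline(backup_link, 40))  # abnormal
print(judge_against_baseline(busy_link, 74))    # normal
```

A reading of 40% is alarming on the backup link while 74% is business as usual on the busy one. The number only means something once it is compared with what normal looks like for that link.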

Many moons ago, TCP/IP was developed. In 1988, the year I graduated from Maties and more than a decade after TCP/IP, Simple Network Management Protocol (SNMP) was developed. Ever since, it has been the primary method by which networking infrastructure has been mined for metrics, and it is the basis on which a multitude of Stupid Network Management Products have been flogged to the ever-suffering network community. Some of these have been unashamedly expensive, with some of those being nothing more than RAG (red, amber, green) status displays.

Fundamentally, a network error, outage or failure is never an event. A tool that is an event logging or “event management” system can also be added to our growing list of horror stories. The reason is that we need to view errors, outages and failures not as singular events but as processes consisting of multiple correlations that form a life cycle. The source of investigative knowledge with which we manage this life cycle consists of more than just network metrics or logged events. It also includes attributes of people and processes as well as technology, and technology is typically the only one of the three that is ever instrumented.
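As a rough illustration of treating a failure as a life cycle rather than a single event, the sketch below keeps one incident record that moves through stages and collects correlated observations from people, process and technology. The stage names, class names and example notes are hypothetical and chosen only for this example.

```python
from dataclasses import dataclass, field

# Assumed life-cycle stages; no specific set is prescribed here.
STAGES = ["detected", "diagnosed", "repaired", "restored", "closed"]
SOURCES = {"people", "process", "technology"}


@dataclass
class Observation:
    source: str  # one of SOURCES
    note: str


@dataclass
class Incident:
    summary: str
    stage: str = "detected"
    observations: list = field(default_factory=list)

    def correlate(self, source, note):
        """Attach a related observation instead of logging it as a new event."""
        if source not in SOURCES:
            raise ValueError(f"unknown source: {source}")
        self.observations.append(Observation(source, note))

    def advance(self):
        """Move to the next stage and return it; the last stage is terminal."""
        position = STAGES.index(self.stage)
        if position < len(STAGES) - 1:
            self.stage = STAGES[position + 1]
        return self.stage

    def uninstrumented_sources(self):
        """Return the sources that have contributed nothing so far."""
        seen = {o.source for o in self.observations}
        return sorted(SOURCES - seen)


incident = Incident("Branch WAN link down")
incident.correlate("technology", "SNMP linkDown trap from the edge router")
incident.correlate("process", "Change window opened on the same router an hour earlier")
print(incident.advance())                 # diagnosed
print(incident.uninstrumented_sources())  # ['people']
```

The last line is the interesting one: it shows which of the three sources has said nothing about the incident yet, which is exactly the gap a pure event logger never reveals.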

Thus let us start at the beginning and analyze how we deal with network incidents. We need to learn how pilots do it by using checklists. Although Charles Lindbergh, who flew non-stop across the Atlantic in the Spirit of St Louis, did not use checklists, they came into extensive use by pilots during the time of WWII. Pilots have been doing what the network community should be doing and they have been doing it for the better part of a century. Nearly a decade ago, I developed a networking troubleshooting checklist. It takes a person through a number of checks to validate network operations, just like a pilot would do with his plane. The pilot would execute the checklist using the visual validation of his cockpit dashboard. In flying a network we would do the same but with a network dashboard. Unfortunately, due to the functional design of most network dashboard instrumentation, there is some crucial instrumentation missing, as previously mentioned. The methodology encouraged by the use of checklists dramatically reduces the time to troubleshoot a network and directs attention to the required metrics.
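A checklist driven from a dashboard could be sketched along the following lines. The four checks and the dashboard readings are invented for illustration and are not the items of my actual checklist; the point is only that each check is validated in a fixed order against an instrument reading.

```python
def run_checklist(checks, dashboard):
    """Run every check in order and return (name, passed) pairs."""
    results = []
    for name, check in checks:
        results.append((name, bool(check(dashboard))))
    return results


def first_failure(results):
    """Return the name of the first failed check, or None if all passed."""
    for name, passed in results:
        if not passed:
            return name
    return None


# Hypothetical checks, each reading one instrument on the dashboard.
checks = [
    ("Link is up", lambda d: d["link_up"]),
    ("Default gateway answers", lambda d: d["gateway_loss_pct"] < 5),
    ("DNS resolves", lambda d: d["dns_ok"]),
    ("Latency within baseline",
     lambda d: d["latency_ms"] <= d["latency_baseline_ms"] * 1.5),
]

# Hypothetical dashboard readings.
dashboard = {
    "link_up": True,
    "gateway_loss_pct": 0,
    "dns_ok": False,
    "latency_ms": 40,
    "latency_baseline_ms": 35,
}

results = run_checklist(checks, dashboard)
print(first_failure(results))  # DNS resolves
```

If the dashboard has no reading for a check, the lambda raises a KeyError. That is a fair model of the instrumentation gap: the check exists, but nothing on the dashboard can answer it.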

Even after ten years, there still isn’t a Stupid Network Management Product that can automatically and seamlessly provide instrumentation for each of the checks in my checklist. The aviation industry is a benchmark, and network operation centres (NOCs) the world over should be assimilating how we control the sky. In the NOC, we can start with simple aggregated views!

The Other Guy

Clan meerkat of the technology burrow (braaivleis, rugby, sunny skies and computers)

Ronald is a technologist and service management evangelist. He started driving a tractor when he was five years old and would love to own a Massey Ferguson!
