Closing the Loop on Testing Network Changes
“..the best way to guard against error is to design systems with layered and overlapping defenses…like slices of Swiss cheese being layered on top of one another until there were no holes you could see through” — from The Premonition, Michael Lewis
This story is written by Dinesh Dutt and Ratul Mahajan
Network changes, such as adding a rack, a VLAN, or a BGP peer, or upgrading the OS, can easily cause an outage and materially impact your business. Rigorous testing is key to minimizing the chances of change-induced outages. A central tenet of such testing is test automation: a program should do the testing, not (error-prone) humans. Test automation should target all stages of the change process. Prior to deployment, it should test that the change is correct and that the network is ready for it. Post deployment, it should test that the change was correctly deployed and had the intended impact. This “closed-loop test automation” makes the change process highly resilient and catches problems as early as possible.
But writing the code to automate network testing can be quite complicated. For instance, if you were adding a new leaf, prior to deploying your change, you may want to test that the IP addresses on the new leaf do not overlap with existing ones. So, you may write a script that mines addresses from configurations and then checks for uniqueness. Similarly, post deployment, you may want to test that all spines have the new prefix. So, you may write a script to fetch and process “show” data from all spines. Writing such scripts is fairly complex because you have to know the right commands, know how to extract the data (via TextFSM templates or JSON queries), and so on. Sometimes, you have to combine information from multiple commands. And of course, all of this differs for every vendor and, in many cases, across different software versions from the same vendor. Your ability to automate network testing increases dramatically if you have tools that take out much of this complexity. In this article, we cover two such open-source tools, Batfish and Suzieq, that help you easily automate closed-loop testing.
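To make the first example concrete, here is a minimal sketch of the uniqueness check such a hand-written script might perform once the addresses have been extracted. It assumes the hard part (mining addresses from each vendor's configuration) is already done. The function name `find_address_conflicts`, the device names, and the addresses are all hypothetical, chosen only for illustration. It flags duplicate host addresses only, since both ends of a point-to-point link legitimately share a subnet.

```python
import ipaddress
from typing import Dict, List, Tuple


def find_address_conflicts(
    existing: Dict[str, List[str]],
    new_leaf: List[str],
) -> List[Tuple[str, str]]:
    """Return (address, owning_device) pairs for new-leaf addresses already in use."""
    # Index every existing host address by the device that owns it.
    owners = {}
    for device, addresses in existing.items():
        for addr in addresses:
            owners[ipaddress.ip_interface(addr).ip] = device

    # Compare only the host part of each new address against the index.
    conflicts = []
    for addr in new_leaf:
        ip = ipaddress.ip_interface(addr).ip
        if ip in owners:
            conflicts.append((str(ip), owners[ip]))
    return conflicts


# Hypothetical data, standing in for addresses mined from device configurations.
existing = {
    "spine01": ["10.0.0.0/31", "10.255.0.1/32"],
    "leaf01": ["10.0.0.1/31", "10.255.0.11/32"],
}
new_leaf = ["10.0.0.3/31", "10.255.0.11/32"]

conflicts = find_address_conflicts(existing, new_leaf)
print(conflicts)  # [('10.255.0.11', 'leaf01')]
```

The check itself is short. Most of the effort in a real script goes into producing the `existing` and `new_leaf` inputs reliably across vendors and software versions.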
The three stages of closed-loop test automation
Closed-loop network testing has three stages:
- Pre-approval testing: Before you schedule the planned change for a maintenance window, test that it is correct.
- Deployment pre-testing: Before you deploy the…