Python and Ansible to Automate a Network Security Workflow

When discussing automation strategies, the topic of Python vs. Ansible often comes up. However, automation goes beyond a script, task, tool, or platform. Don’t fall into the trap of treating this as a binary choice!

I see Ansible as a framework to wire things together and reduce the overhead of mundane tasks: it interconnects different building blocks to orchestrate an end-to-end automation workflow, which in most cases involves different tools, platforms, or resources in a diverse ecosystem across multiple domains in your organization.

If you are comfortable running Python to automate parts of your infrastructure, but find yourself in a team of one automating a single domain, it’s probably time to think about how to make your home-grown tools more accessible to others; in other words, how to increase the adoption of your work so it can become a component of a larger project that automates your end-to-end infrastructure deployment, application lifecycle, or service delivery.

One way to achieve this is by making your Python program easier to use, lowering the barrier to entry. Do you provide a GUI? A CLI? An API? Does the user need to know programming? This is where Ansible comes in! If you turn your Python program into an Ansible module, users can call it with a simple, easy-to-read language that enables automation content sharing across an organization.

In this post we describe how you can turn your favorite Python library into an Ansible module, and how that module becomes a key component in solving a business need.

The use-case

Let’s imagine a company that is concerned about people outside their premises accessing their internal infrastructure, from servers to networking equipment. They are very paranoid, so they not only set up filters on their firewalls to restrict external access, but also apply Access Control Lists (ACLs) on every management interface to prevent SSH access from anyone outside the company’s allocated IP address space. These configurations are fairly static, so no automation required!

However, in COVID-19 times, the scenario has changed. Their now-remote users need access to the internal infrastructure. They don’t have a VPN set up, so they decide to allow access from the remote IP addresses of their authenticated admin users. However, these IP addresses can change over time. Also, there is no margin for error when you modify a management ACL; a good percentage of network engineers have locked themselves out of a device while performing this type of activity (TRUE FACT!).

Given that we need to make sensitive config changes on multiple devices, including servers, network devices from different vendors, and firewalls, each with its own configuration syntax, this activity could take hours. It would also need to be executed an undetermined number of times to add and remove entries.

It becomes apparent this process needs to be automated. An engineer suggested using a cool Python library named Capirca to translate a generic ACL policy into the specific configuration syntax of each platform.

Alright, one less problem to solve! They now need to figure out:

  1. How to test these changes in a lab/development environment
  2. How to generate configuration backups before making any changes to production devices
  3. How to connect to the devices and apply the configuration changes
  4. How to have a rollback strategy ready to go in case something fails
  5. How to commit the changes and notify users and other systems

Because Ansible can help us with all these, we will make Capirca an Ansible module to run this process as a fully automated workflow.

Creating a Module distributed as part of a Collection

According to the Ansible module development guide, an Ansible module is a reusable, standalone script that Ansible runs on your behalf, either locally or remotely. Modules interact with your local machine, an API, or a remote system to perform specific tasks like changing a database password or spinning up a cloud instance. Each module can be used by the Ansible API, or by the ansible or ansible-playbook programs. A module provides a defined interface, accepting arguments and returning information to Ansible by printing a JSON string to stdout before exiting.

We store this reusable, standalone script that Ansible runs on your behalf in the “magic” directory (plugins/modules), as described in Adding modules and plugins locally. But, where does this directory live? It actually depends on how you distribute this content. There are different options as listed in Adding a module locally, but in this case we are going to take advantage of Collections, a new standard of distributing, maintaining and consuming automation as described in Getting Started With Ansible Content Collections. In order to initialize a collection, we run ansible-galaxy collection init <namespace.name> to create a skeleton directory structure.

ansible-galaxy collection init nleiva.capirca_acl
- Collection nleiva.capirca_acl was created successfully

Resulting in:

tree nleiva/capirca_acl
nleiva/capirca_acl
├── docs
├── galaxy.yml
├── plugins
│   └── README.md
├── README.md
└── roles

In the plugins folder, we create a new directory named modules and put the code of our first module, named translate, in there.

tree nleiva/capirca_acl
...
├── plugins
│   └── modules
│       └── translate.py

The initial code comes straight from Starting a new module. Remember: “Good developers copy; great developers paste”.

We are ready to implement the logic of our module, which for simplicity will only parse the inputs and pass that data to Capirca to let it do its magic.

Writing the Module

Its Wiki page describes Capirca as a tool designed to utilize common definitions of networks, services, and high-level policy files to facilitate the development and manipulation of network access control lists (ACLs) for various platforms.

The name, “capirca”, was intended to be “caprica” from Battlestar Galactica (the “new world”). They registered the misspelling, then later noticed the error, but the correct spelling was already taken [SOURCE].

Capirca requires naming definitions for networks and services, plus a policy file, to generate ACL filters for different target platforms. These are the inputs we need to provide to the library. We will instruct our Ansible module to take them as arguments, so we edit module_args in the initial code we copied so it looks like this:
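A minimal sketch of that argument specification (the exact types and defaults here are assumptions, not necessarily the published module’s code):

module_args = dict(
    platform=dict(type='str', required=True),
    filter_options=dict(type='list', elements='str', required=False, default=[]),
    comment=dict(type='str', required=False, default=''),
    def_folder=dict(type='str', required=True),
    pol_file=dict(type='str', required=True)
)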

Where:

  • platform: Has to be one of the platforms from the extensive list of options Capirca supports, otherwise the module will complain. See platform.
  • filter_options: Additional arguments, which are optional in many cases. For example, the ACL filter name. Full details in filter_options.
  • comment: An optional comment.
  • def_folder: Folder where the naming definitions are stored. Each file must end in either a ‘.net’ or ‘.svc’ extension, identifying it as a network or service definitions file, respectively. We’ll discuss an example in a few paragraphs.
  • pol_file: Policy file that describes the filter terms. We’ll discuss an example in a few paragraphs.

Next, Ansible passes the values for these arguments, which we specify in the Playbook, to the module; no change to the initial code is required. A result dictionary is also defined in the original code, which we leave as is for simplicity; we will leverage one of its elements (message) later on.

Now that we have the user inputs collected during execution, we can call our Python library. In this example, we defined a separate function (get_acl) to handle this portion. Notice we pass the parsed module arguments to this function; at this point they are available through module.params.

An oversimplified version of this function (get_acl) would look like this:
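The following is a sketch under the assumption that we use the standard Capirca entry points (naming.Naming, policy.ParsePolicy, and the per-platform generator classes), with only two platforms wired in:

from capirca.lib import naming, policy, juniper, cisco

def get_acl(params):
    # Load the network (.net) and service (.svc) definitions
    defs = naming.Naming(params['def_folder'])

    # Build a templated policy header for the requested platform
    options = ' '.join(params['filter_options'])
    header = 'header {\n'
    header += '  comment:: "' + params['comment'] + '"\n'
    header += '  target:: ' + params['platform'] + ' ' + options + '\n'
    header += '}\n'

    # Read the generic policy terms provided by the user
    with open(params['pol_file']) as f:
        terms = f.read()

    # Parse header + terms and render the ACL in the target syntax
    pol = policy.ParsePolicy(header + terms, defs, optimize=True)
    if params['platform'] == 'juniper':
        return str(juniper.Juniper(pol, exp_info=2))
    elif params['platform'] == 'cisco':
        return str(cisco.Cisco(pol, exp_info=2))
    return ''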

In a nutshell, we generate a templated header for the policy, read the naming definitions from the folder (defs) and the policy terms (terms) from the input files, and then call Capirca to generate an ACL filter for the given platform (juniper and cisco are included as examples in this code snippet).

Finally, we store the value returned by this function in the message field of the result dictionary and pass it to AnsibleModule.exit_json().
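For completeness, the surrounding boilerplate from the developer guide ends up looking roughly like this (a sketch, not the exact published code):

from ansible.module_utils.basic import AnsibleModule

def run_module():
    module = AnsibleModule(argument_spec=module_args, supports_check_mode=True)
    result = dict(changed=False, message='')

    # Hand the parsed user inputs to Capirca and return the generated filter
    result['message'] = get_acl(module.params)
    module.exit_json(**result)

def main():
    run_module()

if __name__ == '__main__':
    main()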

That’s it! If you want to see the full code, take a look at capirca_acl. Some things might change as we iterate over this module.

Using the Module

Let’s go through a simple example to update an SSH filter (there is another example documented in the translate docs). Our Playbook consists of a single Play with two Tasks. The first one runs the translate module we just described, and then we use the debug module to print out the result.
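The Play could look something like this sketch (the policy file path is illustrative):

---
- name: Generate an SSH ACL filter
  hosts: localhost
  gather_facts: false
  tasks:
    - name: Translate the generic policy into the target syntax
      nleiva.capirca_acl.translate:
        platform: ciscoxr
        filter_options:
          - My-SSH-Filter
        def_folder: sample
        pol_file: policy/terms.pol
      register: acl

    - name: Print the resulting filter
      debug:
        msg: "{{ acl.message }}"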

In this case we choose Cisco IOS XR (ciscoxr) as the target platform to generate an ACL filter for this specific operating system. We store our naming definitions in a folder (def_folder) named sample. There are two files in there. One defines our networks (networks.net):
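Something along these lines, where the token names (OFFICE_NETWORKS, REMOTE_USERS) are illustrative but the prefixes match the output shown later:

OFFICE_NETWORKS = 10.0.0.0/8       # RFC 1918 space
                  172.16.0.0/12    # RFC 1918 space
                  192.168.0.0/16   # RFC 1918 space
                  192.0.2.0/24     # documentation prefix
                  198.51.100.0/32  # documentation host

REMOTE_USERS = 203.0.113.100/32    # remote admin 1
               203.0.113.200/32    # remote admin 2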

And another one that defines our services (services.svc):
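For SSH this can be as simple as:

SSH = 22/tcp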

These files are fairly static; you would normally re-use them and only update them when required. We will most likely make these YAML-compatible in a future release of the module.

However, the most important part is the actual policy (pol_file) we want to apply. It contains the terms, described in a generic language, to be translated into the syntax of the target platform. In this example we use:
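A set of terms consistent with the filters shown below, again using the illustrative OFFICE_NETWORKS and REMOTE_USERS tokens, would be:

term allow-intra-office {
  comment:: "Spoofs are already blocked at the borders"
  source-address:: OFFICE_NETWORKS
  destination-address:: OFFICE_NETWORKS
  protocol:: tcp
  destination-port:: SSH
  action:: accept
}

term allow-remote {
  source-address:: REMOTE_USERS
  protocol:: tcp
  destination-port:: SSH
  action:: accept
}

term implicit-default-deny {
  comment:: "Deny everything else"
  action:: deny
}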

The resulting ACL filter for Cisco IOS XR (ciscoxr) will be:

ipv4 access-list My-SSH-Filter
remark allow-intra-office
remark Spoofs are already blocked at the borders
permit tcp 10.0.0.0 0.255.255.255 10.0.0.0 0.255.255.255 eq 22
permit tcp 10.0.0.0 0.255.255.255 172.16.0.0 0.15.255.255 eq 22
permit tcp 10.0.0.0 0.255.255.255 192.0.2.0 0.0.0.255 eq 22
permit tcp 10.0.0.0 0.255.255.255 192.168.0.0 0.0.255.255 eq 22
permit tcp 10.0.0.0 0.255.255.255 host 198.51.100.0 eq 22
permit tcp 172.16.0.0 0.15.255.255 10.0.0.0 0.255.255.255 eq 22
permit tcp 172.16.0.0 0.15.255.255 172.16.0.0 0.15.255.255 eq 22
permit tcp 172.16.0.0 0.15.255.255 192.0.2.0 0.0.0.255 eq 22
permit tcp 172.16.0.0 0.15.255.255 192.168.0.0 0.0.255.255 eq 22
permit tcp 172.16.0.0 0.15.255.255 host 198.51.100.0 eq 22
permit tcp 192.0.2.0 0.0.0.255 10.0.0.0 0.255.255.255 eq 22
permit tcp 192.0.2.0 0.0.0.255 172.16.0.0 0.15.255.255 eq 22
permit tcp 192.0.2.0 0.0.0.255 192.0.2.0 0.0.0.255 eq 22
permit tcp 192.0.2.0 0.0.0.255 192.168.0.0 0.0.255.255 eq 22
permit tcp 192.0.2.0 0.0.0.255 host 198.51.100.0 eq 22
permit tcp 192.168.0.0 0.0.255.255 10.0.0.0 0.255.255.255 eq 22
permit tcp 192.168.0.0 0.0.255.255 172.16.0.0 0.15.255.255 eq 22
permit tcp 192.168.0.0 0.0.255.255 192.0.2.0 0.0.0.255 eq 22
permit tcp 192.168.0.0 0.0.255.255 192.168.0.0 0.0.255.255 eq 22
permit tcp 192.168.0.0 0.0.255.255 host 198.51.100.0 eq 22
permit tcp host 198.51.100.0 10.0.0.0 0.255.255.255 eq 22
permit tcp host 198.51.100.0 172.16.0.0 0.15.255.255 eq 22
permit tcp host 198.51.100.0 192.0.2.0 0.0.0.255 eq 22
permit tcp host 198.51.100.0 192.168.0.0 0.0.255.255 eq 22
permit tcp host 198.51.100.0 host 198.51.100.0 eq 22
remark allow-remote
permit tcp host 203.0.113.100 any eq 22
permit tcp host 203.0.113.200 any eq 22
remark implicit-default-deny
remark Deny everything else
deny ipv4 any any
exit

However, the beauty of Capirca is that if we only changed the target platform to juniper in this example, we would then get this result instead:

firewall {
    family inet {
        replace:
        filter My-SSH-Filter {
            interface-specific;
            term allow-intra-office {
                from {
                    source-address {
                        10.0.0.0/8;
                        172.16.0.0/12;
                        192.0.2.0/24;
                        192.168.0.0/16;
                        198.51.100.0/32;
                    }
                    destination-address {
                        10.0.0.0/8;
                        172.16.0.0/12;
                        192.0.2.0/24;
                        192.168.0.0/16;
                        198.51.100.0/32;
                    }
                    protocol tcp;
                    destination-port 22;
                }
                then accept;
            }
            term allow-remote {
                from {
                    source-address {
                        203.0.113.100/32;
                        203.0.113.200/32;
                    }
                    protocol tcp;
                    destination-port 22;
                }
                then accept;
            }
            term implicit-default-deny {
                then {
                    discard;
                }
            }
        }
    }
}

Or maybe if we chose iptables with something like:
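For instance, by tweaking only the translate task, something like this (the INPUT/ACCEPT filter options are an assumption based on Capirca’s iptables target conventions):

- name: Translate the generic policy into iptables syntax
  nleiva.capirca_acl.translate:
    platform: iptables
    filter_options:
      - INPUT
      - ACCEPT
    def_folder: sample
    pol_file: policy/terms.pol
  register: acl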

To get:

-P INPUT ACCEPT
-N I_allow-intra-office
-A INPUT -j I_allow-intra-office
-A I_allow-intra-office -m comment --comment "Spoofs are already blocked at the borders"
-A I_allow-intra-office -p tcp --dport 22 -s 10.0.0.0/8 -d 10.0.0.0/8 -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
...
-A I_allow-intra-office -p tcp --dport 22 -s 198.51.100.0/32 -d 10.0.0.0/8 -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
-A I_allow-intra-office -p tcp --dport 22 -s 198.51.100.0/32 -d 172.16.0.0/12 -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
-A I_allow-intra-office -p tcp --dport 22 -s 198.51.100.0/32 -d 192.0.2.0/24 -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
-A I_allow-intra-office -p tcp --dport 22 -s 198.51.100.0/32 -d 192.168.0.0/16 -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
-A I_allow-intra-office -p tcp --dport 22 -s 198.51.100.0/32 -d 198.51.100.0/32 -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
-N I_allow-remote
-A INPUT -j I_allow-remote
-A I_allow-remote -p tcp --dport 22 -s 203.0.113.100/32 -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
-A I_allow-remote -p tcp --dport 22 -s 203.0.113.200/32 -m state --state NEW,ESTABLISHED,RELATED -j ACCEPT
-N I_implicit-default-deny
-A INPUT -j I_implicit-default-deny
-A I_implicit-default-deny -m comment --comment "Deny everything else"
-A I_implicit-default-deny -p all -j DROP

How cool is that!?

Final workflow

Let’s now put all the pieces together. We will run this against two separate groups of hosts, as defined in our inventory: Lab and Prod. The first one is a small number of devices representing one of every platform running in production, used for testing purposes in the lab, so changes can be exercised in a safe environment before jumping into production.

The first step is to generate an ACL filter as we run through the different target hosts with the module we just built (Generate ACL Filters). We dynamically determine the target platform by mapping the data Ansible gathers from the systems when gather_facts is enabled. This mapping looks something like this:
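A sketch of such a mapping as a variable, with illustrative names and a non-exhaustive set of keys:

# Map what Ansible reports for each host (ansible_network_os for network
# devices, os_family for Linux servers) to the platform name Capirca expects
capirca_platform_map:
  ios: cisco
  iosxr: ciscoxr
  junos: juniper
  eos: arista
  RedHat: iptables
  Debian: iptables

The translate task can then pick the platform with something like platform: "{{ capirca_platform_map[ansible_network_os | default(ansible_facts['os_family'])] }}".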

Now, before we apply the ACL filter produced, we back up the current configuration and also prepare a last-resort plan in case something goes wrong when the new config is applied, which might prevent us from accessing the device to revert these changes (Backup and Rollback scheduling). If the target is a Junos device, this is as simple as setting the confirm argument of the junos_config module when making changes, to configure a timeout value in minutes for the commit to be confirmed before it is automatically rolled back. On a Cisco IOS device, I can only think of the rather disruptive option of scheduling a reload in a few minutes without saving the running config to the startup config.
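On Junos, a minimal sketch (assuming acl_lines holds the set commands for the filter we generated earlier) would be:

- name: Apply the filter, rolled back automatically unless confirmed in 5 minutes
  junipernetworks.junos.junos_config:
    lines: "{{ acl_lines }}"
    confirm: 5

A later junos_config task with confirm_commit set to true makes the change permanent.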

If we are modifying iptables on a Linux machine, we could leverage the at module to schedule a job that reverts to the initial config.
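A rough sketch, assuming we saved the original rules to a file beforehand (the backup path is hypothetical):

- name: Schedule an iptables rollback in 10 minutes as a safety net
  ansible.posix.at:
    command: "iptables-restore < /etc/iptables.backup"
    count: 10
    units: minutes
  become: true

If the change succeeds, the pending job can later be removed with the same module using state: absent.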

With these safeguards in place, we can proceed to apply the configuration changes (Apply Filters). Our workflow so far looks like this:

In case we can’t apply the changes, we cancel any scheduled rollback. Otherwise we continue with the execution of our workflow. Before we can call it a day, we need to run some tests to make sure everything is running as expected.

There are different strategies to verify things look good; the more you test, the better. In this case, we will check that a set of test hosts can still access the platforms we are configuring. We leverage the delegate_to keyword to tell Ansible to execute this task from a different host (test_host), which validates it can still set up an SSH connection to the device whose configuration we just changed. ansible_host is the address of the ‘current’ host being iterated over in the play; see Special Variables.
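A simple reachability check along those lines could use, for example, the wait_for module delegated to the test host:

- name: Verify SSH is still reachable from the test host
  wait_for:
    host: "{{ ansible_host }}"
    port: 22
    timeout: 10
  delegate_to: "{{ test_host }}"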

If the tests pass, then we are in good shape to save our changes (Commit Changes), notify users via email, chat, or any other mechanism, and update any system like your CMDB (Notify users/systems). Let’s not forget to cancel any planned rollback (Cancel Rollback). Of course, if the tests didn’t pass, we want to return to the previous state as soon as possible to minimize any negative impact the changes might have caused (Execute Rollback).

Conclusion

Automation is not a tool, nor a feature you add to your project. It’s an ongoing practice that requires discipline to glue all the pieces of the puzzle together. Along the same lines, it’s not about Python or Ansible; they are actually better together.

We didn’t dive into how you upload your Collection/Module to Ansible Galaxy to share your work with the community, do versioning, or test your modules with Travis CI or GitHub Actions. That will be part of a separate blog post, so stay tuned! If you can’t wait, check out the capirca_acl repo to get an idea.
