Ansible vs. Salt (SaltStack) vs. StackStorm

Disclaimer

Anthony Shaw
May 17, 2017

Over the past month I’ve listened to interviews with developers on all 3 products and heard the statement “think of [Ansible/Salt/StackStorm] as the glue”. Now, I’m a hobby DIY’er and I’ll safely share that I don’t have 1 pot of glue in my garage. I’ve got 6 different types depending on the job, the material and the environment. These 3 products are in the same camp: each can be used to great success to achieve very different things. Recently, a big overlap has been that they are all reaching into the Network Automation space. The opinions below are mine alone, not my employer’s (who sell billions of dollars’ worth of networking infrastructure and deployment).

I have used all 3 products, heavily contributed to 2 (Salt and StackStorm) and been involved in contributions to Ansible. Speaking frankly, the product I am least familiar with is Ansible, but I’ve spoken with colleagues and gathered information from them to fill gaps where appropriate.

If you’re going to skip to the end to see which one I declared the winner, you will be disappointed. Consider your requirements and try more than 1 of these products.


Some questions to ask yourself:

  • Which environments do I need to support? What is my blend of servers and network devices?
  • Who are my users? Is this for hardcore sysadmin types, the infosec team, developers?
  • How much custom development am I willing to do?

Agent vs. Agentless

This is really important, so take note. In this article I’m going to be focusing on device automation and orchestration. Those devices might be routers, switches, firewalls or next-generation-wave-emitting-circulatrons; it doesn’t matter. What does matter is that they won’t have an agent installed in the operating system. Ansible has the legacy of using SSH as the transport, so it fits well in the endpoint-configuration world, where SSH is the lowest common denominator. SaltStack was born as a bus for high-speed, secure minion (agent) to master communication; it also has an agent-less mode. StackStorm entered the field last and by design stays away from either option: it supports agent-based tools via packs for Chef, Puppet and Salt, and also has its own SSH-based remote controls and native support for calling Ansible playbooks.

APIs

Another key difference is the APIs:

  • Ansible is available as a CLI; Ansible Tower (the Enterprise version) has an API.
  • StackStorm has a CLI, but also has REST APIs for all of the components and services in the free version.
  • Salt has a CLI as well as a REST API (not enabled by default). You can also use the “Enterprise API” as part of the Enterprise product, which includes functions like RBAC.

Ansible

Ansible was the brainchild of Michael DeHaan, developed to automate tedious server-administration tasks across large environments. Michael was at Red Hat’s emerging technology group, where he founded other projects like Cobbler, then went on to found Ansible after leaving Red Hat (although Ansible is now owned by Red Hat). From Michael’s blog on the foundations of Ansible, its purpose is clear:

“We wanted to create another very-democratic open source project at Red Hat, one that could have a wide variety of contributors and solve new problems. We thought back to busrpc. This project existed because it filled in gaps between Cobbler and Puppet. Cobbler could provision a system, and Puppet could lay down configuration files, but because Puppet was too declarative you couldn’t use it to do things like reboot servers or do all the “ad hoc” tasks in between”

Those ad-hoc tasks evolved into Ansible playbooks, and the Ansible module ecosystem was born and boomed quickly.

Design

Ansible is simple, which is a major strength (and will become clear when looking at the other 2). There are no daemons, no databases, very minimal installation requirements. You just install Ansible on a Linux machine and off you go. Define the target servers either in a static file, grouped into meaningful sections, or use a dynamic host-discovery module like Amazon EC2 or OpenStack to find VMs based on an API call. Once you have an inventory you can build out host or group-specific variables which can be leveraged by your playbooks. These again are kept in static text files.
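A dynamic inventory source can be as small as a script that prints JSON. The sketch below follows the inventory-script convention as I understand it: the script is called with --list to return all groups (with per-host variables under _meta.hostvars) or with --host <name> to return one host's variables. The host names, groups and addresses are made up for illustration; a real script would query EC2, OpenStack or an asset database instead of a hard-coded dictionary.

```python
#!/usr/bin/env python
"""Toy dynamic inventory script that prints Ansible-style inventory JSON."""
import argparse
import json

# Hypothetical hosts; a real script would fetch these from a cloud or CMDB API.
HOSTS = {
    "sw-core-01": {"group": "switches", "vars": {"ansible_host": "192.0.2.10"}},
    "sw-edge-01": {"group": "switches", "vars": {"ansible_host": "192.0.2.11"}},
    "web-01": {"group": "webservers", "vars": {"ansible_host": "192.0.2.20"}},
}


def build_inventory(hosts):
    """Group hosts by name and collect per-host variables under _meta.hostvars."""
    inventory = {"_meta": {"hostvars": {}}}
    for name, info in sorted(hosts.items()):
        group = inventory.setdefault(info["group"], {"hosts": []})
        group["hosts"].append(name)
        inventory["_meta"]["hostvars"][name] = info["vars"]
    return inventory


def main(argv=None):
    parser = argparse.ArgumentParser()
    parser.add_argument("--list", action="store_true")
    parser.add_argument("--host")
    args, _ = parser.parse_known_args(argv)
    if args.host:
        # Variables for a single host (empty if the host is unknown).
        print(json.dumps(HOSTS.get(args.host, {}).get("vars", {})))
    else:
        # Default (and --list): the full inventory.
        print(json.dumps(build_inventory(HOSTS), indent=2))


if __name__ == "__main__":
    main()
```

Saved as an executable file, a script like this would be passed to Ansible with the -i option in place of a static inventory file.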

Ansible will then connect to the host or group you choose and execute a playbook. The playbook is a sequence of Ansible modules that you want to execute on the remote hosts written in YAML.

When it connects to a remote host, it’s a bit like a well-planned military exercise: get in, do the job and get out.

Ansible works by connecting to a server using SSH (or WS-Man/WinRM for Windows), copying the Python code over, executing it and then removing it again.

Architecture

Ansible’s architecture is straightforward: you have the application that runs on your machine and you have the tasks that run on the remote host, with communication over SSH and files transferred by SCP/SFTP. Ansible doesn’t have a “server-client” architecture like the other 2 products, so you can parallelise task execution on your machine, but you cannot scale across multiple servers (unless you use Tower).

When Ansible manages remote machines, it does not leave software installed or running on them, so there’s no real question about how to upgrade Ansible when moving to a new version.

Extensibility

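Ansible modules are really easy to develop. As with all 3 products though, read the style guide first if you think you might later try to merge your solution into the product’s open-source repository; it saves refactoring it again. For example, you can run a module called timetest.py locally using the test-module script from the Ansible source checkout: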

ansible/hacking/test-module -m ./timetest.py

You should see output that looks something like this:

{"time": "2012-03-14 22:13:48.539183"}
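The timetest.py module itself is not shown above. A minimal sketch of what such a module could look like, assuming the simplest possible contract (the module prints its result as a single JSON document on standard output), is below. The build_result helper is my own naming for illustration, not part of Ansible.

```python
#!/usr/bin/env python
"""timetest.py: a toy module that reports the current time as JSON."""
import datetime
import json


def build_result():
    """Return the module result as a plain dictionary."""
    return {"time": str(datetime.datetime.now())}


if __name__ == "__main__":
    # The result is handed back to the caller as JSON on standard output.
    print(json.dumps(build_result()))
```

A production module would normally use Ansible's module helper classes for argument parsing and error reporting rather than printing JSON by hand.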

In modules you can define your own “gather” stage code to establish “facts” about the remote host which can be used by your or other modules. This could be something like looking at the files installed or configuration to determine how a service is setup.

Enterprise Support

Ansible Tower is the Enterprise version. It turns the command-line Ansible into a service, with a web interface, scheduler and notification system.


It also has a UI for the cloud deployment playbooks, so you can automate deployment of cloud infrastructure through the UI and then automatically add those VMs to the inventory.

It is worth noting that task scheduling, cloud deployments and a server are features of the free versions of both Salt and StackStorm.

Networking Support

Ansible’s networking story is the most mature of the 3 and spans all of the major network vendors and platforms. With Ansible you can:

  • Automate the configuration of the network stack from system to access to core services by using network platform specific modules and playbooks
  • Test and validate existing network state, implement or leverage the gather process to determine facts about the existing configuration
  • Run continuous compliance checks to detect network configuration drift

Ansible supports Arista, Cisco (all of the programmable platforms), F5, Juniper as well as other vendors. Uniquely for Ansible, the vendor support is mostly provided and supported by the vendor, not the community. At the moment, this shows as better coverage across APIs, more functionality and more recent platform support (supports newer versions).

Strengths

  • Really fast and simple to get started
  • Lots of community examples, documentation and modules
  • Ansible Tower implements features for large, Enterprise deployments
  • Vendor backed network modules

Weaknesses

  • If left unsupervised, operators can keep playbooks and SSH keys entirely on their own laptops. This is not entirely the fault of Ansible, but keep a close eye on it.
  • No event-driven automation story. You have control over the target host for the duration of the playbook and that’s it; you can’t have long-running tasks.

StackStorm

I’ve been using StackStorm since v0.11 (early pre-Beta), all the way up to the most recent v2.2. It’s a complex and broad-reaching platform that, like Salt, takes a while to describe to people and can often lead to misinterpretation. I see this as both a strength and a weakness. It’s a weakness because its complexity can be off-putting and lead people to deploy the wrong solution where StackStorm would have been a great fit (often people writing their own solution from scratch). It’s a strength in that once you understand how to leverage its power, it’s really flexible.


Design

At its core, StackStorm is a pluggable rule and execution engine, designed as a highly-configurable IFTTT (if-this-then-that) service. You can tell StackStorm to react to events that have occurred and then run a simple “action” (a command) or a complex workflow. Workflows are available in 2 flavors: ActionChain (their proprietary workflow DSL) or OpenStack Mistral, a YAML-based workflow engine.

StackStorm also has a service for “chatops”, where you can trigger your workflows from events or messages in your chat platform (e.g. Slack).

The core nomenclature in StackStorm is:

  • Sensors are Python plugins for either inbound or outbound integration that receive or watch for events respectively. When an event from an external system occurs and is processed by a sensor, a StackStorm trigger will be emitted into the system.
  • Triggers are StackStorm representations of external events. There are generic triggers (e.g. timers, webhooks) and integration triggers (e.g. Sensu alert, JIRA issue updated). A new trigger type can be defined by writing a sensor plugin.
  • Actions are StackStorm outbound integrations. There are generic actions (ssh, REST call), integrations (OpenStack, Docker, Puppet), or custom actions. Actions are either Python plugins, or any scripts, consumed into StackStorm by adding a few lines of metadata. Actions can be invoked directly by user via CLI or API, or used and called as part of rules and workflows.
  • Rules map triggers to actions (or to workflows), applying matching criteria and mapping trigger payload to action inputs.
  • Workflows stitch actions together into “uber-actions”, defining the order, transition conditions, and passing the data. Most automations are more than one-step and thus need more than one action. Workflows, just like “atomic” actions, are available in the Action library, can be invoked manually or triggered by rules.
  • Packs are the units of content deployment. They simplify the management and sharing of StackStorm pluggable content by grouping integrations (triggers and actions) and automations (rules and workflows). A growing number of packs are available from the StackStorm community. Users can create their own packs, share them on GitHub, or submit them to StackStorm Exchange.
  • Audit trail of action executions, manual or automated, is recorded and stored with full details of triggering context and execution results.
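To make the trigger, rule and action relationship concrete, here is a toy model in plain Python. It is not StackStorm code and does not use StackStorm's rule format; the trigger name, rule fields and notify action are all hypothetical. It only illustrates the idea of a rule matching a trigger payload against criteria and mapping payload fields onto action inputs.

```python
"""Toy model of the trigger -> rule -> action flow (not StackStorm code)."""


def notify(channel, message):
    """A stand-in action: pretend to post a message to a chat channel."""
    return "[{}] {}".format(channel, message)


# Registry of available actions, keyed by a hypothetical reference name.
ACTIONS = {"chat.notify": notify}

# Each rule ties a trigger type to an action, with criteria on the payload
# and templates that map payload fields onto the action's inputs.
RULES = [
    {
        "name": "alert_on_interface_down",
        "trigger": "network.interface_state",
        "criteria": {"state": "down"},
        "action": "chat.notify",
        "inputs": {
            "channel": "#netops",
            "message": "{device} {interface} is {state}",
        },
    }
]


def matches(rule, trigger_type, payload):
    """A rule matches when the trigger type and every criterion agree."""
    if rule["trigger"] != trigger_type:
        return False
    return all(payload.get(key) == value for key, value in rule["criteria"].items())


def dispatch(trigger_type, payload):
    """Run the action of every matching rule and collect the results."""
    results = []
    for rule in RULES:
        if matches(rule, trigger_type, payload):
            inputs = {k: v.format(**payload) for k, v in rule["inputs"].items()}
            results.append(ACTIONS[rule["action"]](**inputs))
    return results


if __name__ == "__main__":
    event = {"device": "sw-core-01", "interface": "eth1", "state": "down"}
    print(dispatch("network.interface_state", event))
```

In the real platform the sensor would emit the trigger, the rules would live in a pack, and the execution would be recorded in the audit trail; the sketch leaves all of that out.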

Design

StackStorm comprises a number of services that leverage a message queue (RabbitMQ) and a datastore (MongoDB) to communicate and preserve state. StackStorm also has a WebUI (yes, even in the free version) which enables you to configure rules, run actions ad-hoc and inspect the audit trail.

Architecture

Unlike Ansible and Salt, StackStorm was not designed for endpoint configuration or communication. StackStorm has packs for Salt, Chef, Puppet and Ansible, so if you wanted to use Chef for the agent and configuration management system, then leverage StackStorm to call API-based services like Sensu or Kubernetes, this is easily achieved and obvious. With Salt, you still have this concept of execution on either the minion or the master, yet if you want to call the Kubernetes API, it’s moot which machine calls the API. The same goes for network-device configuration.

MongoDB can be scaled using well-documented patterns, RabbitMQ also. The Services themselves are all designed as HTTP/RESTful APIs, and can be load balanced for scale out. The roles can be condensed into a single server or spread across a number of servers, depending on your need.

Extensibility

I really like the StackStorm extensibility system; it’s definitely a key strength over the other 2 products. StackStorm extension points are called packs. They are self-contained, can be stored in Git and manage their own dependencies through pack-level Python virtual environments. When you install a pack, you specify the Git URL or HTTP URL and the (optional) credentials, and StackStorm will download, configure and install the pack.

If StackStorm were a programming language, it would be strongly typed. For actions you specify the types for all the inputs; for triggers you specify the fields and types. This makes it really easy to know what is going to be returned by a 3rd party extension, and it is unique to StackStorm.
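To illustrate what declared types buy you, here is a toy validator in plain Python. The type names (string, integer, boolean and so on) are the kind of names I understand action metadata to use, but the schema layout, the parameter names and the validate function are hypothetical and are not StackStorm's implementation.

```python
"""Toy check of action inputs against declared parameter types."""

# Map declared type names onto Python types.
TYPE_MAP = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "array": list,
    "object": dict,
}

# Hypothetical parameter declaration for a "configure VLAN" action.
SCHEMA = {
    "hostname": {"type": "string", "required": True},
    "vlan_id": {"type": "integer", "required": True},
    "dry_run": {"type": "boolean", "required": False},
}


def validate(params, schema):
    """Return a list of human-readable errors; an empty list means valid."""
    errors = []
    for name, spec in schema.items():
        if name not in params:
            if spec.get("required", False):
                errors.append("{}: missing required parameter".format(name))
            continue
        value = params[name]
        expected = TYPE_MAP[spec["type"]]
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(value, bool) and spec["type"] in ("integer", "number")
        if bool_as_number or not isinstance(value, expected):
            errors.append(
                "{}: expected {}, got {}".format(name, spec["type"], type(value).__name__)
            )
    for name in params:
        if name not in schema:
            errors.append("{}: unknown parameter".format(name))
    return errors


if __name__ == "__main__":
    print(validate({"hostname": "sw-core-01", "vlan_id": "100"}, SCHEMA))
```

The point is that a caller (or a UI, or a chat command) can reject bad input before any code in the extension runs.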

Unlike Salt and Ansible, no extensions are bundled with StackStorm; they must all be installed individually. This makes the deployment lighter and keeps the dependencies very lightweight.

When developing an integration for StackStorm, you can build sensors, actions and workflows into a single definition. Salt and Ansible modules are standalone. So if your extension for, say, Salt includes Beacons, Execution modules and State modules, they share nothing except a name and an author. This can prove troublesome when managing pip dependencies.

StackStorm solves this by giving each pack its own requirements.txt, as well as a YAML file describing the purpose, requirements and version of the pack. You can upgrade a pack inline and it will keep existing configuration. Packs also contain templated configuration, unlike Ansible and Salt, where a module’s configuration format is only kept in the documentation, leaving it more prone to user error. You can also be left scanning through module code when the developer hasn’t bothered to document what the configuration options are.

Another unique feature is that ChatOps “aliases” (the chat commands) are built into packs. So if you install the NAPALM pack as an example, it automatically installs all the chat commands for NAPALM.

Enterprise Support

StackStorm is an Apache-2 licensed Open-Source product, hosted on GitHub. StackStorm was owned by Brocade, which was recently broken up; the StackStorm portion is now owned by Extreme Networks.

If you license StackStorm, you buy a product called “Brocade Workflow Composer” and get additional features as well as Enterprise-level Support. The production deployment I worked with was licensed and I found their support team to be responsive and knowledgeable. You also get RBAC, where you can specify down to the action level who has access to run what.


Networking Support

If you’re using Brocade VDX or SLX you’re in luck; they’re well supported, as you’d expect.

In April 2017 they merged support for the NAPALM library, a cross-platform abstraction Python package for Cisco, Juniper, Arista and others. You can use the NAPALM integration to configure routes, interfaces, BGP peering and some other nifty features. Matt Oswalt (co-author of the O’Reilly book on Network Automation) wrote up a nice blog on the progress.


Strengths

  • The free, default Web UI is easy to use and requires little-to-no knowledge of Python or DevOps.
  • ChatOps integration is built in and works out of the box (with Slack, just deploy the API key) and supports many other chat platforms.
  • OpenStack Mistral is really powerful once you learn it, you can span across multiple DevOps tools, create branches and loops easily without having to
  • Brocade Workflow Composer is a great way to get non-developers to leverage the system

Weaknesses

  • Doesn’t have the range of extension packs available compared with Salt and Ansible. Check whether your systems and APIs are available first, also check what functionality is in the pack.
  • The workflow system, OpenStack Mistral, is still quite badly documented, and there’s a lot of trial and error in the YAQL query syntax.

Salt

First off, Salt is the product, SaltStack is the company. So when I refer to Salt I’m talking about Salt Open, the Open-Source version.

Salt has a massive nomenclature; at first (and when I say first I mean the first year) it can be really overwhelming. Salt does a lot, so typically when you hear Salt fans comparing it with Ansible you’ll get a response of “yes, but it does sooo much more”. Similar to StackStorm, this works for and against Salt. Once you know what a Salt mine is, it’s quite obvious, but if I just told you to fetch grains from a mine you’d think I was referring to a Tolkien novel.

Design

Salt was born as a distributed remote execution system used to execute commands and query data on remote nodes, or “minions”, either individually or by arbitrary selection criteria, or “targeting”.

Salt has been extended to a configuration management system, capable of maintaining remote nodes in defined states (for example, ensuring that specific packages are installed and specific services are running). There are lots of components in Salt and I’m sure I’ve missed others!

  • master, the server that runs the core services to communicate with Salt minions. It also contains the key store for encryption between the minions.
  • minions, the agents that run a micro version of Salt for local execution and communication back to the master.
  • engines, Salt Engines are long-running, external system processes that leverage Salt.
  • states, or formulas, files that contain YAML and templated data to configure minions. The templating engine is also very flexible. It’s not limited to Jinja, but also supports Cheetah, Genshi, Mako (very important for those from a Puppet background), Wempy or even pure Python.

Minions (proxy or regular) can be targeted using grains, pillars or identifiers. There are other targeting plugins (and you can develop your own, based on something like a SQL query or a KVP store).

  • grains, Salt comes with an interface to derive information about the underlying system. This is called the grains interface, because it presents salt with grains of information. Grains are collected for the operating system, domain name, IP address, kernel, OS type, memory, and many other system properties. The grains interface is made available to Salt modules and components so that the right salt minion commands are automatically available on the right systems.
  • pillars, A pillar is an interface for Salt designed to offer global values that can be distributed to minions. A pillar is a free form resource of data (that can be either JSON, YAML or whatever you need), and can either be stored in files, or externally. This is a unique property of Salt and allows integration with other systems where a shared data store would be of value (e.g. an ITSM or asset register).
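As a rough illustration of targeting by grains, the sketch below selects minions whose grain value matches a glob pattern, using a key:pattern expression in the style of Salt's grain targeting. It is a toy model in plain Python, not Salt's matcher; the minion names and grain values are invented.

```python
"""Toy model of grain-based targeting using glob patterns."""
import fnmatch

# Hypothetical minions and a few of their grains.
MINIONS = {
    "web-01": {"os": "Ubuntu", "kernel": "Linux", "roles": ["web"]},
    "db-01": {"os": "CentOS", "kernel": "Linux", "roles": ["db", "backup"]},
    "win-01": {"os": "Windows", "kernel": "Windows", "roles": ["web"]},
}


def match_grain(grains, expression):
    """Match a 'key:pattern' expression against one minion's grains."""
    key, _, pattern = expression.partition(":")
    value = grains.get(key)
    if value is None:
        return False
    # A grain may hold a single value or a list of values.
    values = value if isinstance(value, list) else [value]
    return any(fnmatch.fnmatchcase(str(item), pattern) for item in values)


def target(minions, expression):
    """Return the sorted names of all minions matching the expression."""
    return sorted(
        name for name, grains in minions.items() if match_grain(grains, expression)
    )


if __name__ == "__main__":
    print(target(MINIONS, "kernel:Lin*"))
    print(target(MINIONS, "roles:web"))
```

Pillar data can be used for targeting in much the same way, with the difference that the values are assigned from the master rather than discovered on the minion.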

For data fetching you can also return data from minions and store it in the salt mine to be used in other tasks like template-based state configuration. Unlike Ansible (which only supports YAML), this can be in a variety of formats.

Architecture

Salt’s architecture is based on a hub and spoke methodology. Some very large deployments have multi-master but this is quite rare. The master can easily scale to many thousands of nodes, due in part to the lightweight message bus ZeroMQ. Other deployment models are:

  1. A master-less setup, and
  2. Hierarchical masters able to communicate between them using syndic.

The master contains state files, which you would typically put in a shared storage volume. These are set up in a tree so that you can use targeting to specify groups of servers to configure and the environment/applications to deploy.

Salt’s event-based system uses beacons. Similar to StackStorm’s sensor and trigger system, Salt’s beacons fire events into the message bus, which can then be dealt with in the reactor (on the master). The rules engine in the reactor is quite crude compared with StackStorm’s, as you’re typically triggering a state or execution command off the back of a beacon firing an event. However, beacons run on the minions, so if you’re detecting events on the servers this is straightforward. Because StackStorm and Ansible are agent-less, this is a unique feature for Salt.
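The beacon-to-reactor flow can be pictured as a table of event-tag patterns mapped to reactions. The sketch below is a toy model in plain Python rather than Salt's reactor: the tags follow the salt/beacon/<minion id>/<beacon name>/ layout as I understand it, and the reaction names and event data are hypothetical.

```python
"""Toy model of a reactor: event tags matched by glob, mapped to reactions."""
import fnmatch

# Ordered list of (tag pattern, reaction name); names are hypothetical.
REACTOR = [
    ("salt/beacon/*/diskusage/*", "cleanup_disk"),
    ("salt/beacon/*/service/*", "restart_service"),
]


def react(tag, data):
    """Return (reaction, minion id) pairs for every pattern the tag matches."""
    reactions = []
    for pattern, reaction in REACTOR:
        if fnmatch.fnmatchcase(tag, pattern):
            reactions.append((reaction, data.get("id")))
    return reactions


if __name__ == "__main__":
    # An event as a beacon on the minion "web-01" might fire it.
    event_tag = "salt/beacon/web-01/diskusage/var"
    event_data = {"id": "web-01", "diskusage": 93}
    print(react(event_tag, event_data))
```

Notice how little logic sits between the event and the reaction: there are no payload criteria or typed fields here, which is the contrast with StackStorm's rules engine described above.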

Thorium, the complex reactor for Salt, was experimental in the last release and might be supported in a future release. It adds support for event aggregation and more complex rules.

Extensibility

Everything in Salt is extensible, down to the modules that display execution results in the CLI. This is a big plus for Salt as you can develop your own changes easily without having to maintain a parallel fork of the main project. Every feature in Salt is also pluggable.

The most common scenarios for extensibility would be developing a state module (to describe how a piece of software or service should be configured) or an execution module (the code to talk to an API or system). Both state and execution modules can be written with relatively little boilerplate, are well documented and come with a solid unit test runner built in. You can unit test your modules using PyTest without either being on a master or having a master running. For integration testing you should be on Linux, although with a bit of hacking you can run the tests on OSX (Windows is out of the question, as with StackStorm and Ansible).

You can either maintain your own standalone pack, or contribute directly to the Salt project on GitHub. The biggest downfall with contributing to the main project is that you need to wait for each release cycle for users to be able to easily install your modules. This is around every 3–5 months at their current cadence, so whilst Salt is “batteries included”, it comes with a downside.

Salt also has a package manager, SPM, that is mainly targeted at bundling of their configuration-management (state files) formulas. You can use it for packaging of modules to get around the slow release cycle that I’ll mention in the weaknesses (although this is not very well documented).

Salt has evolved very quickly over the past few years and undergone some big changes. As a consequence there can be inconsistency between the community developed modules. I also find that, although not unique to Salt, the community provided modules are poorly tested.

Enterprise Support

“Salt Open” is the Open-Source version. You can also license Salt Enterprise, which comes with some neat features, like:

  • A Web UI for targeting, execution, compliance to “high-state” and integration with LDAP,
  • ServiceNow integration, enabling you to provision new servers and apply state from a ServiceNow ITSM integration,
  • RBAC with LDAP integration (naturally),
  • The “Enterprise API”, which wraps the enterprise features into a REST API.

Networking Support

Because Salt relies on the message bus, and ZeroMQ has a number of dependencies that typically require a fully-fledged OS, network device management was not an obvious use of Salt. In the last release Salt vastly improved support for “proxy minions”. A proxy minion is a virtual minion: a process able to run anywhere in order to control devices remotely via SSH, HTTP or another transport mechanism. It leverages the same functionality as a regular minion, but there are also some particularities. Not to be confused with the Puppet proxy (which is a central machine that all requests go through), it’s just a process associated with the device you target, thus a separate process per minion. It’s usually lightweight, consuming about 40MB of RAM. You can develop proxy minions by writing a Python module that can be executed on a minion. The Salt team demo’ed this at last year’s SaltConf.
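As a rough sketch of what such a Python module looks like, the example below keeps a connection record for a pretend device. The function names (init, initialized, ping, shutdown) and the __proxyenabled__ variable follow the proxy module interface as I understand it, so treat them as assumptions and check the Salt documentation before relying on them. The toy_switch name and the host setting are hypothetical, and no real device is contacted.

```python
"""toy_switch.py: sketch of a proxy module for a pretend network device."""

# Declares which proxy types this module serves (hypothetical name).
__proxyenabled__ = ["toy_switch"]

# Module-level state shared between calls for the life of the proxy process.
DETAILS = {}


def init(opts):
    """Called once at start-up; read connection settings from the proxy config."""
    DETAILS["host"] = opts["proxy"]["host"]
    # A real module would open an SSH, NETCONF or HTTP session here.
    DETAILS["initialized"] = True


def initialized():
    """Report whether init() has completed."""
    return DETAILS.get("initialized", False)


def ping():
    """Report whether the device is reachable (always true once initialised here)."""
    return initialized()


def shutdown(opts):
    """Called when the proxy process stops; close the session and clear state."""
    DETAILS.clear()


if __name__ == "__main__":
    init({"proxy": {"host": "192.0.2.1"}})
    print(ping())
    shutdown({})
    print(ping())
```

Each proxy process holds one such connection, which is why the memory footprint per managed device stays small.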

Currently supported Network Platforms are:

  • JunOS (Juniper)
  • NXOS (Cisco)
  • Cisco NSO (Cisco’s NETCONF orchestrator)
  • NAPALM

Salt has recently merged in NAPALM support thanks to some very smart engineers at Cloudflare. NSO and NAPALM have similarities, but with NSO carrying a licensing cost from Cisco you’d need to consider which path you’ll take early on.


Strengths

  • Salt supports both agent-based and agent-less (salt-ssh)
  • Ultra high-performance for large deployments due to ZeroMQ
  • The agent-based architecture allows for beacons to be deployed on either Windows or Linux based hosts and events to be detected locally
  • Some very large deployments, e.g. LinkedIn uses it at huge scale
  • Salt can easily be melded into an existing set of databases or APIs through its strong extensibility story.

Weaknesses

  • Extensibility built into the core is released too infrequently for fast moving environments
  • Modules cannot cleanly declare their own dependencies, meaning you have to manage a single virtual environment and pip dependencies

Conclusion

Event driven or not?

This is the biggest difference between these 3 products. Salt and StackStorm both have event-driven stories. StackStorm has services that you can write (sensors) and strongly typed events that can be raised, as well as a complex rules engine. Salt has beacons, services that can be run on the agents as well as on the central master; if you want to detect events on the local machines, this is a unique capability. The open-source version of Ansible doesn’t (nor does it try to) allow you to respond to events.

Community Support

I’ve seen networking vendors specifically target and develop modules for Ansible, whereas for the other platforms (with the exception of Brocade for StackStorm) modules have been community contributed. Ansible certainly has the broadest support for networking platforms. The introduction of NAPALM and NSO into both StackStorm and Salt changes things, though, as both support Arista, JunOS (Juniper), Cisco APIC-EM, NXOS et al.

Time to get started

Ansible’s strength is the minimal amount of configuration needed to get going (basically none). Its popularity in the Networking space may be due to its simplicity and familiarity for network admins, who are used to managing remote devices from something like a CLI without needing to deploy any additional servers to run the software. If you have a lot of small, isolated sites (e.g. commercial branches) then you should consider whether a centralised architecture would fall apart. My employer manages networks for some large supermarket chains and I would hesitate at having a centralised master when stores in rural areas can have unreliable connectivity.

Data Configuration Stores

Salt is unique in that its key stores are all pluggable. If you want to fetch passwords or keys from HashiCorp Vault, this is a doddle. If you want to store grain data in a SQL database, again it works out of the box. Consider what other systems and platforms would need to access or input the data you’re targeting.

Security

Comparing Ansible and Salt, Salt has its own key-store for agent communication and Ansible uses SSH for the transport. A poorly-managed Ansible environment would typically be a bunch of private keys stored on an admin’s laptop (please don’t do this). Salt offers the unique feature that secure data used in templates, states or grains can be stored in an external secure data store. StackStorm does keep data in MongoDB, which your security team certainly needs to audit before you go to production.

Training

Unless you want to be the sole maintainer of this platform, you’re going to need to educate some colleagues. Salt and Ansible both have detailed books published; StackStorm does not. Salt and (Red Hat) Ansible offer training solutions, almost exclusively in the US; StackStorm does not (yet). Salt and Ansible have courses on Pluralsight but they are really basic.

Licensing

Both Salt and StackStorm are Apache-2 licensed; Ansible is GPLv3. If you’re not too familiar with OSS licensing, I recommend the “TLDR Legal” website. Salt, as an example, has been used by SUSE to build a systems management product, due in part to the flexibility of its OSS license.

Skills

Ansible has, anecdotally (although I pay very close attention to this), got a good mind-share for network admins and DevOps engineers across the globe. You’d certainly find hiring Ansible engineers a lot easier than Salt or StackStorm. But DevOps engineers are still as rare as hen’s teeth so you’re going to be paying top-dollar regardless of the platform.

Which glue should I choose?

Please try at least 2 of the platforms and make an informed decision.

I’ve blogged about this previously, but with DevOps tools people can discover the marvel of automation whilst learning a tool and then just religiously stick to that tool.

Like a kid that discovers chocolate for the first time, you shouldn’t just believe that brief moment of joy is purely the credit of Cadbury.

Credits: thank you for the contributions and reviews from Salt, StackStorm, Ansible and community members.

Anthony Shaw

Group Director of Talent at Dimension Data, father, Christian, Python Software Foundation Fellow, Apache Foundation Member.