Adobe Tech Blog

News, updates, and thoughts related to Adobe, developers, and technology.

Design a Reporting Dashboard for SaltStack

5 min read · Nov 21, 2022

To deliver exceptional product experiences to customers, Dynamic Media runs on a massive infrastructure built to be highly available, reliable, scalable, and secure. Keeping an infrastructure like this robust depends on tracking and monitoring infrastructure changes, especially when a large number of servers are deployed across multiple regions.

Dynamic Media uniquely incorporates the Adobe Experience Manager digital asset management (Assets) workflow to simplify and streamline the digital campaign management process. It helps deliver rich visual merchandising and marketing assets on demand, automatically scaled for consumption on web, mobile, and social sites.

Supporting this requires regular configuration changes on the infrastructure side, including server provisioning, security patching, application deployments, build releases, and observability enhancements. This is where configuration management software comes in, since it lets you automate such changes efficiently. We use SaltStack as our default automation tool for these changes because of the features it offers out of the box.

SaltStack, or Salt, is a powerful automation engine for large-scale infrastructure deployments, offering features such as event-driven automation, remote execution, job scheduling, and a REST web interface.

The biggest hurdle — Track changes made through Salt

Let’s first look at the fundamental changes we made in our Salt environment to gain operational efficiency. One was writing custom Salt modules, states, formulas, reactors, and runners, which made infrastructure provisioning faster and smoother. The other was scheduling a “highstate” so that states and formulas are automatically applied to all connected minions at regular intervals, which helped us prevent configuration drift across regions.

Highstate is Salt’s way of applying all the states or formulas assigned to a minion in a single run.

But this was only half the battle, because tracking the changes made by the scheduled highstate across nodes was quite complicated. Even retrieving other critical events, such as state executions or minion-specific events, was a mammoth task. So the challenge was reporting the changes made via Salt, which is a must in such a vast infrastructure because it helps keep the infrastructure consistent and robust.

The solution — A workflow to track changes

Getting visibility into the current infrastructure in SaltStack takes a lot of manual effort and time. The simplest way is to run a few salt commands and parse their output into something meaningful, but that approach does not scale when you are dealing with a large number of servers.

So, let’s see how we designed an efficient yet reliable workflow that gave us visibility into the whole infrastructure. The workflow is built on Salt’s reactor system, Prometheus, and Grafana. The reactor tracks highstate events and exposes their stats as metrics, Prometheus scrapes those metrics, and Grafana presents them in a dashboard.

Salt’s reactor system triggers actions when events occur in the SaltStack environment, whereas Prometheus is an open-source application used for metrics-based monitoring and alerting.

That said, other tools could be used in place of Prometheus or Grafana without affecting the core functionality of the workflow. Combining Salt’s reactor system with Prometheus provides a holistic view of the current Salt infrastructure, through which the following metrics can be exposed:

  • State of the Salt master: up or down.
  • Number of connected and disconnected Salt minions.
  • Number of states applied to a minion through highstate.
  • Number of states that returned errors during a highstate run.
  • Timestamp of the last highstate run.

Workflow considerations

Before implementing this workflow, the following prerequisites must be in place:

  • A working SaltStack environment, including a Salt master and Salt minions.
  • A Node Exporter daemon running on each Salt minion, with the textfile collector pointed at a directory through the “--collector.textfile.directory” flag.
  • A Prometheus instance to scrape the custom Salt metrics.
  • A Grafana instance to build the tracking and reporting dashboard.

Showtime: Let’s see this in action

Now that we have understood the workflow, let’s see how to implement it and make a meaningful dashboard for SaltStack infrastructure.

1. First, create a “reactor.conf” file on the Salt master. It watches job return events and hands them to a reactor state.

~ ➤ cat /etc/salt/master.d/reactor.conf
reactor:
  # Watches job return events; the reactor state filters for highstate runs
  - 'salt/job/*/ret/*':
    - salt://_reactor/highstate_failure.sls
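
To get a feel for which events the 'salt/job/*/ret/*' pattern picks up, the glob can be tried against a few sample tags. This is a minimal sketch, assuming the reactor matches tags with shell-style globbing (modelled here with Python's fnmatch). The job ID is the one from the sample output later in this article; the minion ID "web01" and the function name are hypothetical.

```python
from fnmatch import fnmatch

# Pattern copied from reactor.conf above.
REACTOR_TAG = "salt/job/*/ret/*"


def tag_matches(tag: str, pattern: str = REACTOR_TAG) -> bool:
    """Return True if an event tag matches the reactor glob pattern."""
    return fnmatch(tag, pattern)


# Hypothetical event tags; only the job return tag should match.
samples = [
    "salt/job/20220904115858084292/ret/web01",  # job return from a minion
    "salt/job/20220904115858084292/new",        # job published, no return yet
    "salt/minion/web01/start",                  # minion start event
]
for tag in samples:
    print(f"{tag}: {tag_matches(tag)}")
```

Because the pattern matches every job return, not only highstate runs, the reactor state itself has to check which function was executed.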

2. Next, create a “highstate_failure.sls” file on the Salt master. It performs the actual action: writing the highstate results to a local file on the minion.

~ ➤ cat /srv/saltstack/salt/_reactor/highstate_failure.sls
{%- if data['fun'] == 'state.highstate' -%}
push_to_prometheus:
  local.state.single:
    - tgt: {{ data['id'] }}
    - args:
      - fun: file.managed
      - name: /var/lib/node_exporter/textfile_collector/salt_highstate.prom
      - makedirs: True
      - mode: '0664'
      - contents: |
          # HELP salt_highstate_status to check Salt highstate run status. If value is 1 then highstate is getting failed.
          # TYPE salt_highstate_status gauge
          salt_highstate_status {{ data['retcode'] }}
          salt_highstate_timestamp {{ data['_stamp'] }}
          salt_highstate_total {{ data['return'] | length }}
          salt_highstate_jid {{ data['jid'] }}
{%- endif -%}
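
The same rendering logic can be sketched in Python, which is handy for trying the output format before wiring it into the reactor. This is an illustrative sketch rather than part of the workflow above: the function name and the metric name "salt_highstate_timestamp_seconds" are hypothetical, and it assumes the event's "_stamp" field is an ISO 8601 timestamp in UTC. Since the Prometheus text format expects every sample value to be a number, the sketch converts the timestamp to Unix seconds and leaves the job ID out.

```python
from datetime import datetime, timezone


def render_highstate_metrics(event: dict) -> str:
    """Render a highstate return event as Prometheus text-format lines."""
    # Assumption: `_stamp` is ISO 8601 in UTC without a zone suffix.
    stamp = datetime.fromisoformat(event["_stamp"]).replace(tzinfo=timezone.utc)
    lines = [
        "# HELP salt_highstate_status Return code of the last highstate run.",
        "# TYPE salt_highstate_status gauge",
        f"salt_highstate_status {int(event['retcode'])}",
        "# HELP salt_highstate_timestamp_seconds Unix time of the last highstate run.",
        "# TYPE salt_highstate_timestamp_seconds gauge",
        f"salt_highstate_timestamp_seconds {int(stamp.timestamp())}",
        "# HELP salt_highstate_total Number of states in the last highstate run.",
        "# TYPE salt_highstate_total gauge",
        f"salt_highstate_total {len(event['return'])}",
    ]
    return "\n".join(lines) + "\n"


# Hypothetical event payload with only the fields the reactor state reads.
sample_event = {
    "fun": "state.highstate",
    "id": "web01",
    "retcode": 0,
    "_stamp": "2022-09-04T11:59:02.837757",
    "return": {"state_a": {}, "state_b": {}},
}
print(render_highstate_metrics(sample_event), end="")
```

Storing the run time as Unix seconds also lets a dashboard compute how long ago the last highstate ran by subtracting it from the current time.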

3. Once the metrics file has been written on the minions, verify that Node Exporter exposes its contents.

~ ➤ curl -sq http://localhost:9100/metrics | grep salt_highstate
# HELP salt_highstate_status to check Salt highstate run status. If value is 1 then highstate is getting failed.
# TYPE salt_highstate_status gauge
salt_highstate_status 0
salt_highstate_timestamp 2022-09-04T11:59:02.837757
salt_highstate_total 2
salt_highstate_jid 20220904115858084292
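
If this check needs to run from a script rather than by hand, the same filtering can be done in code. The following is a minimal sketch with a hypothetical function name; it handles only the simple label-free samples shown above and keeps the values as strings, exactly as they appear in the output.

```python
def parse_salt_metrics(text: str, prefix: str = "salt_highstate") -> dict:
    """Collect `name value` samples whose name starts with the given prefix."""
    metrics = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        # Skip blank lines and the # HELP / # TYPE comment lines.
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition(" ")
        if name.startswith(prefix):
            metrics[name] = value.strip()
    return metrics


# Sample text copied from the Node Exporter output above.
sample_output = """\
# TYPE salt_highstate_status gauge
salt_highstate_status 0
salt_highstate_timestamp 2022-09-04T11:59:02.837757
salt_highstate_total 2
salt_highstate_jid 20220904115858084292
"""
for name, value in parse_salt_metrics(sample_output).items():
    print(name, value)
```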

4. Now, let’s confirm that Prometheus is scraping the metrics by running the following PromQL query.

{__name__=~"salt_.+"}
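
The same query can also be sent to Prometheus from a script through its HTTP API. This is a rough sketch under stated assumptions: it uses the instant-query endpoint "/api/v1/query", the base URL "http://localhost:9090" is a placeholder for your own Prometheus instance, and the function names are hypothetical.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen


def build_query_url(base_url: str, promql: str) -> str:
    """Build an instant-query URL for the Prometheus HTTP API."""
    return f"{base_url.rstrip('/')}/api/v1/query?{urlencode({'query': promql})}"


def instant_query(base_url: str, promql: str, timeout: float = 5.0) -> list:
    """Run an instant query and return the list of result samples."""
    with urlopen(build_query_url(base_url, promql), timeout=timeout) as resp:
        payload = json.load(resp)
    if payload.get("status") != "success":
        raise RuntimeError(f"Prometheus query failed: {payload}")
    return payload["data"]["result"]


# Building the URL needs no network access; the address is a placeholder.
print(build_query_url("http://localhost:9090", '{__name__=~"salt_.+"}'))

# Against a reachable Prometheus instance, the query itself would be:
#   for sample in instant_query("http://localhost:9090", '{__name__=~"salt_.+"}'):
#       print(sample["metric"], sample["value"])
```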

5. Finally, create a dashboard by plotting some useful graphs in Grafana.

Summary

This workflow has not only made reporting easier but also helped prevent inconsistencies across servers. The result is a solid foundation for an infrastructure that can be scaled as needed. The metrics are not limited to the ones shown in the dashboard; there is much more you could add. Possible extensions include raising alerts on state failures, tracking disconnected minions, or writing a custom exporter to report other events.

I hope this guide helps you design a reporting dashboard for your SaltStack environment. Thank you for reading. If you have any questions or comments, feel free to drop them below.

Published in Adobe Tech Blog


Written by Vijay Singh Gosai

I am Passionate About Solving Complex Infrastructure Problems and Designing Scalable Cloud-Native Systems
