Suppressing Alert Noise during Scheduled Maintenance

Squadcast
3 min read · Feb 2, 2024

Alert noise is a common problem for IT teams that monitor and manage complex systems. Excessive unactionable alerts from sources such as applications, servers, and network devices can cause alert fatigue. A high volume of alerts can overwhelm responders and reduce their ability to respond to the critical ones.

One common trigger of alert noise is scheduled maintenance, which is a routine practice in the digital realm.

Problems at Hand

Maintaining operational continuity is non-negotiable. Here are a few common use cases that demonstrate the challenges of alert noise faced during scheduled maintenance.

  • Proactively muting all alerts from specific sources like Datadog.
  • Muting specific alerts from monitoring tools like Prometheus, Datadog, or New Relic while enhancing API sets.
  • Suppressing a high volume of alerts while conducting load testing on a web application or server.
  • Silencing alerts generated by known anomalies or irregularities in certain systems or processes.

Suppression Rules

Squadcast’s suppression rules offer granular control for muting alerts from various sources. You can use these rules to focus on maintenance tasks without getting overwhelmed by unactionable alerts. They can be applied to entire services, multiple alert sources, specific variables, or combinations to suit your needs.

How To Configure Suppression Rules?

You can suppress unwanted alerts with Alert Suppression.

  • Every service in Squadcast supports Alert Suppression. You create a suppression rule by choosing an alert source or a host, then adding conditions that suppress alerts by time.
  • The ‘time’ in this case is the window you’ve planned for your maintenance.
  • Within the service, any alert notifications that arrive from the selected alert source during the set period are suppressed.
  • If you know that the payload contains variables specific to the API currently being worked on, you can use a suppression rule that targets that particular API and temporarily suppresses only those alerts.
  • Similarly, you can add new conditions and rules. This is an efficient way to control alert noise while you’re running maintenance on your end.
  • Host-specific suppression ensures that your monitoring process for the rest of the system remains intact and unhindered.
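
The matching logic described above can be sketched in a few lines of Python. This is only an illustration of the idea, not Squadcast's implementation or API: the `SuppressionRule` class, its field names, the `is_suppressed` helper, and the sample payload keys are all hypothetical. A rule matches when the alert comes from the chosen source, arrives inside the maintenance window, and carries the expected payload values.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, Dict, List, Optional


@dataclass
class SuppressionRule:
    """Hypothetical model of a suppression rule (not Squadcast's real schema)."""
    alert_source: Optional[str]   # None means "any alert source"
    window_start: datetime        # start of the maintenance window
    window_end: datetime          # end of the maintenance window
    # Payload key/value pairs that must all be present, e.g. a host or an API path.
    payload_match: Dict[str, Any] = field(default_factory=dict)

    def matches(self, source: str, payload: Dict[str, Any], received_at: datetime) -> bool:
        # Condition 1: the alert comes from the selected source.
        if self.alert_source is not None and source != self.alert_source:
            return False
        # Condition 2: the alert arrives inside the maintenance window.
        if not (self.window_start <= received_at <= self.window_end):
            return False
        # Condition 3: every required payload variable has the expected value.
        return all(payload.get(key) == value for key, value in self.payload_match.items())


def is_suppressed(rules: List[SuppressionRule], source: str,
                  payload: Dict[str, Any], received_at: datetime) -> bool:
    """An alert is suppressed if at least one rule matches it."""
    return any(rule.matches(source, payload, received_at) for rule in rules)


# Assumed example: a two-hour maintenance window on Feb 2, 2024 (UTC).
window_start = datetime(2024, 2, 2, 1, 0, tzinfo=timezone.utc)
window_end = datetime(2024, 2, 2, 3, 0, tzinfo=timezone.utc)

rules = [
    # Mute everything from one source for the whole window.
    SuppressionRule("datadog", window_start, window_end),
    # Mute only alerts about the API being worked on (hypothetical payload key).
    SuppressionRule("prometheus", window_start, window_end, {"api": "/v2/orders"}),
]

during_maintenance = datetime(2024, 2, 2, 2, 0, tzinfo=timezone.utc)
print(is_suppressed(rules, "datadog", {}, during_maintenance))                       # True
print(is_suppressed(rules, "prometheus", {"api": "/v1/users"}, during_maintenance))  # False
```

Keeping the payload condition narrow, as in the second rule, is what lets alerts for the rest of the system keep flowing while only the component under maintenance is muted.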

Impact: Alert Suppression reduces alert noise during maintenance, ensuring you receive only important alerts. However, keep the following in mind:

  • Suppressed incidents cannot be acknowledged, reassigned, or resolved.
  • Post mortems are not available for suppressed incidents.
  • Further customization and configuration of your suppression rules can be done using REST APIs.

For more information, explore the short video on suppression rules & configuration options for alert suppression and maintenance mode in Squadcast.

Conclusion

Any downtime or disruption can have a ripple effect on your business. But what happens when you need to make improvements or perform maintenance on a specific part of your infrastructure without causing unnecessary alert noise?

Squadcast offers a practical solution through automation rules, specifically suppression rules. These rules allow you to gain granular control over alert noise during maintenance periods without sacrificing overall system monitoring.

So, you can now create an environment that promotes focused Incident Response and facilitates an enhanced Incident Management experience.

Squadcast is a Reliability Workflow platform that integrates On-Call alerting and Incident Management along with SRE workflows in one offering. Designed for a zero-friction setup, ease of use and clean UI, it helps developers, SREs and On-Call teams proactively respond to outages and create a culture of learning and continuous improvement.

Originally published at https://www.squadcast.com.
