To Minimize Operational Risk, Start by Preventing Alarm Overload

Virtual Facility
Jan 21, 2023


What You Should Know: Inundating facility management (FM) operators / dispatchers with an average alarm rate of > 12 alarms per hour can cause alarm overload, leading to…

· critical alarms being missed,

· incorrect alarm response (human error),

· unplanned shutdown of revenue generating assets and spaces,

· and increased operational risk.

Why it Matters: An effective alarm system helps mitigate risk from facility events that could cause lost revenue. To support risk mitigation, the alarm system must be managed based on its measured performance.

The most common and useful alarm management KPI is the average alarm rate. It is easy to calculate and is a leading indicator (predictor) of future issues (risk). Incident investigations and human performance studies have linked high alarm rates to reduced operator reliability (human error) during critical situations.

Average Alarm Rate Affects Operator Reliability and Facility Management Operational Risk

Operator reliability (the probability that the operator will execute the correct action before event impact) is related to the average alarm rate.

· Under minimal alarm load, a well-trained operator might exceed 99% reliability when responding to an alarm.

· When alarm load is high, human reliability plummets to below 50%; this is when mistakes happen.

· High alarm rates increase operational risk because they decrease operator reliability; operators are less likely to prevent a critical issue from escalating to a major business impact.
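To make the link between operator reliability and risk concrete, here is a small illustrative calculation. It assumes a hypothetical workload of 100 critical alarms per year and treats every incorrect or late response as an escalated event. Both are simplifications chosen for illustration, not figures from a study. The reliability values (99% and 50%) are the ones quoted above.

```python
def expected_escalations(critical_alarms: int, operator_reliability: float) -> float:
    """Expected number of critical alarms that escalate because the
    operator did not take the correct action before event impact."""
    if not 0.0 <= operator_reliability <= 1.0:
        raise ValueError("operator_reliability must be between 0 and 1")
    return critical_alarms * (1.0 - operator_reliability)


# Hypothetical workload: 100 critical alarms per year (assumed for illustration).
low_load = expected_escalations(100, 0.99)   # minimal alarm load
high_load = expected_escalations(100, 0.50)  # alarm overload
```

Under these assumptions, the same facility goes from roughly 1 escalated event per year to 50, even though nothing about the equipment has changed.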

More about the Recommended Alarm Rate and Human Factors

Just like you can be overloaded by emails, texts, or notifications through the medium of your choice, an upper limit exists for how many alarms an operator can respond to. A significant body of knowledge exists to characterize the human factors that affect operator performance. Based on this data, recommended performance KPIs (such as average alarm rate) have been established. A “manageable” alarm rate is defined in the ANSI / ISA-18.2 alarm management standard as:

· Avg Number of Alarms per Operator*: 12 alarms / hour or approx. 300 alarms / day

  • *averaged over 30 days; must include alarms from all sources (BAS, critical freezers, elevators, etc.)
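As a rough sketch of how this KPI could be computed from an alarm log, the snippet below counts alarms inside a 30-day window and divides by the hours in that window. The function name, the threshold constant, and the sample data are hypothetical and assumed for illustration. A real log would come from your alarm sources and would need to be merged across all of them first. Note that 12 alarms per hour works out to 288 per day, which the guideline rounds to about 300.

```python
from datetime import datetime, timedelta

MANAGEABLE_ALARMS_PER_HOUR = 12  # threshold quoted above


def average_alarm_rate(alarm_times, window_start, window_end):
    """Average alarms per hour over the half-open window [window_start, window_end)."""
    hours = (window_end - window_start).total_seconds() / 3600
    if hours <= 0:
        raise ValueError("window_end must be after window_start")
    in_window = sum(1 for t in alarm_times if window_start <= t < window_end)
    return in_window / hours


# Hypothetical data: one alarm every 4 minutes for 30 days (15 per hour).
start = datetime(2023, 1, 1)
end = start + timedelta(days=30)
alarms = [start + timedelta(minutes=4 * i) for i in range(30 * 24 * 15)]

rate = average_alarm_rate(alarms, start, end)
overloaded = rate > MANAGEABLE_ALARMS_PER_HOUR
```

With this sample log the rate is 15 alarms per hour, so the operator would be flagged as overloaded.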

Behind the Numbers: Notes on Average Alarm Rate KPI

· Assumes that operators are responsible for tasks other than responding to alarms. KPIs can be adjusted to reflect the bandwidth available for alarm response.

· KPI thresholds can be adjusted based on the alarm response workflow (e.g., the time and effort needed to create and assign a work order).

· Each operator’s performance limit is established by their own cognitive capacity, which depends on age, mental and physical fitness, previous night’s sleep, and stress level.

· Average alarm rate should be calculated per operator. In a multi-operator control room, it will depend on whether each operator sees the same alarms or whether filtering is applied to show only those they are responsible for.

· Alarms are notifications that require a human action to prevent a consequence. Notifications that don’t meet this criterion (e.g., alerts) are not included.
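Two of the notes above change the arithmetic: alerts are excluded, and the rate is computed per operator according to what each operator actually sees. The sketch below shows one way to express that. The `Notification` record, the `responsibilities` mapping, and the sample counts are all hypothetical and assumed for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Notification:
    source: str  # e.g. "BAS", "Elevators"
    kind: str    # "alarm" (requires operator action) or "alert"


def alarms_per_operator_hour(notifications, responsibilities, hours, filtered=True):
    """Return {operator: alarms per hour}.

    responsibilities maps each operator to the set of sources they handle.
    With filtered=False, every operator is assumed to see every alarm.
    """
    if hours <= 0:
        raise ValueError("hours must be positive")
    # Only alarms count toward the KPI; alerts are dropped here.
    alarms = [n for n in notifications if n.kind == "alarm"]
    rates = {}
    for operator, sources in responsibilities.items():
        if filtered:
            seen = sum(1 for n in alarms if n.source in sources)
        else:
            seen = len(alarms)
        rates[operator] = seen / hours
    return rates


# Hypothetical 10-hour sample: 80 BAS alarms, 40 elevator alarms, 200 BAS alerts.
sample = (
    [Notification("BAS", "alarm")] * 80
    + [Notification("Elevators", "alarm")] * 40
    + [Notification("BAS", "alert")] * 200
)
duties = {"op_a": {"BAS"}, "op_b": {"Elevators"}}

filtered_rates = alarms_per_operator_hour(sample, duties, hours=10)
shared_rates = alarms_per_operator_hour(sample, duties, hours=10, filtered=False)
```

In this sample, filtering gives the two operators 8 and 4 alarms per hour, while a shared alarm list puts both at 12, right at the threshold.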

Make Your Alarm System Better

Follow these steps to reduce the average alarm rate by “cleaning up” your alarm system:

1. The Right Tool for the Job: Obtain a tool that calculates the average alarm rate and identifies which alarms contribute most to it. These frequently occurring alarms, called “Bad Actors”, are likely nuisance alarms that are not meaningful or useful to the operator.

2. Track Performance Over Time: Review and record the average alarm rate so you can track whether performance is improving or getting worse. The initial value is the “benchmark” against which improvement will be judged.

3. Bad Actor Knockdown: Review the Top 10 alarms on the Bad Actor list to eliminate or fix the most frequently occurring alarms.

· Evaluate whether the alarm is meaningful and useful (this process is called alarm rationalization).

· What would happen if this alarm was removed? What is the operator expected to do in response to the alarm?

· If the answer is “Nothing”, then the alarm may be a candidate for elimination or deprioritization.

· If an alarm is useful but problematic, determine why it is occurring so frequently (e.g., poor PID tuning of a control loop is a potential root cause).

4. Automate Alarm Response: Consider automating the response to an alarm if the corrective action is to generate a work order in the computerized maintenance management system (CMMS).

· “Safety Net”: Automate alarm response so that a work order is created if a critical alarm is not responded to within a pre-set time period. This ensures that critical alarms don’t get missed.

· “Take a Load Off”: Automate the alarm response to reduce the number of alarms the operator must deal with. Depending on the situation, you might automate low-priority alarms with long response periods (like changing a filter) or critical alarms with short response periods.

5. “Rinse and Repeat”: This is a continuous improvement process where “slow and steady wins the race”. A focus on fixing Bad Actors should yield quick and noticeable improvement.
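Two of the steps above lend themselves to a short sketch: ranking Bad Actors by frequency (steps 1 and 3) and the “Safety Net” check for unanswered critical alarms (step 4). The code below is a minimal illustration, not a description of any particular product. The function names, the alarm dictionary fields (`priority`, `raised_at`, `acknowledged_at`), the alarm names, and the 15-minute response limit are all hypothetical. Creating the actual work order would depend on your CMMS and is left out.

```python
from collections import Counter
from datetime import datetime, timedelta


def top_bad_actors(alarm_ids, n=10):
    """Return the n most frequent alarms as (alarm_id, count, share_of_total)."""
    counts = Counter(alarm_ids)
    total = sum(counts.values())
    return [(alarm_id, count, count / total)
            for alarm_id, count in counts.most_common(n)]


def needs_work_order(alarm, now, response_limit):
    """Safety net: True if a critical alarm is still unacknowledged
    after the pre-set response period has elapsed."""
    return (
        alarm["priority"] == "critical"
        and alarm["acknowledged_at"] is None
        and now - alarm["raised_at"] > response_limit
    )


# Hypothetical alarm log: one chattering alarm dominates the total.
log = (["AHU-3 filter dP high"] * 120
       + ["Freezer 2 temp high"] * 15
       + ["Elevator 1 fault"] * 5)
bad_actors = top_bad_actors(log, n=2)

# Hypothetical open alarms checked against an assumed 15-minute limit.
now = datetime(2023, 1, 21, 12, 0)
stale = {"priority": "critical", "acknowledged_at": None,
         "raised_at": now - timedelta(minutes=20)}
fresh = {"priority": "critical", "acknowledged_at": None,
         "raised_at": now - timedelta(minutes=5)}
escalate = [needs_work_order(a, now, timedelta(minutes=15)) for a in (stale, fresh)]
```

In this sample a single alarm accounts for 120 of 140 occurrences, which is the kind of concentration that makes a Top 10 review pay off quickly. Only the 20-minute-old critical alarm is flagged for an automatic work order.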

Next Steps

Better alarm data is the foundation for improved FM performance. Labor efficiency, occupant satisfaction, reduced revenue loss, risk mitigation: it all starts with better alarm data. We can help. Contact us to learn more.

Email us: makebetterwork@vfacility.ai

Web: www.virtualfacility.ai
