Engineers working at industrial operations can all agree that asset monitoring systems are producing too many alerts to manage.
One engineer recently told us how overwhelming it is, saying, “We got the monitoring system because we have more assets than we could possibly check manually. But now I have more alerts than I can possibly check. And when I do try to look into them, I have to dig to investigate only to find false positives and low priority issues. It’s just noise and frankly isn’t helping me find issues let alone improve performance.”
What this engineer is experiencing is very common. We hear regularly how overwhelming and difficult it is to manage assets using traditional monitoring systems. Alert fatigue is common and valuable insights are overlooked because the engineers just don’t have the time to investigate each alert to find the ones that are impacting production. This impacts current operations and is not a scalable model for growth.
This article will cover five steps that engineering teams can take to get maximum value from asset monitoring while giving them a path forward for managing growth.
The first step in solving alert fatigue is to reduce alert volumes. This is done by adding a new layer to threshold configuration. Most traditional monitoring systems allow you to set thresholds that trigger alerts from a single metric going above or below a set threshold. The noise comes from the fact that for many assets, a single metric exceeding a threshold doesn’t necessarily mean there is a problem.
Ways to reduce the volume of alerts include triggering an alert only when a combination of problems occurs, when an exceedance persists over a longer period of time, or when the number of threshold exceedances rises above a certain limit.
Intelligent threshold configuration means having alerts trigger based on conditions for which engineers would take action right away. This indicates an event that engineers want to know about and will prioritize for further investigation or repair.
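The three conditions described above (a combination of problems, a sustained exceedance, and a count of repeated exceedances) can be sketched in a few lines of Python. This is a minimal illustration rather than any particular monitoring product's logic: the `Reading` fields, the limits, and the function names are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class Reading:
    """One sample for an asset. Field names are illustrative assumptions."""
    pressure: float
    temperature: float


def sustained(flags: Sequence[bool], min_consecutive: int) -> bool:
    """True if at least `min_consecutive` flags in a row are True."""
    run = 0
    for flag in flags:
        run = run + 1 if flag else 0
        if run >= min_consecutive:
            return True
    return False


def count_events(flags: Sequence[bool]) -> int:
    """Count separate exceedance events (each False -> True transition)."""
    events = 0
    previous = False
    for flag in flags:
        if flag and not previous:
            events += 1
        previous = flag
    return events


def evaluate(
    readings: Sequence[Reading],
    pressure_limit: float = 100.0,    # assumed limit
    temperature_limit: float = 80.0,  # assumed limit
    min_consecutive: int = 3,         # samples an exceedance must persist
    max_events: int = 3,              # tolerated separate exceedances
) -> List[str]:
    """Return the reasons an alert should fire; an empty list means no alert."""
    high_pressure = [r.pressure > pressure_limit for r in readings]
    high_temperature = [r.temperature > temperature_limit for r in readings]

    reasons = []
    # Combination: both metrics are out of range in the same sample.
    if any(p and t for p, t in zip(high_pressure, high_temperature)):
        reasons.append("combination")
    # Duration: pressure stays above its limit for several samples in a row.
    if sustained(high_pressure, min_consecutive):
        reasons.append("sustained")
    # Count: pressure crosses its limit more often than tolerated.
    if count_events(high_pressure) > max_events:
        reasons.append("repeated")
    return reasons
```

With these settings a single brief pressure spike produces no alert at all, which is the noise reduction the step is after.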
Analytics can provide intelligence that goes beyond simple threshold alarms, including indicating equipment that is acting abnormally, or starting to trend in the wrong direction.
Analytics can provide valuable intelligence such as:
Advanced analytics adds value by surfacing these insights automatically, so engineers can focus on issues early, before they cause downtime.
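As a rough illustration of the two signals mentioned above, equipment acting abnormally and equipment trending in the wrong direction, the sketch below uses a z-score against recent history and a least-squares slope. Both are simple stand-ins chosen for this example, not the analytics any specific product uses, and the function names and the default `z_limit` are assumptions.

```python
from statistics import mean, stdev
from typing import Sequence


def is_abnormal(history: Sequence[float], latest: float, z_limit: float = 3.0) -> bool:
    """Flag `latest` if it lies more than `z_limit` standard deviations
    from the mean of `history` (a basic z-score check)."""
    if len(history) < 2:
        return False  # not enough data to estimate spread
    spread = stdev(history)
    if spread == 0:
        return latest != history[0]
    return abs(latest - mean(history)) / spread > z_limit


def trend_slope(values: Sequence[float]) -> float:
    """Least-squares slope per sample, assuming evenly spaced samples."""
    n = len(values)
    if n < 2:
        return 0.0
    x_mean = (n - 1) / 2
    y_mean = mean(values)
    numerator = sum((i - x_mean) * (v - y_mean) for i, v in enumerate(values))
    denominator = sum((i - x_mean) ** 2 for i in range(n))
    return numerator / denominator
```

A reading can sit comfortably inside its fixed threshold and still be flagged by `is_abnormal`, and a steadily positive `trend_slope` on a metric such as vibration can be surfaced before any threshold is crossed.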
Once you have reduced alert volume to those that are worth investigating, engineers still need help prioritizing those alerts and assigning actions for resolution. Using metadata about the alerts combined with recommended actions saves valuable time and improves alarm resolution.
First, having priority levels automated into the alerts allows engineers to quickly identify where to focus their efforts to make the biggest impact.
Using established methods to prioritize alarms allows for automated prioritization. Criteria can include alert frequency, the value of lost production, alert type, and magnitude of exceedance.
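One way to picture automated prioritization is a weighted score built from the criteria listed above. The sketch below is only an assumption about how such a score might be assembled: the `Alert` fields, the type weights, the normalization constants, and the 0.2/0.4/0.2/0.2 weighting are invented for illustration and would need to be tuned for a real operation.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical weights per alert type; unknown types fall back to 0.2.
TYPE_WEIGHT = {"safety": 1.0, "failure": 0.7, "efficiency": 0.4}


@dataclass
class Alert:
    asset: str
    alert_type: str
    occurrences_last_week: int      # frequency
    lost_production_per_day: float  # value at stake, in currency units
    exceedance_ratio: float         # measured / threshold, 1.2 = 20% over


def priority_score(alert: Alert) -> float:
    """Combine the four criteria into a score between 0 and 1."""
    frequency = min(alert.occurrences_last_week / 10, 1.0)
    value = min(alert.lost_production_per_day / 10_000, 1.0)
    magnitude = min(max(alert.exceedance_ratio - 1.0, 0.0), 1.0)
    kind = TYPE_WEIGHT.get(alert.alert_type, 0.2)
    return 0.2 * frequency + 0.4 * value + 0.2 * magnitude + 0.2 * kind


def rank(alerts: List[Alert]) -> List[Alert]:
    """Return alerts ordered from highest to lowest priority."""
    return sorted(alerts, key=priority_score, reverse=True)
```

Lost production carries the largest weight here on the assumption that it is the criterion most directly tied to business impact; a team with different priorities would shift the weights accordingly.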
Second, having automated corrective action recommendations with a simple way to assign them to field personnel is critical to efficiently managing alerts.
As engineers review alerts and determine a plan of action, it is also important that they have a simple way to assign tasks to their personnel to resolve them. Having action plans with insights on how to fix the issue pre-loaded and ready to be assigned within the alert system makes the transition from investigation to action possible with just a few clicks.
While reducing alerts and automating action workflows adds value, additional benefits to operations come when problem resolution is tracked. Engineers need a simple way to record whether an action plan succeeded in resolving the issue. Once this is tracked, advanced software can guide engineers by recommending the corrective actions with the highest chance of resolving the problem. This makes both engineers and equipment more efficient, so teams can handle more issues faster. And better resource efficiency allows operators to add more assets without adding people.
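The feedback loop described above, recording whether each action worked and then recommending the action most likely to succeed, can be sketched as a small tracker. This is a hypothetical illustration: the class name, the alert and action labels, and the use of a smoothed success rate are assumptions, not a description of how any given product ranks actions.

```python
from collections import defaultdict
from typing import Optional


class ResolutionTracker:
    """Records action outcomes per alert type and recommends the best action."""

    def __init__(self) -> None:
        # (alert_type, action) -> [successes, attempts]
        self._outcomes = defaultdict(lambda: [0, 0])

    def record(self, alert_type: str, action: str, resolved: bool) -> None:
        """Store whether `action` resolved an alert of `alert_type`."""
        stats = self._outcomes[(alert_type, action)]
        stats[0] += int(resolved)
        stats[1] += 1

    def success_rate(self, alert_type: str, action: str) -> float:
        """Smoothed success rate, so one lucky attempt does not dominate."""
        successes, attempts = self._outcomes.get((alert_type, action), (0, 0))
        return (successes + 1) / (attempts + 2)

    def recommend(self, alert_type: str) -> Optional[str]:
        """Return the action with the highest success rate, or None if
        nothing has been recorded for this alert type."""
        candidates = [a for (kind, a) in self._outcomes if kind == alert_type]
        if not candidates:
            return None
        return max(candidates, key=lambda a: self.success_rate(alert_type, a))
```

The add-one smoothing in `success_rate` is a design choice for the sketch: an action tried once and successful scores about 0.67 rather than 1.0, so it does not outrank an action with a long, mostly successful history.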
No asset monitoring system comes set up to incorporate the unique production environment of every type of industrial operation. Being able to incorporate the knowledge of your engineers into such a system is critical to making it efficient and impactful. This includes making it easy for engineers to edit the equations used for alerting, configure root cause identification, and create analyses and corrective action recommendations. Access to an open source programming language like Python is very helpful here, but for teams that don’t want to work at the code level, it is important to have drag-and-drop editors that allow the same level of customization. It’s this extra capability that allows engineers to tweak the system and keep improving efficiency.
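To give a sense of what editing alerting equations in Python could look like, the sketch below lets an engineer register a site-specific rule as an ordinary function. The `rule` decorator, the `RULES` registry, the sample keys, and the efficiency formula are all hypothetical; the formula in particular is a placeholder ratio, not a physically calibrated equation.

```python
from typing import Callable, Dict, List

Sample = Dict[str, float]
Rule = Callable[[Sample], bool]

# Registry of engineer-defined rules, keyed by rule name.
RULES: Dict[str, Rule] = {}


def rule(name: str) -> Callable[[Rule], Rule]:
    """Decorator that registers a function as a named alert rule."""
    def register(func: Rule) -> Rule:
        RULES[name] = func
        return func
    return register


@rule("low_efficiency")
def low_efficiency(sample: Sample) -> bool:
    # Placeholder equation an engineer might tune for their own equipment.
    efficiency = sample["flow"] * sample["head"] / sample["power"]
    return efficiency < 0.6


def triggered(sample: Sample) -> List[str]:
    """Return the names of all registered rules that fire for `sample`."""
    return [name for name, check in RULES.items() if check(sample)]
```

Adding or adjusting a rule then means editing one short function, which is the kind of change an engineer can make without touching the rest of the system.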
No engineer wants to spend their day combing through alerts that might not result in performance improvements. And no industrial operator wants to be spending money on this fruitless effort. Using new digital solutions and analytics is critical to help engineers reduce alert volume, make alerts more valuable, better prioritize and continuously improve.
The immediate benefit of this effort is that it improves the day-to-day lives of engineers who get significant time back in their day to do the job they were hired for—maintaining and improving asset performance. Operators experience a reduction in downtime and improved equipment performance, and, for those operators that are looking to grow, the ability to scale production efficiently.
Allowing engineers to continue to drown in alerts isn’t sustainable. It’s time to get smart about alerts and make engineers a valuable part of managing asset performance.