Alarm Management

  • Definition of an alarm
    • Based on ISA 18.2 - An audible and/or visible means of indicating to the operator an equipment malfunction, process deviation, or abnormal condition requiring a response
    • The keywords
      • Audible and/or visible indication
      • For the operator
      • Indicating either
        • Equipment malfunction
        • Process deviation
        • Abnormal condition
      • Requiring a response
  • The Core Principles in an Alarm System
    • Relevant (Not Spurious)
    • Prioritized
    • Informative (With documentation and diagnostic)
    • Unique (Not Redundant)
    • Timely
    • Advisory (Action Required)
  • The purpose of an alarm is to prevent
    • Personal injury
    • Environmental damage
    • Economic loss
    • Equipment damage
    • Loss of product quality
    • Downtime
  • Benefits of a good alarm system
    • Improve operational effectiveness
    • Avoid costly plant trips
      • Without effective alarming, the demand falls on the ESD (emergency shutdown) system
    • Minimize upset time
      • A good alarm system quickly alerts the operator to bring the plant back to a stable state
    • Improve HSE
      • HSE-related alarms such as fire alarms, environment and occupational safety alarms
    • Reduce operator’s load
      • More time to concentrate on optimization
    • Legal and insurance compliance
  • Alarm Screening / Identification
    • A pre-rationalization process that uses a combination of maintenance, sophisticated analysis tools and experience to quickly reduce the number of alarms
    • Performed by Control System Engineers, E&I Engineers, Operators and Technicians
  • Alarm Rationalization
    • The process of reviewing each alarm identified from the identification stage
    • Purpose of rationalization is to identify
      • Purpose and validity
      • Consequence of inaction
      • Alarm Class
      • Priority
      • Alarm Set point
      • Operator Action
    • Alarm Rationalization Checklist. There should be an alarm only if:
      • There is a potential cause of the alarm
      • There is need and possibility for operator intervention
      • There is a consequence of inaction
    • Consequence of Inaction
      • Is typically a set of tables which are divided into
        • Economic
        • Personal Safety
        • Environment
        • Reputation (Normally not included)
      • Economic Impact can be measured in
        • Time
        • Cost
        • % (of operating cost / feed)
        • Lost production, in production units (such as Mtpa for a mining plant or BPSD for an oil refinery)

The details of these definitions are outlined below:

  • Employee / Contractor / Personnel Safety
    • To estimate the personnel consequence of an event, consider the following extensions of the keywords given in the margin of the risk matrix
    • Low
      • Reversible health effects of concern.
      • Reversible injuries requiring treatment, but does not lead to restricted duties.
      • Medical treatment.
    • Moderate
      • Severe reversible health effects of concern.
      • Reversible injury or moderate irreversible damage or impairment to one or more persons.
      • Lost time illness or injury.
    • High
      • Life threatening or irreversible health effects or disabling illness.
      • Single fatality and/or severe irreversible damage or severe impairment to one or more persons.

Table 1 – Personnel and Safety

 

  • Environment
    • To estimate the environmental consequence of an event, consider the following extensions of the keywords given in the margin of the risk matrix
    • Low
      • Near-source confined and short-term reversible impact (typically a week).
    • Moderate
      • Near-source confined and medium-term recovery impact (typically a month).
    • High
      • Impact is unconfined and requires long-term recovery, leaving residual damage (typically years).

Table 2 – Environmental Impact

 

  • Reputation / Community Trust
    • To estimate the reputational consequence of an event, consider the following extensions of the keywords given in the margin of the risk matrix
    • Low
      • Impact on reputation of a Business Unit. Significant public exposure in local media.
      • Tangible expressions of trust / mistrust amongst a few community members with some influence on public opinion and decision-makers.
    • Moderate
      • Impact on reputation of Product Group. Comment from national NGO which impacts credibility with neighbours / regional government. Public exposure in the national media.
      • Tangible expressions of trust / mistrust amongst some community members with moderate influence on public opinion and decision-makers.
    • High
      • Impact on reputation of Rio Tinto Group. Comment from international NGO. Public exposure in international media.
      • Tangible expressions of trust / mistrust amongst most community members with significant influence on decision-makers. Widespread loss / gain of trust across the community, setting the agenda for decision-makers and key stakeholders.

Table 3 – Community & Reputation

 

  • Business / Production Loss / Equipment Damage
    • To estimate the business consequence of an event, consider the following extensions of the keywords given in the margin of the risk matrix
    • Low
      • < 2.5% of operating cost
      • < 0.15 Mtpa
      • < 4 hrs downtime
    • Moderate
      • 2.5 – 7.5% of operating cost
      • 0.15 – 0.5 Mtpa
      • 4 hrs – 8 hrs downtime
    • High
      • > 7.5% of operating cost
      • > 0.5 Mtpa
      • > 8 hrs downtime

Table 4 - Business Impact
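
As an illustration only, the Table 4 thresholds can be turned into a small lookup. The function names, the argument names, and the rule of taking the worst of the three measures are assumptions made for this sketch, not something the table itself prescribes. The High band for lost production is read here as "> 0.5 Mtpa".

```python
SEVERITY = ["Low", "Moderate", "High"]


def band(value, low_limit, high_limit):
    """Return 0 (Low), 1 (Moderate) or 2 (High) for a value against two limits."""
    if value < low_limit:
        return 0
    if value <= high_limit:
        return 1
    return 2


def business_impact(cost_pct=0.0, lost_mtpa=0.0, downtime_hrs=0.0):
    """Classify business impact using the Table 4 thresholds.

    Assumption: the overall severity is the worst of the three measures.
    """
    levels = [
        band(cost_pct, 2.5, 7.5),      # % of operating cost
        band(lost_mtpa, 0.15, 0.5),    # lost production in Mtpa
        band(downtime_hrs, 4.0, 8.0),  # hours of downtime
    ]
    return SEVERITY[max(levels)]
```

For example, an event costing 1% of operating cost but causing 6 hours of downtime would be classed as Moderate under this sketch.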

 

 

  • Alarm Priority
    • There are typically 3 alarm priorities, which determine the audible sound and visual indication on the HMI
    • The 3 Priorities
      • Low
      • Medium / Normal
      • High / Critical
  • Alarm Priority Matrix
    • Is typically a 3 by 3 matrix with Consequence and Response Urgency dimensions
    • It is used to determine the priority of an alarm during alarm rationalization

  • Urgency of Controller Response: > 30 minutes (longest time)
    • Low consequence: Priority “Low”
    • Moderate consequence: Priority “Low”
    • High consequence: Priority “High”
  • Urgency of Controller Response: 5 to 30 minutes (typical time)
    • Low consequence: Priority “Low”
    • Moderate consequence: Priority “High”
    • High consequence: Priority “Critical”
  • Urgency of Controller Response: < 5 minutes (fastest time)
    • Low consequence: Priority “High”
    • Moderate consequence: Priority “Critical”
    • High consequence: Priority “Critical”

Alarm Priority Matrix

The example above is for a mining plant, where the process usually responds slowly. For a process plant, the urgency bands would typically be > 10 minutes, 2 to 10 minutes, and < 2 minutes.
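
The matrix lookup can be sketched in code as follows. This is a minimal illustration: the names `PRIORITY_MATRIX`, `urgency_band` and `alarm_priority` are hypothetical, and treating exactly 5 or 30 minutes as belonging to the middle band is an assumption, since the matrix does not say which side the boundaries fall on.

```python
# Rows: urgency band of the required response. Columns: consequence severity.
PRIORITY_MATRIX = {
    "slow":    {"Low": "Low",  "Moderate": "Low",      "High": "High"},
    "typical": {"Low": "Low",  "Moderate": "High",     "High": "Critical"},
    "fast":    {"Low": "High", "Moderate": "Critical", "High": "Critical"},
}


def urgency_band(minutes_to_respond, fast_limit=5.0, slow_limit=30.0):
    """Map the time available to respond onto a row of the matrix."""
    if minutes_to_respond < fast_limit:
        return "fast"
    if minutes_to_respond <= slow_limit:
        return "typical"
    return "slow"


def alarm_priority(consequence, minutes_to_respond, fast_limit=5.0, slow_limit=30.0):
    """Look up the alarm priority for a consequence level and response time."""
    row = urgency_band(minutes_to_respond, fast_limit, slow_limit)
    return PRIORITY_MATRIX[row][consequence]
```

Passing `fast_limit=2, slow_limit=10` would apply the process-plant bands mentioned above instead of the mining-plant defaults.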

 

 

  • Alarm Priority Distribution
    • It is best practice for alarms to be distributed across the priority levels as follows
      • High = 5 – 15%
      • Medium = 15 – 30%
      • Low = 55 – 80%

Alarm Management Reports

  • The fundamental alarm management report is the Alarm count over time
  • Cluster Analysis
    • Used to analyze chattering alarms.
      • In Matrikon Alarm Manager, the "Maximum number of events to analyze" setting is the number of events that are checked for clusters. It is good to set this number as high as possible, for example 10,000,000.
    • A cluster is a group of occurrences of an alarm that repeat within the specified cluster time (in subsequent windows). The cluster time is also called the time window and is typically set to 60 seconds.
    • From the report,
      • The number of clusters tells how many clusters there are.
      • The average cluster member count tells the average number of times the alarm chatters within a cluster.
    • The most important analysis that can be done from the report is total alarm occurrences versus chatter occurrences; their ratio is the cluster member %.
      • Total alarm occurrence is the total number of alarms.
      • Chatter occurrence is the total number of alarms that fall inside a chattering window.
      • Cluster member % = chatter occurrence / total alarm occurrence
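
The cluster statistics can be illustrated with a short sketch. It is not the algorithm used by any particular alarm management product. It assumes the input is a list of occurrence times (in seconds) for a single alarm, that an occurrence joins a cluster when it follows the previous occurrence by no more than the time window, and that a cluster needs at least two members. The function name `cluster_stats` is hypothetical.

```python
def cluster_stats(timestamps, window_s=60.0):
    """Group repeated occurrences of one alarm into chatter clusters."""
    times = sorted(timestamps)
    clusters = []
    current = []
    for t in times:
        if current and t - current[-1] <= window_s:
            # Still inside the time window of the previous occurrence.
            current.append(t)
        else:
            # Gap too long: close the running group, keep it only if it chattered.
            if len(current) >= 2:
                clusters.append(current)
            current = [t]
    if len(current) >= 2:
        clusters.append(current)

    chatter = sum(len(c) for c in clusters)
    total = len(times)
    return {
        "clusters": len(clusters),
        "avg_cluster_members": chatter / len(clusters) if clusters else 0.0,
        "total_occurrences": total,
        "chatter_occurrences": chatter,
        "cluster_member_pct": 100.0 * chatter / total if total else 0.0,
    }
```

With occurrences at 0, 10, 20, 500, 1000 and 1030 seconds and a 60 second window, this gives two clusters, five chatter occurrences out of six, and a cluster member % of about 83%.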
    • Symptomatic Analysis
      • Is used to find an alarm that tends to appear after a parent alarm occurs. This alarm (often referred to as the child alarm) is said to be predictable.
      • Predictability measures the % of parent alarm occurrences that are followed by the child alarm. It is recommended that a child alarm with a predictability > 50% be considered predictable; it can then be automatically inhibited whenever the parent alarm is raised.
      • Significance measures the % of child alarm occurrences that are preceded by the parent alarm.
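
A minimal sketch of the two measures is shown below. The look-ahead window (`window_s`) within which a child counts as "following" a parent is an assumption, as the text does not state one, and the function name `symptomatic` is hypothetical.

```python
from bisect import bisect_left, bisect_right


def symptomatic(parent_times, child_times, window_s=60.0):
    """Return (predictability %, significance %) for a parent/child alarm pair."""
    parents = sorted(parent_times)
    children = sorted(child_times)

    # Parent occurrences with at least one child in (p, p + window].
    followed = sum(
        1 for p in parents
        if bisect_right(children, p + window_s) > bisect_right(children, p)
    )
    # Child occurrences with at least one parent in [c - window, c).
    preceded = sum(
        1 for c in children
        if bisect_left(parents, c) > bisect_left(parents, c - window_s)
    )

    predictability = 100.0 * followed / len(parents) if parents else 0.0
    significance = 100.0 * preceded / len(children) if children else 0.0
    return predictability, significance
```

Under the recommendation above, a child alarm would be a candidate for automatic inhibition when the returned predictability exceeds 50.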
    • Total Alarm Count
      • Sum of Audible Alarms (exclude alarms that default filters apply to) between the start and end time.
    • Total Event Count
      • Sum of all Events (Alarms, Return to Normals, Acknowledgements, Operator Actions, System Messages, etc) between the start and end time.  Exclude events that default filters apply to.
    • Total Intervention Count
      • Sum of all Interventions (message type is Operator Action) between the start and end time. Exclude events that default filters apply to.
    • Alarm Rate
      • The Alarm Rate represents the average number of alarms per hour for the selected areas (exclude alarms that default filters apply to).  Divide by the sum of the operators assigned to those areas.
    • Peak Alarm Rate
      • The Peak Alarm rate will divide data into 10-minute slices.  Take the maximum number of alarms in a 10-minute slice for the selected areas (exclude alarms that default filters apply to).  Divide by the sum of the number of operators assigned to those areas.  Multiply the result by 6 for an hourly peak alarm rate.
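
The two rate calculations can be sketched as follows, assuming the alarms have already been filtered and are supplied as times in seconds. The function and parameter names are hypothetical.

```python
from collections import Counter


def alarm_rate(alarm_times_s, start_s, end_s, operators):
    """Average alarms per hour per operator over [start_s, end_s)."""
    hours = (end_s - start_s) / 3600.0
    count = sum(1 for t in alarm_times_s if start_s <= t < end_s)
    return count / hours / operators


def peak_alarm_rate(alarm_times_s, start_s, end_s, operators, slice_s=600):
    """Busiest 10-minute slice per operator, scaled up to an hourly rate."""
    slices = Counter(
        int((t - start_s) // slice_s)
        for t in alarm_times_s
        if start_s <= t < end_s
    )
    busiest = max(slices.values(), default=0)
    # 3600 / 600 = 6, the multiplier described in the text.
    return busiest / operators * (3600 / slice_s)
```

For five alarms at 0, 60, 120, 700 and 4000 seconds over a two-hour interval with two operators, the alarm rate is 1.25 per hour and the peak alarm rate is 9 per hour, because the busiest slice holds three alarms.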
    • Percent Upset (Percent Hours in Burst Mode)
      • Slice the interval into 10-minute chunks. If a chunk contains more than 5 alarms, flag it as a burst.
      • Percent Upset is the percentage of 10-minute chunks in which more than 5 alarms were annunciated per operator.
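
As a sketch under the same assumptions (alarm times in seconds, filters already applied, hypothetical names), Percent Upset could be computed like this:

```python
import math
from collections import Counter


def percent_upset(alarm_times_s, start_s, end_s, operators,
                  slice_s=600, burst_limit=5):
    """Percentage of 10-minute slices with more than burst_limit alarms per operator."""
    n_slices = math.ceil((end_s - start_s) / slice_s)
    if n_slices <= 0:
        return 0.0
    counts = Counter(
        int((t - start_s) // slice_s)
        for t in alarm_times_s
        if start_s <= t < end_s
    )
    # Slices with no alarms never appear in counts, which is fine: they are not bursts.
    bursts = sum(1 for c in counts.values() if c / operators > burst_limit)
    return 100.0 * bursts / n_slices
```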
    • Intervention Rate
      • Intervention Rate = Sum of Interventions (message type is Operator Action) for selected areas during the interval / Total Number of operators assigned to those areas / Number of hours in the interval.
    • Intervention to Alarm Ratio
      • Intervention to Alarm Ratio = Intervention Rate / Alarm Rate
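
A short worked example of the two formulas, with hypothetical function names and made-up input figures:

```python
def intervention_rate(interventions, operators, hours):
    """Operator actions per operator per hour."""
    return interventions / operators / hours


def intervention_to_alarm_ratio(intervention_rate_per_hr, alarm_rate_per_hr):
    """Ratio of the intervention rate to the alarm rate (alarm rate must be non-zero)."""
    return intervention_rate_per_hr / alarm_rate_per_hr


# 48 operator actions by 2 operators over an 8-hour shift: 3 per operator per hour.
rate = intervention_rate(48, 2, 8)
# Against an alarm rate of 6 per operator per hour the ratio is 0.5.
ratio = intervention_to_alarm_ratio(rate, 6.0)
```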
    • Priority Distribution
      • Priority Distribution represents the condition field as a percentage.  Results should match the "Alarms By Condition Range" in Excel.
      • A good priority distribution should be as follows :-
        • Critical Alarm = 0
        • High Alarms 0-5%
        • Medium Alarms 5% - 15%
        • Low Alarms >80%
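
The distribution check can be sketched as below. The `TARGETS` table encodes the bands listed above. Treating the band edges as inclusive (for example, exactly 80% Low passing) is a simplifying assumption, and the function names are hypothetical.

```python
from collections import Counter

# Target bands in percent, taken from the list above (edges treated as inclusive).
TARGETS = {
    "Critical": (0.0, 0.0),
    "High": (0.0, 5.0),
    "Medium": (5.0, 15.0),
    "Low": (80.0, 100.0),
}


def priority_distribution(priorities):
    """Percentage of alarms at each priority level."""
    counts = Counter(priorities)
    total = len(priorities)
    return {
        p: (100.0 * counts.get(p, 0) / total if total else 0.0)
        for p in TARGETS
    }


def out_of_band(distribution):
    """Priority levels whose share falls outside the target band."""
    return [
        p for p, (lo, hi) in TARGETS.items()
        if not lo <= distribution[p] <= hi
    ]
```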
    • Top 20 Alarm Percent
      • Sum of 20 "Most Frequent Alarms" divided by the total number of alarms in the time span selected.  Filters used should apply to both the numerator and the denominator.
    • Top 20 Interventions Percent
      • Sum of 20 "Most Frequent Interventions" divided by the total number of interventions in the time span selected.  Filters used should apply to both the numerator and the denominator.
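
Both Top 20 figures follow the same pattern, sketched here with a hypothetical `top_n_percent` function that takes the already-filtered list of alarm (or intervention) tags:

```python
from collections import Counter


def top_n_percent(tags, n=20):
    """Share of all occurrences contributed by the n most frequent tags, in percent."""
    counts = Counter(tags)
    total = sum(counts.values())
    top = sum(c for _, c in counts.most_common(n))
    return 100.0 * top / total if total else 0.0
```

Because the same filtered list feeds both the numerator and the denominator, the requirement that filters apply to both is met by construction.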
    • Average Alarm Rate (10 mins)
      • This calculation is performed exactly as Alarm Rate.  Divide the result by 6 for a 10 minute slice.
    • Maximum Alarm Rate (10 mins)
      • This calculation is performed exactly as Peak Alarm Rate, however do not multiply the final result by 6 as you would in Peak Alarm Rate.
    • Average Alarm Rate (Daily)
      • Average Alarm Rate (Daily) = Sum(Total # of Audible Alarms) / Total Number of Assigned Operators / # of Days (for less than 1 day use a fraction to represent the number of hours)
    • Maximum Alarm Rate (Daily)
      • If the Interval is 1 day or less, the Maximum Alarm Rate will be the same as the Average Alarm Rate (Daily).
      • If the Interval is greater than 1 day, then:
      • Daily Alarm Rate = Sum(Total # of Audible Alarms) / Total Number of Assigned Operators.  Repeat this calculation for each day in the selected interval; the Maximum Alarm Rate (Daily) is the largest of these daily values.
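
A sketch of the daily average and maximum, assuming alarm times in seconds measured from the same clock as the interval bounds, a non-empty interval, and days counted from the interval start. The function name is hypothetical.

```python
from collections import Counter


def daily_alarm_rates(alarm_times_s, start_s, end_s, operators, day_s=86400):
    """Return (average daily rate, maximum daily rate) per operator."""
    days = (end_s - start_s) / day_s  # fractional for intervals under one day
    in_range = [t for t in alarm_times_s if start_s <= t < end_s]
    average = len(in_range) / operators / days
    if days <= 1:
        # For an interval of one day or less the maximum equals the average.
        return average, average
    per_day = Counter(int((t - start_s) // day_s) for t in in_range)
    maximum = max(per_day.values(), default=0) / operators
    return average, maximum
```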
    • Intervention Rate (Daily)
      • Daily Intervention Rate = Sum of Interventions (message type is Operator Action) for selected areas during selected interval / Total Number of Operators for those areas / Number of days in the interval.
    • Intervention Rate (10 mins)
      • 10 min Intervention Rate = Sum of Interventions (message type is Operator Action) for selected areas during selected interval / Total Number of Operators for those areas / Number of 10-minute slices in the interval (typically 6 per hour).
    • Percent Time < 1 Interventions
      • Sum of 10-minute intervals where "Intervention Rate (10 mins)" is less than or equal to 1 divided by the total number of 10-minute intervals x 100%.
      • This calculation is already normalized per operator.
    • Percent Time > 10 Interventions
      • Sum of 10-minute intervals where "Intervention Rate (10 mins)" is greater than or equal to 10 divided by the total number of 10-minute intervals x 100%.
      • This calculation is already normalized per operator.
    • Percent Time 1-10 Interventions
      • 100% - (Percent Time < 1 Interventions) - (Percent Time > 10 Interventions)
    • Percent Time < 1 Alarms
      • Sum of 10-minute intervals where "Average Alarm Rate (10 mins)" is less than or equal to 1 divided by the total number of 10-minute intervals x 100%.
      • This calculation is already normalized per operator.
    • Percent Time > 10 Alarms
      • Sum of 10-minute intervals where "Average Alarm Rate (10 mins)" is greater than or equal to 10 divided by the total number of 10-minute intervals x 100%.
      • This calculation is already normalized per operator.
    • Percent Time 1-10 Alarms
      • 100% - (Percent Time < 1 Alarms) - (Percent Time > 10 Alarms)
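
The three "Percent Time" bands can be sketched with one function that works for either alarms or interventions. It follows the definitions as written, so a slice counts towards the lower band at a per-operator rate of 1 or less and towards the upper band at 10 or more. The function name and inputs are hypothetical.

```python
import math
from collections import Counter


def percent_time_bands(event_times_s, start_s, end_s, operators, slice_s=600):
    """Return (% slices <= 1, % slices between, % slices >= 10) per operator."""
    n_slices = math.ceil((end_s - start_s) / slice_s)
    if n_slices <= 0:
        return 0.0, 0.0, 0.0
    counts = Counter(
        int((t - start_s) // slice_s)
        for t in event_times_s
        if start_s <= t < end_s
    )
    # Include empty slices so quiet periods count towards the lower band.
    rates = [counts.get(i, 0) / operators for i in range(n_slices)]
    low = 100.0 * sum(r <= 1 for r in rates) / n_slices
    high = 100.0 * sum(r >= 10 for r in rates) / n_slices
    return low, 100.0 - low - high, high
```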
    • Performance Category
      • [Image: Performance Category]