Intelligent Monitoring of Distribution Automation

by S. E. Rudd, J. D. Kirkwood, E. M. Davidson, S. M. Strachan, V. M. Catterson and S. D. J. McArthur, UK

Whilst the scope of definition for the ‘smart grid’ is wide and differs across territories, certain visions of how our energy infrastructure is predicted to evolve are shared. It is envisaged that information and communications technologies will play a key role in the delivery of future networks. For the operation of future distribution networks, these visions translate into a number of changes from common practice:

  • An increasingly observable network - the proliferation of communications and monitoring equipment on distribution networks will result in greater observability at lower voltage levels
  • Bi-directional power flows, introduced through the connection of distributed energy resources - for networks originally designed for uni-directional power flows, this can lead to congestion and problems regulating voltage
  • Increased use or even reliance on distribution automation and active network management, as a means of providing reliable and cost effective supply of electricity
  • Controllable load through various demand-side management measures


 However, these changes to current practice are likely to result in a number of challenges for the utility personnel tasked with operating distribution networks. One such challenge is the increased volumes of data that such a highly monitored, active distribution system, with widespread use of automation, is likely to produce. Intelligent systems researchers within the power systems community have long understood the problems associated with deriving meaningful information from power systems data, especially under extreme conditions, such as during storms or other network events.

Over two decades of research have produced numerous expert systems and model-based reasoning systems for alarm processing on both transmission and distribution systems. However, the move to more observable distribution networks, which are active rather than passive, leads to a new set of challenges in understanding system behavior and the health and performance of distribution automation and active network management schemes on a day-to-day basis.

Arguably, active network management is still in its infancy. Only a handful of schemes have seen deployment around the world, and utilities are still learning what the impact on the routine operation of the networks will be and what, from an operational perspective, the widespread roll-out of such schemes is likely to entail.
On the other hand, one area where much more experience has been gained is that of distribution automation. Regulatory pressure in the form of incentives relating to reliability of supply, e.g. customer minutes lost (CML) and customer interruptions (CI) in the UK, and the customer average interruption duration index (CAIDI) in the USA, has resulted in utilities investing in distribution automation in a bid to increase their revenue.

Distribution automation to improve customer service, and in doing so meet or exceed regulatory targets, can take various forms: remote terminal unit (RTU) based schemes; automatic teleswitching schemes; and novel peer-to-peer communicating schemes, such as S&C Electric’s IntelliTeam2. However, regardless of the type of distribution automation used, understanding both the performance and health of such schemes is an operational requirement. In order to make a positive impact (and not a negative impact) on reliability of supply, distribution automation schemes must operate when needed. Identifying incipient failures, scheme performance issues or problems with equipment health before they result in failure of the scheme to operate when needed, is an important task: it ensures that schemes perform in such a way that justifies the investment in the first place.
This includes the health and performance of the communication systems on which they may rely. Often, such information is implicit in the power systems data that engineers use to make such assessments. Moreover, the volumes of data produced by a large number of schemes can make manual analysis of the data impractical. Symptoms of incipient failure may be seen over several hours, days or even weeks. Tracking such symptoms can be problematic.

This article discusses distribution automation in general and the requirement for automatic analysis of data relating to the health and performance of distribution automation schemes. A case study is included, which considers some of the data analysis problems seen by a UK utility after the widespread roll-out of a particular type of distribution automation scheme. This article outlines some of the decision support requirements from the perspective of the engineers tasked with maintaining and managing such schemes.
In terms of decision support technologies, the article examines the use of complex event processing and rule-based expert systems as a means of dealing with that data. Example rules for identifying a number of scheme health and performance issues, derived through knowledge elicitation, are presented. How these rules are used within a prototype alarm processing system, currently under development, is described. Future extensions to that prototype are also discussed.

Distribution Automation and Smart Grid

The term "distribution automation" connotes a wide set of technologies and approaches to the remote operation of distribution networks. Distribution automation can be thought of as “a set of technologies that enable an electric utility to remotely monitor, coordinate and operate distribution components in real-time mode from remote locations.”  Northcote-Green and Wilson identify three classes of distribution automation:

1. Local Automation - switch operation performed by protection or local logic-based decision-making
2. SCADA (telecontrol) - manually initiated switch operation by remote control, with remote monitoring of status, indications, alarms and measurements
3. Centralized Automation - automatic switch operation by remote control from central decision-making, for fault isolation, network reconfiguration and service restoration

Examples of all three can be found in the operation of distribution networks in the UK. A fairly high level of telecontrol is commonplace among certain companies, which have invested in telecontrol due to the CML savings achievable on particular classes of circuit.
Restoration via telecontrol can be problematic. Depending on prevailing network conditions, control engineers may not be able to respond and restore customers within three minutes, after which CML start to accrue.
During storm conditions or busy periods, it may not be possible for the control engineer to manage large numbers of restorations. Human factors aside, restoration via telecontrol is beholden to the availability and performance of communications and telecontrol equipment.

Several utilities in the UK have experience with a centralized approach to automatic restoration. Automation scripts run in the distribution network management system environment, e.g. Korn shell scripts run within GE’s ENMAC, can be used to automatically execute a sequence of control actions for restoring customer supply after a fault. For example, a very simple script may do the following after being triggered by the operation of a breaker:

  • Check that automation is enabled on a given feeder
  • Check the status of a remotely controllable sectionalizer and normally open point
  • Open the sectionalizer and check for successful operation
  • Close the normally open point and check for successful operation

Scripting in the Distribution Management System (DMS) is not without its foibles. Scripts may not cover all possible operating states and thus exit early when experiencing unusual conditions. Time-outs due to communications delays can also cause scripts to terminate before restoration is complete. For example, a script may wait 20 seconds for confirmation of the status change of a switch. Communications delays may cause that confirmation to take over 20 seconds to arrive at the control room; however, by then the script will have terminated early. So, like restoration via telecontrol, centralized automation is also beholden to the availability and performance of communications and telecontrol equipment.

Other utilities sometimes use local automation schemes in addition to the centralized approach. These can range from non-communicating automation schemes, such as the use of automatic sectionalizing links including ‘smart fuses’ and autoreclosers, to communicating RTU-based automation schemes. Recent years have also seen the first UK trials of peer-to-peer network restoration schemes, such as trials of S&C Electric’s IntelliTeam2 on the Isle of Wight, although IntelliTeam2 has seen a large number of other deployments around the world.
In relation to the ‘smart grid’, distribution automation can play a role in achieving the oft-mooted ‘self-healing’ functionality. Financial incentives to improve reliability of supply can drive distribution network operators to invest in distribution automation to that end. However, increasing levels of distribution automation lead to a requirement for ways of analyzing the data related to scheme health and performance. This article presents an industrial case study from a UK utility which currently employs all three types of automation discussed above.

An Industrial Case Study

A key building block in the utility’s distribution automation infrastructure is what is termed a Network Controllable Point (NCP). An NCP is an item of 11kV secondary equipment that can be controlled remotely, such as a remote terminal unit (RTU) controlling a circuit breaker. Figure 2 illustrates a typical underground urban network, showing the installation of RTUs, which provide the interface that controls the switchgear. Modern equipment can also be accessed directly through an intelligent electronic device (IED). The utility has over 2,800 NCPs installed on the network, with plans to increase this number to approaching 5,300. NCPs have a dual purpose. The first is to enable remote control of distribution components by an engineer manually initiating a command in the control room via the distribution management system (DMS). The second is that some NCPs form part of the 235 automatic restoration schemes that the utility has installed on the network. It is therefore necessary that NCPs are in a healthy condition and ready to contribute to both automated and remote control of switchgear.

Automatic restoration can be achieved by placing fault passage indicators (FPIs), central control units (CCUs) and RTUs on the network to monitor and remotely control the actuators at switches and normally open points (NOPs) (Figure 2). Each RTU uses VHF digital radio equipment to communicate with the primary CCU, which controls the remote equipment attached to that primary. When a fault occurs in a section of the network, a circuit breaker will trip and isolate the fault. Decision-making logic incorporated either at the control room or the source primary can automatically restore those disrupted customers who are not permanently affected by the fault, through network reconfiguration and service restoration.

For example, if a fault were to occur at point A in Figure 2, the circuit breaker at primary 1 trips, taking all customers from primary 1 to the NOP off supply. Communication occurs between CCU1, RTU1 and RTU2 to identify the passage of fault current and voltage changes at these points. Since no fault current would have been seen at RTU1, it opens the sectionalizer. RTU2 will then close the NOP, restoring supply to the customers between the midpoint and NOP.
The DMS, along with the SCADA system’s communication infrastructure and controllable protection equipment, forms the architecture required to perform distribution automation (Figure 3). The PS Alerts database stores, in real time, NCP SCADA alarms that can indicate potential issues with NCP equipment health.

The SCADA and communication infrastructure shown in Figure 3 allows the status updates and alarms associated with each NCP to be sent back to the control room. It is then the task of the control engineer to examine a series of alarms to identify any important information. Monitoring the alarms of each NCP manually has the following disadvantages:

  • An onerous amount of data is presented to the operator from all the protection devices, as well as the RTUs
  • Moving to a smarter grid will increase the amount of monitoring data further, with alarms being generated from equipment that was previously unmonitored
  • Shift changes of the operators, as well as alarms spanning a number of days, can lead to operators missing previous alarms, which might have given an insight into the NCP equipment's present condition

During knowledge elicitation with experts, five scenarios were highlighted as NCP health problems that could affect performance:

1.  Every six hours, the CCU polls the RTUs to check if they are still online. An alarm will be generated by SCADA if the condition of an RTU changes state (“comms fail ON” or “comms fail OFF”). There are two general situations that engineers look for. The first is an intermittent problem with communications, indicated by a sequence of alarms such as comms fail ON, comms fail OFF, comms fail ON, comms fail OFF within a 24-hour period. The second is the more serious case of a comms fail ON alarm that remains active for 48 hours. Because a significant number of alarms is presented to the control engineer over a number of days, it can be difficult to keep track of the status of each RTU, and the single alarm that corresponds to a permanent communications failure could easily be missed.

2. A common problem associated with RTU operation is loss of power supply. A ground-mounted RTU has an associated power supply unit (PSU), which derives its auxiliary power supply from the LV network. Within the PSU there is a battery-backed power supply designed to last at least 24 hours. A loss of LV supply to the RTU sets off a chain of events:

  • First, if the LV supply is lost, through human intervention or other errors, then a "loss of volts" alarm is generated by SCADA
  • If the LV supply is not restored within 1-3 days, then the battery will be discharged to the point that SCADA generates a "battery alarm". When the sealed lead-acid battery is allowed to fall below its nominal voltage, cell polarity reverses and the battery is damaged
  • Finally, if the LV supply is still not restored, the battery goes into deep-discharge protection and shuts itself off. This means that during the six-hour health check between the CCU and RTU, the RTU will not reply and a "comms fail ON" will be generated by SCADA

3. When a control engineer tries to control an object, a 20-second timer is started. If this time is exceeded and the object has not changed state, then a “scan task timeout” is generated by SCADA, indicating that the object has not opened/closed within the allotted time. However, due to the different ways in which the signal may propagate through the network, confirmation of successful operation may feasibly take longer than 20 seconds. It is therefore necessary to identify whether objects did in fact operate after they were commanded to do so, and to calculate how much longer than 20 seconds they took. This would allow the object’s timer to be set with respect to how the comms system actually behaves, and reduce the number of unnecessary alarms presented to the control engineer.

4. If the state of an actuator is unknown, then a “DBI (double bit indication) alarm” is generated by SCADA. The positional indication of remote plant is derived from a double-bit I/O state. If this state is illegal or transitional, i.e. DBI 00 or DBI 11, then the plant cannot be remotely controlled because it is unknown whether it is in an open or closed position. This condition may have been caused by third-party intervention on site and requires further investigation.

5. Once an automation scheme has successfully operated, it goes into an “auto off and complete” or an “auto off” state. Although automation has restored supply to customers on the part of the network unaffected by the initial fault, the network remains in an abnormal condition until it is repaired. Only once this has happened can the automation scheme be manually re-enabled. As the “auto off” alarms reside as historical events, there is no immediate reminder to reset the automation; if it is not reactivated, the distribution automation scheme cannot respond to any subsequent fault events that may occur.
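
As a rough illustration, three of the five checks above (scenarios 1, 3 and 4) could be sketched in Python as follows. All data structures, thresholds as encoded, and return strings are illustrative assumptions, not the utility's implementation:

```python
# Illustrative sketches of three of the five NCP checks described above.
from datetime import datetime, timedelta

def comms_status(alarms, now):
    """Scenario 1. alarms: time-ordered (timestamp, state) pairs for one
    RTU's comms-fail point, where state is 'ON' or 'OFF'."""
    # Permanent failure: the last comms-fail alarm is ON and 48 h old.
    if alarms and alarms[-1][1] == "ON" and now - alarms[-1][0] >= timedelta(hours=48):
        return "permanent comms failure"
    # Intermittent problem: two or more ON alarms within the last 24 h.
    recent_on = [t for t, s in alarms if s == "ON" and now - t <= timedelta(hours=24)]
    if len(recent_on) >= 2:
        return "intermittent comms problem"
    return "no comms issue"

def confirmation_excess(t_command, t_confirmed, limit_s=20.0):
    """Scenario 3: how much longer than the allotted time (in seconds)
    the object took to confirm; 0 if it confirmed within the limit."""
    return max(0.0, (t_confirmed - t_command) - limit_s)

def decode_dbi(bits):
    """Scenario 4: double-bit indication. Which valid code means 'open'
    versus 'closed' is convention-dependent; the mapping is illustrative."""
    return {"01": "open", "10": "closed"}.get(bits, "unknown - DBI alarm")
```

Checks like these are simple individually; the operational difficulty described above lies in applying them continuously across thousands of alarms spanning days.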

Although these five scenarios represent some of the experts’ knowledge regarding the health and performance of NCPs, it should be noted that further issues, of which the experts are presently unaware, may arise over time. The authors are developing an extensible and flexible system to automate the interpretation of SCADA alarms, shown in the dashed section in Figure 3. This system is intended to provide decision support to the control engineer regarding NCP performance and health, and aims to minimize unnecessary loss of supply, as well as reduce the CIs and CMLs attributed to faulty NCP equipment.


Intelligent Monitoring

Within the power industry, there has been a wealth of research into the use of intelligent system techniques for alarm processing. Since each utility has different needs and requirements based on their infrastructure for network monitoring, over the years a variety of approaches have been investigated.
This history of research means that the strengths and weaknesses of each technique are well understood. For example, the “knowledge bottleneck” (the time and effort required for knowledge capture or model building) means that rule-based and model-based systems have limited penetration in the control room beyond some key installations. On the other hand, data-driven techniques such as neural networks require re-training whenever the network topology changes, and cannot present engineers with an explanation of their results, meaning that engineers are wary of the solutions they offer.

While networks remain largely passive and centrally managed, alarm processing can be managed by engineers supported by expert systems. However, with the move towards smarter grids, with the associated increases in monitoring data and the need for systems that are more self-healing and self-diagnosing, the case for automated alarm processing becomes more pressing. The broader application is not new, but there are new drivers creating a requirement for a distribution automation alarm processor.
Model-based alarm processing is possible where a first-principles understanding exists of the relationship between the function and structure of the components to be diagnosed and the SCADA data, and where that knowledge can easily be encoded as a model.

When knowledge is associational, as in the five scenarios described above, production rules or causal models can be an appropriate form of knowledge representation. The up-front effort required for knowledge elicitation is more than compensated for by the information such a technique can provide, and production rules map well to the way the engineer considers the problem.
A related technique is Complex Event Processing (CEP), which has been deployed successfully in industries such as banking and finance. Microsoft sees clear parallels between these applications and those faced by the smart grid, publishing a Smart Energy Reference Architecture (SERA) with CEP at its core. An EU Framework 7 project is currently investigating CEP for detecting security breaches in SCADA systems.

According to Luckham, CEP consists of a mixture of techniques, some old and some new. For example, one form of CEP uses knowledge-based expert systems with a knowledge-modeling formalism that explicitly groups temporal data into events. Drools Fusion is one such CEP toolkit, an iteration on the popular Drools Expert system shell. Both Drools Fusion and Drools Expert employ the Rete engine for inference, the main difference being that the Fusion rule language contains temporal predicates, such as ‘before’, ‘overlaps’, ‘starts’, and ‘coincides’.
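As a rough illustration of what such temporal predicates mean, the following sketch expresses four of them in Python over simple (start, end) interval pairs. Drools Fusion's actual operators work over event objects and accept additional parameters, so this is an approximation of their semantics only:

```python
# Interval-algebra style temporal predicates over (start, end) pairs.
# Approximations of CEP rule-language operators, for illustration only.

def before(a, b):
    """a ends before b starts."""
    return a[1] < b[0]

def overlaps(a, b):
    """a starts first and ends inside b."""
    return a[0] < b[0] < a[1] < b[1]

def starts(a, b):
    """a and b start together; a ends first."""
    return a[0] == b[0] and a[1] < b[1]

def coincides(a, b):
    """a and b cover exactly the same interval."""
    return a[0] == b[0] and a[1] == b[1]
```

Expressed this way, a rule such as "a comms fail ON followed within 24 hours by a comms fail OFF" becomes a single predicate over two events, rather than bookkeeping code spread across many rules.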
The nature of the alarm processing problem is such that temporal relationships between alarms are key for correct interpretation of the situation. The presence or absence of particular alarms in sequence is indicative of the type and location of incidents in the monitored system. This means that many rule-based expert systems developed to address alarm processing include some method of temporal reasoning. For example, one post-fault analysis system  has explicit concepts for events and incidents: SCADA alarms are first grouped into events, then related events are linked to form an incident. The investigation of an incident involves analyzing the sequence of events to determine whether secondary analysis of further datasets is required.

This suggests that the power systems community may have been performing CEP-style analysis all along, by using rule-based expert systems for processing of temporal alarm streams. While CEP may not represent a novel concept in the context of alarm processing, the benefit of wider deployment of CEP applications is that a set of tools and standard approaches to modeling events is now available, such as Drools Fusion.
As a result, the authors have explored the use of CEP for monitoring the distribution automation system. The authors believe that the benefits of knowledge-based systems, including explainability and validation of the knowledge, can be coupled with CEP approaches to modeling temporal constraints, to produce a system that meets the needs of distribution automation monitoring. The following section considers how this could be achieved.

Using CEP for Monitoring Health and Performance of DA
In 2008, the authors’ research group developed a prototype system using the knowledge-based expert system shell Drools to diagnose the health of NCP equipment. However, one of the main challenges for this system was dealing with the temporal aspect of the NCP SCADA alarms, which can span days. Using a knowledge-based approach to process the SCADA alarms means that such alarms must be held in working memory for a number of days. Additional rules were therefore required for the maintenance of facts in working memory: if, after a predefined time, certain facts were still in working memory, they were deleted to reduce memory usage. This added to the complexity of the rule-base.
The authors have since begun exploring a CEP approach. This removes the requirement for additional timing rules by making each alarm an event and triggering response actions in real time. Drools Fusion offers a CEP framework to handle this reasoning over days or weeks, using the architecture shown in Figure 4 to process the SCADA alarms and diagnose the condition of the NCP equipment.

Knowledge engineering techniques are still required to construct domain-knowledge rules when using CEP. The following example shows a rule to handle the alarm for Stage 1 shown in Table I, which may be hidden amongst tens of thousands of alarms within that time period. These alarms are associated with a loss of mains supply to the RTU, which triggers the series of events explained earlier in scenario 2.

Noticeable from the first rule -
If      Alarm Name = Loss of Volts ON
AND     Exists for > 4 hours
AND     Alarm_District_Zone != test zone
Then    This equipment has lost its LV supply and must be visited; inform LV operational support immediately

there is a waiting time of 4 hours, which allows an engineer to visit site and put the RTU back online. It is therefore necessary to handle this rule over a 4-hour period, checking whether the loss of volts alarm goes OFF. If this alarm goes unnoticed and the LV supply is not restored, then two days later the battery alarm ON will be generated, meaning that the first alarm must be stored in working memory for two days before the second rule fires on receipt of a battery alarm ON. The temporal effect of this series of alarms is seen in Table I, where the time between the first two alarms is 2 days, 2 hours, 32 minutes and 4 seconds, and the time between the second and third alarms is 6 hours, 35 minutes and 40 seconds. Since each later rule depends on the previous alarms, the alarms may be required in working memory over a number of days.
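
To illustrate, the rule chain above could be sketched in Python as predicates over timestamped events. The event field names, the NCP identifier, and the chaining into a second battery-alarm check are assumptions for illustration, not the prototype's actual Drools Fusion rules:

```python
# Sketch of the rule chain above as predicates over timestamped events.
# Event dictionaries and field names are illustrative.
from datetime import datetime, timedelta

def lv_supply_lost(event, now):
    """First rule: a 'Loss of Volts ON' alarm standing for more than
    4 hours, outside the test zone."""
    return (event["name"] == "Loss of Volts ON"
            and now - event["time"] > timedelta(hours=4)
            and event["zone"] != "test zone")

def battery_at_risk(lv_event, battery_event):
    """Second stage: a battery alarm that follows an earlier,
    still-unresolved Loss of Volts ON for the same NCP."""
    return (battery_event["name"] == "Battery Alarm ON"
            and battery_event["ncp"] == lv_event["ncp"]
            and battery_event["time"] > lv_event["time"])
```

The appeal of the CEP formulation is that the temporal constraints (the 4-hour wait, the two-day gap to the battery alarm) live in the event model itself, rather than in separate working-memory housekeeping rules.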

 The experience of a distribution network operator in the UK illustrates the requirement for automatic analysis of data and the role intelligent systems could play. The five scenarios presented in this article are only a subset of problems domain experts currently see or anticipate seeing in the future.


S. E. Rudd  is with the Institute for Energy and Environment, University of Strathclyde, Glasgow, United Kingdom.

J. D. Kirkwood  is with the Scottish Power Energy Networks, United Kingdom.

E. M. Davidson is with the Institute for Energy and Environment, University of Strathclyde, Glasgow, United Kingdom.

S. M. Strachan is with the Institute for Energy and Environment, University of Strathclyde, Glasgow, United Kingdom.

V. M. Catterson is with the Institute for Energy and Environment, University of Strathclyde, Glasgow, United Kingdom.

S. D. J. McArthur  is with the Institute for Energy and Environment, University of Strathclyde, Glasgow, United Kingdom.
