Network Solutions and their Usability in Substation Applications

Author: Clemens Hoga, Siemens AG, Germany

Substation environment

In substations, IEC 61850 communications based on Ethernet networking are state of the art today. Four types of communications take place on such networks:

  • Client – Server based on TCP/IP MMS (connection oriented)
  • Basic services like NTP, SNMP, HTTP (non time critical)
  • GOOSE directly on Layer 2   (multicast, repetition mechanism)
  • Sampled Values directly on Layer 2 (multicast, data stream)

In today’s substations the process bus application has not yet been widely deployed in projects. It is expected to appear in live projects from the beginning to the middle of the 2010s. Nevertheless, redundancy is a major topic for the IEC 61850 station bus as well. One critical parameter is the recovery time of a redundant system, that is, the time between the occurrence of an N-1 failure and the moment when the network has fully recovered. This has to be considered together with the possibility that the substation application needs the network at exactly this moment (e.g. to send a trip message over the network). A short, very simplified analysis assumes the following parameters:

  • The network recovers once a year;
  • Time-critical situations like a CB trip occur in a substation 50 times a year;
  • The recovery time is 100 ms;
  • Linear calculation: a year contains 365 x 24 x 60 x 60 x 10 = 315,360,000 intervals of 100 ms, and 315,360,000 / 50 = 6.3 x 10^6.

The probability that the network is down during such a trip situation is therefore about 1 in 6.3 million per year.
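The linear estimate above can be reproduced with a few lines of Python. This is only a sketch of the simplified analysis; the function and parameter names are chosen here for illustration and do not come from any standard.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def coincidence_odds(recovery_time_ms, recoveries_per_year, critical_events_per_year):
    """Return N such that a critical event meets a recovery window with odds of about 1:N per year."""
    # Number of recovery-sized time slots in one year (315,360,000 for 100 ms).
    slots_per_year = SECONDS_PER_YEAR * 1000 / recovery_time_ms
    # Linear estimate: divide the slots by the number of recoveries times critical events.
    return slots_per_year / (recoveries_per_year * critical_events_per_year)


odds = coincidence_odds(recovery_time_ms=100, recoveries_per_year=1, critical_events_per_year=50)
print(f"about 1 : {odds:,.0f}")  # about 1 : 6,307,200
```

Changing the recovery time to 1000 ms (a slow meshed RSTP network) reduces the figure by a factor of ten, which shows how directly the recovery time drives the estimate.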

Applications from a Substation Controller to an IED using the Client/Server services are not time critical. TCP/IP mechanisms take care of retransmitting lost frames and of restoring the right order in the receive buffers. Applications between IEDs (e.g. interlocking signals and trip messages) use the GOOSE service, which is based on multicast. A repetition mode defined in IEC 61850 ensures that these messages do not get lost. Therefore, the communication blackout during the recovery time does not mean that the messages sent out during this period are lost. Even when a double signal change is short enough to be missed, because at the end of the recovery period the signal has the same state as at the beginning, the application is able to recognize the uncertain state by checking the GOOSE message counter, which is incremented with each GOOSE repetition.
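The counter check described above can be sketched as follows. This is a simplified illustration, not the IEC 61850 data model: real GOOSE messages carry separate state and sequence numbers, while this sketch assumes a single hypothetical counter without wrap-around, matching the simplified description in the text.

```python
class GooseSubscriber:
    """Tracks one subscribed GOOSE stream (simplified: a single counter, no wrap-around)."""

    def __init__(self):
        self.last_counter = None
        self.last_state = None

    def receive(self, counter, state):
        """Process one received message; return True if messages were missed before it."""
        missed = self.last_counter is not None and counter != self.last_counter + 1
        self.last_counter = counter
        self.last_state = state
        return missed


sub = GooseSubscriber()
sub.receive(10, state=False)  # normal reception
sub.receive(11, state=False)  # normal repetition
# Network recovery: messages 12..15 (including a short ON/OFF change) never arrived.
uncertain = sub.receive(16, state=False)
print(uncertain)  # True: the state looks unchanged, but the counter gap reveals missed messages
```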

The use of sampled values is a different matter for the application. Even if only one sample is missing, the protection relay has a measuring blackout for a complete measuring window.

Possibilities in real Substation layouts

Ring Redundancy: Not all of the IEC 62439 redundancy mechanisms are used in IEC 61850 applications; the most common ones are reviewed.

The Issue of ring redundancy: In principle, an Ethernet System MUST NOT be configured as a real ring. Due to its network access mechanisms, no data frame is allowed to circle around the network. In case of a closed loop / closed ring all connected devices will pump frames into the system but these frames will never disappear in closed rings. Depending on the number of frames per second of the connected devices, even GBit/s systems will crash in seconds. Therefore, measures have to be taken to prevent circulating frames in these systems (loop-prevention). There are different, standardized systems available which prevent circulating frames even in a physical ring topology. The most common systems are described.

MRP (IEC 62439-2, “Hyper Ring”): MRP is the standardized version of vendor-specific ring redundancy solutions like HiPER-Ring or High-Speed-Redundancy (Fig. 1).

Function: One of the Ethernet switches is the so-called “Redundancy Manager”. It sends out test frames on both ring ports. In a ring without a failure, each of these test frames must appear on the opposite ring port of the Redundancy Manager. As long as these test frames appear, the redundancy manager keeps the loop open (it does not forward normal traffic between its ring ports) and the circulation of frames is prevented (Fig. 2). If the ring is interrupted, either because a connection is broken or a switch in the ring is defective, no test frames appear on the receiving side of the redundancy manager. The redundancy manager then closes the loop and all devices are connected to each other again. Coupling of rings is not standardized in MRP (switch vendors use proprietary dual link solutions).
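The behaviour of the redundancy manager can be sketched as a small state machine. This is an illustrative model only: the class name, the method name, and the threshold of three consecutive missed test frames are assumptions made for this example and are not taken from IEC 62439-2.

```python
class RedundancyManager:
    """Toy model of a ring redundancy manager (names and threshold are hypothetical)."""

    def __init__(self, max_missed_tests=3):
        self.max_missed_tests = max_missed_tests
        self.missed = 0
        self.ring_port_blocked = True  # healthy ring: loop is kept open

    def on_test_interval(self, test_frame_returned):
        """Call once per test interval; returns True while the ring port stays blocked."""
        if test_frame_returned:
            # Ring is intact: keep (or put back) the block so no frame can circulate.
            self.missed = 0
            self.ring_port_blocked = True
        else:
            self.missed += 1
            if self.missed >= self.max_missed_tests:
                # Ring is interrupted: forward traffic so all devices stay reachable.
                self.ring_port_blocked = False
        return self.ring_port_blocked


rm = RedundancyManager(max_missed_tests=3)
for returned in [True, True, False, False, False]:
    blocked = rm.on_test_interval(returned)
print(blocked)  # False: after three missed test frames the loop is closed
```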

RSTP (IEEE 802.1w / 802.1D-2004, IEC 62439) ring and meshed configurations: RSTP supports pure ring configurations as shown for MRP as well as meshed configurations. Consequently, the loop prevention principle must differ from that of pure ring systems. One of the Ethernet switches is the so-called root bridge (simplified: bridge = switch). This is the bridge with the highest so-called “root priority” (in RSTP, the numerically lowest bridge priority value). All ports of this switch are designated ports (Fig. 3).

Ports closest to the root bridge are “root forwarding”. The path with the closest connection to the root is active. Ports that are not needed are blocked for loop prevention. If the network topology allows it, every path has a pre-configured alternative path; blocked ports can become active when the primary path is defective. In the case of root bridge failure, the bridge with the next highest root priority takes over the root bridge function (Fig. 4). RSTP can be used in ring configurations as well. In pure rings it recovers quickly, at 4-5 ms per hop; multiply meshed systems can lead to longer recovery times. This mechanism allows small switches to be integrated directly in IEDs such as protection relays. Cost-effective configurations can be achieved, such as the one shown in figure 6.

The IEDs may have integrated switches which are RSTP-aware. Multiple rings are possible. One of the multiport switches is the root switch, which organizes the optimal communication paths by establishing root ports and designated ports. At the same time, alternative redundant paths are provided but blocked in normal operation. In the case of an n-1 failure the alternate path is activated. RSTP has settable parameters such as aging time, root priority … Using the recommended, pre-configured parameters in multiply meshed configurations can result in recovery times of 1 second, but optimized parameters allow shorter reconfiguration times.
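The root election and the per-hop recovery figure can be illustrated with a short sketch. It assumes the usual RSTP rule that the bridge with the numerically lowest (priority, MAC address) pair wins; the bridge names, priorities, and MAC addresses below are invented for the example.

```python
def elect_root(bridges):
    """bridges maps name -> (priority, mac); the numerically lowest pair becomes root."""
    return min(bridges, key=lambda name: bridges[name])


def ring_recovery_range_ms(hops, best_per_hop_ms=4, worst_per_hop_ms=5):
    """Rough recovery estimate for a pure ring at 4-5 ms per hop, as quoted in the text."""
    return hops * best_per_hop_ms, hops * worst_per_hop_ms


# Hypothetical network: two station switches and one switch integrated in an IED.
bridges = {
    "station-switch-1": (4096, "00:00:00:00:00:01"),
    "station-switch-2": (8192, "00:00:00:00:00:02"),
    "ied-switch-7": (32768, "00:00:00:00:00:07"),
}

root = elect_root(bridges)
print(root)  # station-switch-1

# Root bridge failure: the election is repeated among the remaining bridges.
survivors = {name: bid for name, bid in bridges.items() if name != root}
print(elect_root(survivors))  # station-switch-2

print(ring_recovery_range_ms(hops=10))  # (40, 50)
```

Assigning explicit priorities to the intended root and backup root, rather than relying on MAC addresses as tie-breakers, keeps the result of the election predictable.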

Dual homing (dual link) redundancy: In a dual homing configuration, the IED and the substation controller each have two interfaces. One is active; the other monitors the backup link to verify that it is still usable (Fig. 5a). In the case of an n-1 failure the IED detects the missing link and switches over to the reserve link. It sends out a special message in order to establish the alternative path. This re-establishment is limited to the missing link only; therefore the recovery is very fast (Fig. 5b). This type of redundancy is described in principle in IEEE 802.1D but is often implemented with some proprietary functionality.
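A minimal sketch of the switchover logic, under stated assumptions: the class and method names are hypothetical, the two links are simply called "A" and "B", and the special message that establishes the alternative path is only indicated by a comment because its format is vendor specific.

```python
class DualHomedInterface:
    """Toy model of a dual-homed IED port pair (names are hypothetical)."""

    def __init__(self):
        self.link_up = {"A": True, "B": True}
        self.active = "A"

    def on_link_status(self, link, up):
        """Record a link status change; return the newly activated link, or None."""
        self.link_up[link] = up
        if self.link_up[self.active]:
            return None  # active link is still usable, nothing to do
        standby = "B" if self.active == "A" else "A"
        if not self.link_up[standby]:
            return None  # both links down: no path left
        self.active = standby
        # A real device would now send its special message on the new link
        # so that the attached switches learn the alternative path.
        return standby


port = DualHomedInterface()
print(port.on_link_status("A", up=False))  # B
print(port.active)                         # B
```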

Mixed Configurations: Dual homing and ring configurations can easily be mixed. The most typical configuration is shown in Fig. 7. The main ring and the sub rings use RSTP or MRP; the IEDs are dual connected by means of link redundancy. This mix of technologies provides true n-1 redundancy with very low, deterministic recovery times based on available and proven technologies. Because both technologies work independently, the recovery times do not add up.

Seamless Redundancy Mechanisms

General: Seamless means that the redundancy system has almost no recovery time. This can be accomplished in only one way: the message is packed into two frames, which are sent to the receiver along two different paths. The receiver takes the first frame, identifies it, and discards the redundant second one carrying the same message. This works in a parallel configuration as well as in a ring configuration; the principle is the same. Part 3 of IEC 62439 specifies two seamless systems: PRP and HSR.
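The principle of sending twice and discarding the duplicate can be sketched as follows. This is an illustration of the idea only: the names are invented, the sender identity is a plain string, and the filter remembers every sequence number forever, whereas a real implementation would use a bounded window.

```python
class DuplicateFilter:
    """Accepts the first copy of each (source, sequence number) pair and drops the second."""

    def __init__(self):
        self._seen = set()

    def accept(self, source, seq):
        key = (source, seq)
        if key in self._seen:
            return False  # redundant copy, discard
        self._seen.add(key)
        return True


def deliver(messages, path_a_ok=True, path_b_ok=True):
    """Send every message over both paths and return what the application receives."""
    receiver = DuplicateFilter()
    delivered = []
    for seq, payload in enumerate(messages):
        for path_ok in (path_a_ok, path_b_ok):
            if path_ok and receiver.accept("ied-1", seq):
                delivered.append(payload)
    return delivered


print(deliver(["trip", "reset"]))                   # ['trip', 'reset']
print(deliver(["trip", "reset"], path_a_ok=False))  # ['trip', 'reset']
```

With one path down the application still receives every message exactly once and without any switchover delay, which is what "seamless" means here.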

PRP Parallel Redundancy Protocol (IEC 62439-3): The PRP principle is shown in figure 8. Networks A and B can have any structure: star, line, ring, or meshed. It is also not necessary to use the same configuration on network A and network B. Each PRP device is a DANP (dual attached node using PRP). The redundancy entity is below Ethernet layer 2, so it works for all Ethernet protocols. This means that A and B frames have the same MAC and IP address. As a result, networks A and B must be isolated; a connection between A and B will lead to ambiguous addressing and consequently disrupt the network.

Figure 9 shows a basic structure of PRP. Switches do not need PRP extensions; they can be off-the-shelf devices but must fulfill the substation’s environmental requirements. Most of the devices are DANPs. SANs (single attached nodes) can take part in the PRP redundancy system through an external device called a redundancy box (Redbox). The system also allows SANs that do not need redundancy to be connected directly to one of the networks; their communication is then limited to that network.

Duplicate filtering is based on a PRP control sequence at the end of the Ethernet frame. It consists of a sequence number, a LAN A/B identifier, and an additional indication of the frame length.

This means that the receiver needs to read the entire frame before it can start checking whether it is a duplicate or not. At 100 Mbit/s this takes about 120 µs for big frames (1500 x 8 x 0.01 µs). The standard also defines special supervision messages in order to monitor a PRP system and to provide information about failures that have occurred in the network.
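The control sequence at the end of the frame can be illustrated with a small pack/unpack sketch. The layout used here (16-bit sequence number, 4-bit LAN identifier, 12-bit size, with the size counting payload plus trailer) is an assumption made for this example; IEC 62439-3 should be consulted for the normative format. The function names are hypothetical.

```python
import struct

LAN_A, LAN_B = 0xA, 0xB  # assumed identifier values for the two networks


def append_trailer(payload, seq, lan_id):
    """Append an assumed 4-byte trailer: 16-bit sequence, 4-bit LAN id, 12-bit size."""
    size = len(payload) + 4  # assumption: size counts payload plus trailer
    word = ((lan_id & 0xF) << 12) | (size & 0xFFF)
    return payload + struct.pack("!HH", seq & 0xFFFF, word)


def read_trailer(frame):
    """Return (sequence number, LAN id, size) from the last four bytes of the frame."""
    seq, word = struct.unpack("!HH", frame[-4:])
    return seq, word >> 12, word & 0xFFF


frame_a = append_trailer(b"trip", seq=7, lan_id=LAN_A)
frame_b = append_trailer(b"trip", seq=7, lan_id=LAN_B)
print(read_trailer(frame_a))  # (7, 10, 8)
print(read_trailer(frame_b))  # (7, 11, 8)
```

Both copies carry the same sequence number and differ only in the LAN identifier, so the receiver can recognize the second one as a duplicate, but only after the last bytes of the frame have arrived.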

HSR (IEC 62439-3) High availability seamless redundancy: HSR extends zero-recovery-time redundancy to all relevant topologies, e.g. rings, coupled rings and dual network (parallel) configurations. Seamless ring configurations have their origin in IEC 61158-6 PROFINET and have been adopted and harmonized with PRP in IEC 62439-3. Their advantage is that switch interfaces can be integrated in IEDs, like the RSTP switches. This may reduce the network infrastructure to configurations similar to the one shown in figure 6.

On the other hand, the bandwidth consumption in ring configurations is higher than in doubled networks, where the traffic load of A and B frames is distributed over LAN A and LAN B. But HSR also works in the LAN A / LAN B parallel mode, and in this configuration HSR has the same performance as PRP (Fig. 10). The source sends out the frame in two directions; the receiver takes the first (A or B) and discards the second frame. Nodes without an HSR interface can be connected by means of a RedBox (Redundancy Box). In the case of a multicast frame, the sender discards both frames when they return to it around the ring. In order to provide structured networks, redundant rings have to be coupled redundantly. This can be accomplished with so-called Quadbox devices, each of which is a dual Redbox inside one device. The parallel structure can run with PRP or HSR end nodes. The ring structure requires HSR in any case. Both can be interlinked with a Redbox. The structure of an HSR frame is as follows.

Due to the needs of an Ethernet node in a ring, the frame structure of HSR differs from that of PRP. The HSR tag is close to the header of the frame and consists of a sequence number related to a sender-receiver pair, a path identifier, and a length field. There is no need to read the entire frame to decide whether this frame is the first one or the second one, which has to be discarded. This decision takes place after the first 18 bytes instead of after 1500 bytes (the worst case for PRP): 18 x 8 + 8 x 8 (preamble) = 208 bits, and 208 x 0.01 µs = 2.08 µs.
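The two decision times can be compared with a short calculation. The sketch works in integer nanoseconds at 100 Mbit/s to avoid rounding noise; the function name is chosen for this example.

```python
NS_PER_BIT = 10  # 100 Mbit/s corresponds to 0.01 µs = 10 ns per bit


def decision_delay_ns(bytes_before_decision, preamble_bytes=0):
    """Time until enough of the frame has been received to decide on a duplicate."""
    return (bytes_before_decision + preamble_bytes) * 8 * NS_PER_BIT


prp_ns = decision_delay_ns(1500)                  # whole frame, trailer at the end
hsr_ns = decision_delay_ns(18, preamble_bytes=8)  # tag near the header, plus preamble
print(prp_ns / 1000, hsr_ns / 1000)  # 120.0 2.08
```

For a maximum-size frame, the HSR decision is therefore available after a small fraction of the time PRP needs, which matters when every node in a ring has to make it.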

This speed and the guaranteed decision time enable ring configurations and coupled rings with RedBoxes. Consequently, coupling HSR frames only, from parallel to ring, is very effective and can handle large numbers of nodes. This makes the Redbox function easier. By contrast, a Redbox converting PRP frames into HSR frames and vice versa needs a lot of memory and computing power.

Monitoring and supervision of the redundancy system are provided by special monitoring frames, as in PRP.

Application layer redundancy

Redundancy based on doubled systems with hot standby functionality uses coordinated dual computers with completely separate communication stacks. Usually they have two MAC addresses and two IP addresses. Such systems are mentioned here for completeness but are not covered further.

Status of Standardization

The solutions described have the following status:

  • MRP: IEC 62439-2, IS, Edition 1 (2008-5), Edition 2 (2010-1)
  • RSTP: IEEE 802.1D-2004, referenced in IEC 62439-1 Edition 2 (2010-1)
  • Link Redundancy: based on IEEE 802.1D, mostly with proprietary extensions
  • PRP: IEC 62439-3, Edition 1 (2008-5), Edition 2 IS (2010-1)
  • HSR: IEC 62439-3, IS, Edition 2 (2010-1)

Relation to IEC 61850: IEC 61850 does not define redundancy systems in principle; it refers to existing standards. In Edition 1 (2004-5) none of the protocols above is listed, and the topic of redundancy is not addressed. In Edition 2 (most parts will be issued from mid-2010 onward) the issue of redundancy is newly discussed. The parts that discuss this topic are:

  • 8-1 Mapping MMS/Ethernet (status: CDV): RSTP, PRP and HSR are mentioned as optional protocols; reference to TR 90-4
  • 9-2 Mapping of Sampled Values (status: CDV): PRP and HSR are mentioned as optional protocols; reference to TR 90-4
  • 90-4 Network Engineering Guidelines (Technical Report, status: CD): complete overview of all relevant substation communication network topics

Discussion of the Pros and Cons

One has to distinguish between the different applications used in substation communications. For TCP/IP traffic (IED to substation controller), RSTP and/or MRP is fully sufficient. Interlocking and trip GOOSE messages are more critical. As stated earlier, the probability of a critical event in the substation coinciding with a network recovery period is about 1 in 6.3 million.

MRP is easy to install because there are no parameters to set, but it is limited to a one ring configuration.

RSTP in the so-called “Garland Configuration” (Fig. 6) is cost effective and equally reliable, but RSTP in a meshed loop configuration has overly complex recovery mechanisms and its recovery times are hardly predictable. Combined configurations of link redundancy with RSTP provide a link switchover time of 10 ms and a ring recovery time of n x 4 ms (n = number of switches); these two times do not add up.
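One way to read "the two times do not add up" is that a single failure hits either an IED link or the ring, so the worst case is the larger of the two figures rather than their sum. The sketch below encodes that interpretation; it is an assumption of this example, as is the function name.

```python
def worst_case_recovery_ms(ring_switches, link_switchover_ms=10, per_switch_ms=4):
    """Worst-case recovery for one failure in a mixed link-redundancy/RSTP setup."""
    ring_ms = ring_switches * per_switch_ms
    # Independent mechanisms: a single failure triggers only one of them.
    return max(link_switchover_ms, ring_ms)


print(worst_case_recovery_ms(2))   # 10
print(worst_case_recovery_ms(10))  # 40
```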

PRP provides seamless redundancy: the dual sending of frames provides an end-to-end redundant path over two isolated networks. A doubled network infrastructure with external switches is necessary. These switches do not need redundancy implementations like MRP/RSTP for n-1 redundancy, because the redundancy logic is transferred into the IED itself. PRP is strictly limited to the use of two totally isolated Ethernet networks.

HSR offers seamless, zero-recovery-time redundancy and is able to cover almost all possible configurations, including ring configurations. In principle, using switches with integrated RedBox functionality will enable seamless redundancy in cost-effective configurations. Transitions from parallel systems to ring systems can be implemented with good performance. Coupling of multiple rings is possible.

Both PRP and HSR systems require two interfaces in the end device, relay or substation controller. HSR in ring configuration requires more bandwidth than PRP because two frames are sent on one network at the same time.

Biography

Clemens Hoga received the Dipl.-Ing. degree in information technology from Georg Simon Ohm University of Applied Science, Nürnberg, in 1985. He joined SIEMENS in 1986. From the beginning he was involved in the development and product management of communication protocols, especially for factory automation applications. In 2000 he moved to the Power Transmission & Distribution group and was responsible for the Substation-IT project.

Currently he is the Principal Key Expert Communication of PTD Energy Automation.

He is a member of IEC TC 57 WG 10, and is the convener of the DKE 952.0.10 (the German IEC 61850 mirror committee) and acts as a vice chairman of the UCA International, the users group for IEC 61850, CIM and open AMI.
