Measuring Propagation Delays and Assessing Performance of Power Utility Communication Networks

Author: Fred Steinhauser, OMICRON electronics, Austria

Many of the concerns originated from network arbitration and congestion problems that occurred in the early days of Ethernet, which are much less of an issue with today's switched Ethernet networks. As long as only client/server and GOOSE traffic are present on the network, there are no real challenges because the network load is relatively low. But when Sampled Values become involved, high packet rates and sustained network load occur. The real-time traffic must arrive at the subscribers on time, so delays and jitter become an issue.

This article describes circumstances when Sampled Values are involved and illustrates some key issues. It also shows that such effects can be predicted to a certain degree and how these predictions are verified by measurements. This is useful for the design of a power utility communication network and for the verification of its performance.
For the propagation of protection-related messages over wide area networks, certain performance criteria have to be met, but these criteria are difficult to verify. The measurement methods used for evaluating the effects of packet interference can also be applied in a distributed measurement system to measure and assess the propagation delay characteristics of wide area networks.

Traffic Segregation with Buses
The term "fully digital" power utility protection, automation, and control system refers to an installation that uses all kinds of communication defined in IEC 61850, i.e. client/server communication, GOOSE, and Sampled Values.

Although hardly mentioned in the IEC 61850 standard, the terms station bus and process bus are often used to characterize segments of a power utility communication network that transport different kinds of traffic. It is widely assumed that client/server traffic (also often called MMS after the underlying protocol) is only present on the station bus, while the process bus is exclusively for Sampled Values. GOOSE may be on both of them. When no Sampled Values are used and only a station bus exists, the GOOSE messages are of course also sent over the station bus. Typically this is not a problem, because the bandwidth required for GOOSE is low in most cases. When a process bus is actually deployed, the GOOSE messages will more likely be on the process bus, because of their real-time nature and their closeness to the process.

But the strict segregation between client/server traffic and Sampled Values that is often assumed does not necessarily hold, especially when real bus topologies are used. The technical report IEC 61850 Part 90-4: Network Engineering Guidelines contains a figure titled “Traffic patterns” (figure 57 in the report), which is reproduced here as Figure 2. It illustrates where the different kinds of traffic, client/server (C/S), GOOSE (GO), and Sampled Values (SV), occur side by side in a power utility communication network, even with a dedicated station bus and process bus.

The essential point in the given context is the fact that the station controller will not only communicate with the IEDs connected to the station bus, but also with the CB controllers connected to the process bus. The station controller may send controls to operate the CBs and receive reports about the execution of the commands. To reach the CB controllers, client/server traffic must be exchanged over the process bus, where it will interact with the Sampled Values.

Points of Interference
The components that manage the forwarding of the data packets in the local communication network are the Ethernet switches. Each IED has an individual connection to a so-called edge port on a switch. The switches are connected to each other by so-called trunk links, which attach to trunk ports. The IEDs can send data to the switch at arbitrary times. The switch then has the task of forwarding the packets, either to another IED on the same switch or to another switch via a trunk link. When multiple packets arrive from different IEDs at the same or almost the same time and need to be forwarded over the trunk link, the switch has to schedule them, because packets cannot travel over the trunk link in parallel, only sequentially.

IEC 61850 specifies the use of VLAN tags in order to create a fast lane for time critical traffic like GOOSE and Sampled Values. The use of the VLAN feature was not primarily intended to logically segregate the network, but to make use of the priority information that comes with the VLAN tag. The switch sorts the incoming packets into different queues according to the priority in the VLAN tag. When the next packet is to be forwarded over the trunk link, the packet from the highest ranked queue gets preference. The exact selection depends on the packet scheduling strategy actually implemented in the switch.

But this does not magically resolve all issues. A high priority does not at all guarantee that the packet is immediately re-sent after its arrival in the switch. The trunk port must first become idle before a new packet can be transmitted. When the switch is already transmitting a packet, any newly arrived packet has to wait, regardless of its priority.

Both IEDs in Figure 1 transmit packets that need to be forwarded over the trunk link. The illustrated packet timing refers to a case where a large, low priority packet from the IED on the top left arrives in the switch slightly before the high priority Sampled Values packet from the merging unit. The large packet (labelled MSEP for "maximum size Ethernet packet") is sent over the trunk link and the trunk port is occupied until the packet is completely transmitted. In the situation shown, the Sampled Values packet is delayed by almost the entire duration of the large packet, just because it arrived at the switch shortly after the transmission of the large packet had begun.
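The blocking effect described above can be reproduced with a small simulation. The following is only a minimal sketch, assuming a strict-priority, non-preemptive scheduler on a single trunk port; real switches may implement other strategies. The function name simulate_trunk, the 100 Mbit/s link speed, the priority values, and the 1538-byte on-wire size of the large packet are illustrative assumptions, not values taken from a particular switch.

```python
import heapq

LINK_SPEED = 100e6  # bit/s, assumed trunk link speed


def simulate_trunk(packets):
    """Strict-priority, non-preemptive scheduling on one trunk port.

    packets: list of (arrival_s, priority, size_bytes, name), where
    arrival_s is the time the packet is fully received by the switch
    and a higher priority value wins.
    Returns {name: (transmission_start_s, transmission_end_s)}.
    """
    pending = sorted(packets)   # ordered by arrival time
    queue = []                  # heap ordered by (-priority, arrival)
    result = {}
    now = 0.0
    i = 0
    while i < len(pending) or queue:
        # Port idle and nothing queued: jump to the next arrival.
        if not queue and pending[i][0] > now:
            now = pending[i][0]
        # Enqueue everything that has arrived by now.
        while i < len(pending) and pending[i][0] <= now:
            arrival, prio, size, name = pending[i]
            heapq.heappush(queue, (-prio, arrival, size, name))
            i += 1
        # Send the highest-priority packet; it cannot be interrupted.
        _, arrival, size, name = heapq.heappop(queue)
        end = now + size * 8 / LINK_SPEED
        result[name] = (now, end)
        now = end
    return result


# A large low-priority packet arrives 1 microsecond before the SV packet.
timing = simulate_trunk([(0.0, 0, 1538, "MSEP"), (1e-6, 4, 152, "SV")])
sv_wait = timing["SV"][0] - 1e-6
print(f"SV packet waits {sv_wait * 1e6:.1f} us for the trunk port")
```

In this sketch the priority only decides which waiting packet is sent next. It does not shorten the wait for a packet whose transmission has already started, which is the situation shown in Figure 1.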

An Experiment with Evaluation

An experiment gives insight into the effect of load traffic on the propagation of time critical traffic in power utility communication networks. The measurements were made with Sampled Values, but the results apply to GOOSE messages as well. The test setup, shown in Figure 3, is the simplest possible layout that allows the described effects to be observed.

The network consists of two switches, S1 and S2, interconnected by a trunk link. A Sampled Values source (a merging unit) publishes one Sampled Values stream into the network. The measurement device captures the Sampled Values packets coming from the source before they enter the network at switch S1 and then again when they are sent out by the other switch S2, after traversing the network.
A PC connected to switch S1 generates load traffic by "pinging" an IED that is connected to switch S2. This forces ICMP messages to be exchanged over the trunk link, where they interfere with the Sampled Values. The ping utility used for generating this load traffic allows specifying the size and the frequency of the ICMP packets.

Theoretical Examination
The test setup is minimal: all traffic in the network is known and all parameters are under control. The only "noise" on the network consists of a few administrative messages, which cause insignificant traffic. In such a well-defined environment, it is possible to estimate the effect of the traffic interference beforehand, which gives an indication of the expected measurement results. Thus, the validity of the measurement method can be assessed.

The ICMP messages have the size Sp. At a given link speed nl, each packet occupies the link for a time

$$ t_p = \frac{S_p}{n_l} \qquad (1) $$

When the packets are issued with the frequency fp, they occupy a share p of the total bandwidth:

$$ p = t_p \cdot f_p = \frac{S_p \cdot f_p}{n_l} \qquad (2) $$

This is also the probability for a Sampled Values packet to come into conflict with an ICMP packet when being sent over the trunk link. Equation (2) shows that the probability of an additional delay due to interference increases with the size and the frequency of the ICMP packets, while it decreases with increasing link speed. In the worst case, a Sampled Values packet is delayed by the duration tp of the ICMP packets. A share of (1-p) of the packets will not be affected at all and will pass just as if there were no load traffic. The other packets will be delayed by a fraction of tp, with only a few packets showing the maximum additional delay tp. The affected Sampled Values packets will randomly meet the ICMP packets in different states of the sending progress with equal probability, so a uniform distribution of the observed delays can be expected.

In the test setup used, all Ethernet links, including the trunk link, operate at nl = 100 Mbit/s. The ICMP packets will be issued at a rate in the range of about 1000 packets per second, limited by what the ping source is able to achieve at a certain packet size. For the actual values applied in the experiment, the figures in Table 1 apply.
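The link occupation time and the bandwidth share can be evaluated numerically for this setup. The following is a rough sketch, assuming that packet sizes are given in bytes and converted to bits. The 1538-byte on-wire size of a maximum-size packet (1518-byte frame plus preamble and interframe gap) is an assumption chosen because it reproduces the 123 µs quoted with the measurements. The helper implied_rate is hypothetical and simply inverts the bandwidth share relation.

```python
LINK_SPEED = 100e6  # bit/s, link speed n_l of the test setup


def occupation_time(size_bytes, link_speed=LINK_SPEED):
    """Time one packet occupies the link, in seconds (t_p = S_p / n_l)."""
    return size_bytes * 8 / link_speed


def bandwidth_share(size_bytes, rate_hz, link_speed=LINK_SPEED):
    """Share p of the link occupied by the load traffic (p = t_p * f_p)."""
    return occupation_time(size_bytes, link_speed) * rate_hz


def implied_rate(p, size_bytes, link_speed=LINK_SPEED):
    """Packet rate f_p that would produce a given share p."""
    return p / occupation_time(size_bytes, link_speed)


# 500-byte ICMP packets at 1000 packets per second
print(occupation_time(500) * 1e6)   # t_p in microseconds
print(bandwidth_share(500, 1000))   # share p

# Assumed on-wire size of a maximum-size packet: 1538 bytes
print(occupation_time(1538) * 1e6)  # t_p in microseconds
print(implied_rate(0.11, 1538))     # rate that would give p = 11 %
```

Under these assumptions the 500-byte case gives tp = 40 µs and p = 4 %, and the maximum-size case gives tp of about 123 µs. A share of 11 % would then correspond to a ping rate somewhat below 1000 packets per second, which fits the remark that the achievable rate depends on the packet size.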

Measurements and Evaluation
Now, the delay times of the SV packets are to be measured under the different load conditions. This is done by performing a large number of individual measurements and evaluating the distribution of the delay times. The evaluation is based on 10,000 individual measurements, which delivers solid statistics. With 4000 Sampled Values packets per second, the necessary traffic can be captured in only 2.5 seconds.

One condition is that the Sampled Values packets interfere with the ICMP packets randomly, which means that the rate for issuing the ICMP packets must not be correlated with the publishing frequency of the Sampled Values. Since there is no time synchronization between any of the components and the source of the ICMP packets is a PC with its typically mediocre timing properties, the timing is not exact and there is enough jitter to ensure the required randomness.
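The expected outcome of such a measurement series can be sketched with a short simulation. This is a simplified model under stated assumptions, not the measurement method itself: the ICMP packets are assumed to start at a nominal rate with a uniform jitter of ±20 % of the period (a stand-in for the PC's imprecise timing), the Sampled Values arrive exactly every 250 µs, and the small delay that a Sampled Values packet may in turn impose on an ICMP packet is ignored. The function name and all default parameters are illustrative.

```python
import bisect
import random


def simulate_interference(n_sv=10_000, f_sv=4000.0, f_p=1000.0,
                          t_p=40e-6, seed=1):
    """Return the additional delay (in seconds) of each SV packet."""
    rng = random.Random(seed)
    duration = n_sv / f_sv

    # Start times of the ICMP transmissions on the trunk link.
    starts = []
    t = rng.uniform(0.0, 1.0 / f_p)
    while t < duration:
        starts.append(t)
        t += (1.0 / f_p) * rng.uniform(0.8, 1.2)  # assumed PC jitter

    extra = []
    for k in range(n_sv):
        arrival = k / f_sv
        # Most recent ICMP packet that started at or before this arrival.
        i = bisect.bisect_right(starts, arrival) - 1
        if i >= 0 and arrival < starts[i] + t_p:
            # Trunk port still busy: wait for the rest of the ICMP packet.
            extra.append(starts[i] + t_p - arrival)
        else:
            extra.append(0.0)
    return extra


extra = simulate_interference()
affected = [d for d in extra if d > 0]
print(f"affected share: {len(affected) / len(extra):.3f}")
print(f"largest additional delay: {max(extra) * 1e6:.1f} us")
```

With these assumptions the affected share should come out near p, and the additional delays of the affected packets should spread over the range from 0 to tp.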

A first measurement is done without any ICMP traffic, with the Sampled Values alone on the network. This serves as a reference, because even in this case a delay is measured, which comes from the storing and forwarding of the Sampled Values packets in the switches. Further delays due to interference add on top of the reference figure. The delay time distribution of the reference measurement is shown in Figure 4.


The average delay for the packets is about 26 µs, the deviations are minimal. All measured values remain in the interval between 25 µs and 28 µs. The delays are made up by the two store-and-forward processes in the cascaded Ethernet switches. The Sampled Values have a packet size of 152 bytes, so their duration is about 12 µs. The total delay is slightly larger than twice this value and the difference comes from the processing of the packets inside the switch before they are re-sent. In this example, the processing takes on average only about 1 µs per switch.
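The composition of the reference delay can be checked with a few lines of arithmetic. This is a minimal sketch that assumes each switch fully receives the packet before re-sending it and adds a fixed processing time. The function name is illustrative, and the 1 µs processing time is the approximate figure quoted above.

```python
def store_and_forward_delay(size_bytes, link_speed, hops, processing_s):
    """Delay through a chain of store-and-forward switches, in seconds."""
    packet_duration = size_bytes * 8 / link_speed
    return hops * (packet_duration + processing_s)


# 152-byte Sampled Values packet, 100 Mbit/s links, two switches
delay = store_and_forward_delay(152, 100e6, hops=2, processing_s=1e-6)
print(f"expected reference delay: {delay * 1e6:.1f} us")
```

Two store-and-forward steps of about 12 µs each plus roughly 1 µs of processing per switch give a little over 26 µs, in line with the measured average.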
Figure 5 shows the delay time distribution of the Sampled Values when the load traffic with 500 byte ICMP packets is present.

Of the 10,000 packets, 9592 still have the delay value of the undisturbed case, which is very close to the expected (1-p) = 96 %. The other 408 packets have different delays that look uniformly distributed, which makes perfect sense and matches the expectations. The maximum delay value is 66 µs, which is exactly tp = 40 µs more than the reference value.

Figure 6 shows the delay time distribution of the Sampled Values when the load traffic with ICMP packets of maximum Ethernet packet size is present.

Of the 10,000 packets, 8894 still have the delay value of the undisturbed case, which is again close to the expected (1-p) = 89 %. The delays of the other 1106 packets again look uniformly distributed. The maximum delay value is now 149 µs, which is exactly tp = 123 µs more than the reference value.
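How close the observed counts are to the prediction can be quantified with a simple statistical check. The sketch below treats every packet as an independent trial with hit probability p, which is an idealization, and it uses the rounded values p = 4 % and p = 11 % from the text. The function name is illustrative.

```python
import math


def compare_with_expectation(n, affected, p):
    """Return expected count, standard deviation, and deviation in sigmas."""
    expected = n * p
    sigma = math.sqrt(n * p * (1 - p))  # binomial standard deviation
    return expected, sigma, (affected - expected) / sigma


for label, affected, p in [("500-byte ICMP", 408, 0.04),
                           ("maximum-size packets", 1106, 0.11)]:
    expected, sigma, z = compare_with_expectation(10_000, affected, p)
    print(f"{label}: expected {expected:.0f} +/- {sigma:.0f}, "
          f"observed {affected}, deviation {z:.2f} sigma")

# Largest delay = reference delay + t_p (all values in microseconds)
print(26 + 40, 26 + 123)
```

Under this idealization both observed counts lie within one standard deviation of the expected value, and the maximum delays of 66 µs and 149 µs equal the 26 µs reference plus the respective tp.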

The measurement results are in conformance with the expectations derived from the theoretical examination of the experiment. Thus, the measurement method and instrument can be considered valid.

Applications in WAN scenarios

The same measurement principles as applied above work with wide area networks as well. But in such scenarios, the measurement of the propagation delay between the different ends is a particular challenge. Time synchronized test equipment has to be used. The equipment available until recently was special gear for telecommunications engineers, which is normally not at hand in electrical utilities.

Therefore, the assessment of such communication links has often been performed by pinging a remote device. But this allows only a very rough and incomplete assessment. The ping utility measures a round-trip time, which is the sum of several components. The dominant factors will be the propagation times in both directions (tAB, tBA) and the response delay of the responding device (trd), which is usually not specified at all. Thus, the total round trip time (trt,A) measured in site A when pinging a remote device in site B can be written as:

$$ t_{rt,A} = t_{AB} + t_{rd} + t_{BA} \qquad (3) $$

The statistics provided refer to the round trip times (trt,A); the contributions of the individual summands cannot be separated. It is not possible to conclude whether jitter comes mainly from the response delay or from the propagation delay in the network. Essential parameters like channel asymmetry, which is crucial for the applicability of time synchronization, cannot be assessed this way.
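A small numerical example illustrates why the round trip time alone is not sufficient. All delay values below are hypothetical and given in microseconds. The remark on time synchronization assumes a two-way time transfer scheme that presumes equal delays in both directions, as the Precision Time Protocol does, so that an uncorrected asymmetry shows up as a time offset error of half the asymmetry.

```python
def round_trip(t_ab, t_rd, t_ba):
    """Round trip time seen at site A: the sum of the three components."""
    return t_ab + t_rd + t_ba


# Two hypothetical links, all values in microseconds
symmetric = {"t_ab": 2000, "t_rd": 500, "t_ba": 2000}
asymmetric = {"t_ab": 3500, "t_rd": 500, "t_ba": 500}

for name, link in [("symmetric", symmetric), ("asymmetric", asymmetric)]:
    asymmetry = link["t_ab"] - link["t_ba"]
    print(name,
          "round trip:", round_trip(**link),
          "asymmetry:", asymmetry,
          "offset error if symmetry is assumed:", asymmetry / 2)
```

Both links produce the same round trip time, yet only one of them would be suitable for time synchronization without asymmetry correction.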

Also, the ping utility is typically executed on a PC and this limits the accuracy of the measurements because the timing uncertainty on PCs can easily be in the range of several milliseconds.

Figure 7 shows a very much simplified setup for such a propagation delay measurement in a WAN scenario. The controlling PC and several details about tapping the traffic (e.g. mirror ports) are omitted.

The measurement devices on both ends are precisely time synchronized with an error not larger than 1 µs. This is indicated by the GPS receivers, but time synchronization can also be established if the Precision Time Protocol is available in the local networks. To control the whole measurement system from one site, an IP route through the WAN is required to reach the measurement device located at the remote site.

The packets to be captured for the propagation delay measurement must not be altered when passing through the network, so they can be recognized on both ends. The measurements can be performed with operational traffic (e.g. GOOSE messages) or with traffic injected especially for the measurements, like ICMP packets for pinging, but now the request and response packets are measured individually on their one-way trips and not just the round-trip as a whole. This delivers the propagation delay values individually for each direction and allows the evaluation of the channel asymmetry as well.
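The matching of packets captured at both ends can be sketched as follows. This is a minimal illustration, assuming that each capture is available as a list of (timestamp, frame bytes) pairs from time synchronized devices and that every frame is unique within the capture (identical repeated frames would need additional handling). The function names and the use of a SHA-256 fingerprint for recognizing unaltered packets are illustrative choices, not a description of a specific product.

```python
import hashlib
from statistics import mean


def fingerprint(frame):
    """Identify an unaltered frame by a hash of its bytes."""
    return hashlib.sha256(frame).hexdigest()


def one_way_delays(sender_capture, receiver_capture):
    """Delays of frames seen at the sending side and again at the receiver.

    Both captures are lists of (timestamp, frame_bytes); the timestamps
    must come from synchronized clocks and use the same unit.
    """
    sent = {fingerprint(frame): ts for ts, frame in sender_capture}
    delays = []
    for ts, frame in receiver_capture:
        key = fingerprint(frame)
        if key in sent:
            delays.append(ts - sent.pop(key))
    return delays


# Hypothetical captures, timestamps in microseconds
requests_at_a = [(1000, b"echo-request-1"), (2000, b"echo-request-2")]
requests_at_b = [(3100, b"echo-request-1"), (4150, b"echo-request-2")]
replies_at_b = [(3600, b"echo-reply-1"), (4650, b"echo-reply-2")]
replies_at_a = [(5000, b"echo-reply-1"), (6100, b"echo-reply-2")]

d_ab = one_way_delays(requests_at_a, requests_at_b)
d_ba = one_way_delays(replies_at_b, replies_at_a)
print("A to B:", d_ab, "B to A:", d_ba)
print("mean asymmetry:", mean(d_ab) - mean(d_ba))
```

Because each direction is evaluated separately, the response delay of the remote device drops out and the asymmetry follows directly from the two sets of one-way delays.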

Conclusion

The performance of a power utility communication network is not really challenged when only client/server and GOOSE traffic is present. But when Sampled Values come into play, it is worthwhile to look into this in more detail, especially when other traffic may interfere with the Sampled Values. The explained interferences can lead to considerable jitter. When such interferences occur repeatedly in a network, the jitter may become so big that following Sampled Values packets may catch up to their predecessors.

The effects of interference of Ethernet packets in communication networks can be well understood. By careful examination of the network, expected values for the packet delays can be derived. These expected values are confirmed by precise measurements. This leads to the converse argument that malfunctions of network components can be revealed when measurements are performed and the results deviate from the expectations.

The assessment of the performance of wide area networks is crucial for protection applications that require timely delivery of their mission critical data between the different sites. A distributed, time synchronized measurement system delivers the performance data for each link and individual direction.

With today's modern equipment, such measurements are feasible not only for genuine communication experts, but also for power utility engineers who have to deal with communication networks that are mission critical components of their protection, automation, and control systems.

Biography

Fred Steinhauser was born in Austria. He studied Electrical Engineering at the Vienna University of Technology, where he obtained his diploma in 1986 and received a Dr. of Technical Sciences in 1991. In 1998 he joined OMICRON, where he worked on several aspects of testing power system protection. Since 2000 he has worked as a product manager with a focus on substation communication issues. Fred Steinhauser is a representative of OMICRON in the UCA International Users Group. As a member of WG10 and WG17 in the TC57 of the IEC he contributes to the standard IEC 61850. He is also a member of SC B5 of CIGRÉ and contributed to the synchrophasor standards IEEE C37.118.1 and IEEE C37.118.2.
