Main 1 and Main 2 Protection - Same or Different?

Authors: Charles F Henville, Henville Consulting Inc.; Mukesh Nagpal and Frank Plumptre, BC Hydro; Dan Buchanan and Dan Marble, BC Transmission Corporation

Redundant functionally equivalent protection systems (FEPS, sometimes known as main 1 and main 2) are usually applied for transmission line protection.

What are the relative advantages and disadvantages of using different types of systems for FEPS? This article discusses this question.

Most utilities provide redundant FEPS on the bulk electric system to maximize the dependability of these schemes to clear short circuits. The following options (Table 1) exist for independent protection systems:

  1. Main 1 and main 2 being identical in manufacturer, model, principle and setting
  2. Main 1 and main 2 being identical in manufacturer, model and principle, but not in setting
  3. Main 1 and main 2 being identical in manufacturer and principle but not in model
  4. Main 1 and main 2 being different in manufacturer but not in principle
  5. Main 1 and main 2 being different in manufacturer and in principle
  6. More than 2 main protections in a voting scheme (2 out of 3) or in series parallel combination (2 out of 4)

The following definitions may be helpful in reading this article.

They are not necessarily identical to standard definitions.

Dependability: The probability that a protection system will trip circuit breakers when required.

Security: The probability that a protection system will not trip circuit breakers when it is not required to do so.

Reliability: The probability that a protection system will operate with the required performance. This parameter includes the aspects of dependability, security and speed.

Incorrect Protection Operation (Misoperation): False trips (for out-of-zone faults) and failures to trip (for in-zone faults).

Common Mode Failure: Multiple failures attributable to a common cause. For instance, failure of main 1 and main 2 to clear a fault because a common trip coil fails to open a circuit breaker.

Interdependence (of protection systems): The probability that one protection system (e.g. main 1) will respond to a fault the same way (in terms of correct or incorrect operations) as another protection system (e.g. main 2) at the same location that responds to the same fault.

Background

Common practice is for redundant FEPS to be connected in a logical "OR" manner such that correct operation of either system for a fault in the protected zone will result in correct clearing as shown in Figure 1. That is, even if one system is unavailable, or fails, the other will still clear the fault in the required time. The greater the independence of the redundant systems, the less likely they will both fail to clear a short circuit within the time needed due to a common mode failure. To increase the independence of the redundant systems, typical practice is to separate the systems as much as possible. For instance, the redundant systems will use separately protected and independent dc auxiliary circuits, as well as separate cores of current transformers and separate secondary windings of voltage transformers as shown in Figure 2.

The intent is that the combined availability of the redundant systems will be so high that it will not be credible that one or the other system will not be available to clear a short circuit within the needed time. Given that the combined availability of redundant systems will be adequate, the question arises as to what additional degree of independence is required to ensure that the protection performance will be acceptable. Options 1 to 5 in Table 1 provide increasing levels of independence, and associated increasing degree of dependability. However dependability is not the only factor that needs to be considered in protection systems applications. Other relevant factors to be considered include:

  • Common mode failure of hardware, software or principle
  • Balance between dependability and security
  • Historical practice and regulatory issues
  • Protection asset life-cycle issues: standardization, implementation and sustainability
  • Human factors such as understanding different systems

Relevant Issues and Mitigating Factors

Common Mode Failure: As noted above, it is most important for dependability to minimize the probability of common mode failure. In the case of a digital protective relay, various common mode failure mechanisms can be identified.

Hardware: The main concern with hardware failure is related to asset management issues as to whether a common hardware platform could increase the risk of extensive work to solve a systemic hardware problem. The risk of extensive remediation work could be cut in half by using different hardware for main 1 and main 2 protections. However assuming the probability of failure is identical for all unique hardware variations, the probability of a hardware failure will be doubled by using different hardware. Therefore, there is no difference in overall risk [i.e. impact (halved) times probability (doubled)] in using different hardware platforms for main 1 and main 2. However, environmental factors could be a cause of common mode failure when identical hardware is used. For example, one utility has reported privately to the authors that several units of one model of relay in the same substation failed almost simultaneously due to a battery problem. This was a common mode failure that affected all relays effectively simultaneously. Therefore the probability of near simultaneous hardware failure is not zero.
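The impact-times-probability argument above can be checked with a few lines of arithmetic. The sketch below is illustrative only: the fleet size of 1000 relays and the 1% chance of a systemic problem per hardware platform are assumed numbers, not utility data, and the function names are hypothetical.

```python
def expected_remediation(fleet_size, p_systemic, platforms):
    """Expected number of relays needing rework.

    Assumes the fleet is split evenly across `platforms` hardware platforms
    and each platform independently has probability `p_systemic` of a
    systemic problem that forces rework of every relay on that platform.
    """
    relays_per_platform = fleet_size / platforms
    return platforms * p_systemic * relays_per_platform


def p_any_systemic(p_systemic, platforms):
    """Probability that at least one platform has a systemic problem."""
    return 1.0 - (1.0 - p_systemic) ** platforms


FLEET = 1000      # assumed number of relays (illustrative)
P_SYSTEMIC = 0.01  # assumed probability per platform (illustrative)

same_hw = expected_remediation(FLEET, P_SYSTEMIC, platforms=1)
diff_hw = expected_remediation(FLEET, P_SYSTEMIC, platforms=2)

print(f"identical hardware: expected rework = {same_hw:.1f} relays")
print(f"different hardware: expected rework = {diff_hw:.1f} relays")
print(f"chance of any systemic problem: {p_any_systemic(P_SYSTEMIC, 1):.4f} "
      f"vs {p_any_systemic(P_SYSTEMIC, 2):.4f}")
```

With two platforms the chance of having some systemic problem is close to double, but each problem touches half as many relays, so the expected remediation work is the same in both cases.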

Software (including firmware embedded in a protection system): Software failure that prevented a relay from operating for a fault in the protected zone would be a common mode concern for identical main 1 and main 2 protections. Of particular concern is a software "bug" that might prevent appropriate response of both protections to the same fault. This is one of the most important reasons why users of protection systems choose to avoid using identical main 1 and main 2 protection systems. The dependability concern is most effectively mitigated by using relays with significantly different software. The other main mitigation method is to rely on experience of widespread use of a specific product to gain confidence in the low probability of software failure.

Another concern with respect to software problems is similar to the previously noted hardware problem with respect to remediation of software to correct "bugs" that might not necessarily have caused a problem yet. Assuming that the probability of a software problem is the same for all protection systems, the use of same or different protection systems results in the same risk.

Principle: A principle failure is similar to a software failure in that two relays using the same principle could have a common mode failure in principle to detect a specific fault. Given the extensive history in protective relaying principles, the probability of a principle failure is very low. However there is a concern that the implementation of a specific principle in a specific model of relay could include a mechanism that prevents proper response to a specific fault. Options 1, 2, 3, and 4 of Table 1 are the most susceptible to principle failure, in that order of probability.

Dependability Versus Security - Relative Probabilities:

Protection system reliability is a balance between dependability and security. Increasing emphasis on dependability will decrease security and vice versa. The most significant method to increase dependability is to connect main 1 and main 2 in an "OR" logic as shown in Figure 1. If it is assumed that the probability of failure to operate for an in-zone fault is Pf1 for main 1 and Pf2 for main 2, and the probability of failure to restrain for an out-of-zone fault is Pr1 for main 1 and Pr2 for main 2, then if the two protection systems are connected in the "OR" configuration the probability of failure to operate when required is Pf1*Pf2, and the probability of failure to restrain when required is approximately Pr1+Pr2. For small probabilities (of failing to operate or restrain correctly) the "OR" connection results in very large differences in overall probability of failing to operate or restrain correctly.
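As a minimal numerical sketch of the "OR" combination, the following assumes that main 1 and main 2 are completely independent and uses an illustrative probability of 0.025 for every failure mode. The function name or_scheme is hypothetical. The exact false trip expression 1 - (1 - Pr1)(1 - Pr2) is used; for small probabilities it is approximately Pr1 + Pr2.

```python
def or_scheme(pf1, pf2, pr1, pr2):
    """Combine two independent protections whose trip outputs are ORed.

    pf1, pf2: probability that main 1 / main 2 fails to trip for an in-zone fault
    pr1, pr2: probability that main 1 / main 2 fails to restrain for an out-of-zone fault
    """
    # The scheme fails to trip only if BOTH protections fail to trip.
    fail_to_trip = pf1 * pf2
    # The scheme false trips if EITHER protection fails to restrain.
    false_trip = 1.0 - (1.0 - pr1) * (1.0 - pr2)
    return fail_to_trip, false_trip


fail_to_trip, false_trip = or_scheme(0.025, 0.025, 0.025, 0.025)
print(f"failure to trip: {fail_to_trip:.6f}")  # 0.000625
print(f"false trip:      {false_trip:.6f}")    # 0.049375
```

With these assumed inputs the probability of failing to trip falls by a factor of 40 relative to a single protection, while the probability of false tripping roughly doubles.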

The results of calculations performed using a fault tree analysis are shown in Tables 2-5. The tables are limited to comparing the impact of different levels of interdependence between main 1 and main 2. All probability numbers in the tables are assumed values, chosen only to allow comparison between scenarios; actual probability numbers are not available. Table 2 uses assumed equal probabilities of failing to operate or restrain correctly of 0.025 (2.5%) each, with complete independence between the two protection systems. This table shows the large increase in dependability due to the multiplication of low probabilities of failing to trip. In Table 2, the interdependence factor of the two protection systems is set to zero. It is assumed that there are no common mode failure mechanisms.

However, there will never be complete independence between the two protection systems. Even with option 5 of Table 1, there could still be some common mode failure mechanisms, such as human errors when calculating the settings or errors in the power system model used for calculating the settings. Figure 2 shows common mode failure points such as the two breakers (either of which could fail to operate), a single battery supplying both systems, and a single VT primary supplying both systems. As the similarity of main 1 and main 2 increases (to the limit of option 1), the interdependence factors will also increase.

For the purpose of investigating the sensitivity of performance to the interdependence, it will be assumed that hardware failure accounts for 50% of incorrect protection operations (i.e. misoperations = false trips + failures to trip). Simultaneous hardware failures are assumed to be negligible with independent main 1 and main 2 systems for the purpose of this article; therefore the interdependence factor will never be more than 0.5, even when option 1 is used. Incorrect settings and incorrect response of the relay are the only cases where interdependencies could arise. Tables 3-5 test the impact on overall reliability of different assumptions regarding interdependence and failure probabilities.

In the case of relatively high levels of interdependence, the probability of failure to trip increases by a factor of about 20 relative to the fully independent case. This can be seen in Table 3. If the level of interdependence is, for example, halved (by implementing main 1 and main 2 systems with more differences, such as by moving from option 1 towards option 5), then the probability of failure to trip is also halved, as shown in Table 4. Alternatively, increased attention to dependability may be paid to the design and implementation of protection systems, such that the probability of failing to trip is lower than that of false tripping. BC Hydro/BCTC experience of protection systems shows that the probability of false tripping is at least 5 times higher than the probability of failing to trip. Table 5 shows that if the rate of trip failure is assumed to be one fifth of the rate of false tripping, the overall probability of failure to trip also drops to a level equivalent to that of halving the interdependence.

A focus on increasing individual dependability by careful attention to the design and implementation of main 1 and main 2 has a similar effect to decreasing the interdependence.
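The sensitivity to interdependence can be illustrated with a simple conditional probability model. This is an assumption made for illustration and is not the fault tree behind Tables 3-5: given that main 1 has failed to trip, main 2 is taken to fail with probability k + (1 - k) * p, where k is the interdependence factor and p is the individual probability of failing to trip. The function name joint_failure and the scenario values are hypothetical.

```python
def joint_failure(p, k):
    """Probability that main 1 and main 2 both fail to trip.

    Assumed model (not taken from the article's fault tree):
      P(main 1 fails)                = p
      P(main 2 fails | main 1 fails) = k + (1 - k) * p
    k = 0 gives full independence (p * p); k = 1 gives identical behaviour (p).
    """
    return p * (k + (1.0 - k) * p)


independent = joint_failure(0.025, 0.0)   # no interdependence
high = joint_failure(0.025, 0.5)          # maximum interdependence assumed in the text
halved = joint_failure(0.025, 0.25)       # interdependence halved
dependable = joint_failure(0.005, 0.5)    # individual failure-to-trip probability cut to one fifth

scenarios = [
    ("independent, p = 0.025", independent),
    ("k = 0.5, p = 0.025", high),
    ("k = 0.25, p = 0.025", halved),
    ("k = 0.5, p = 0.005", dependable),
]
for label, value in scenarios:
    print(f"{label:<26s} P(both fail to trip) = {value:.6f}")

print(f"k = 0.5 versus independent: {high / independent:.1f} times higher")
```

Under this assumed model, an interdependence factor of 0.5 raises the probability of failure to trip to about 20 times the independent value, and either halving the interdependence or reducing the individual failure probability brings it back down substantially. The absolute numbers are not a reproduction of the tables.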

In addition to normal design, installation and maintenance test processes, additional methods that have been used to decrease the probability of failure to trip (and also of false tripping) are as follows:

  • For specific high-importance applications, performing model power system simulations of the specific application with the specific relays and settings, to predetermine the suitability of proposed settings under a wide variety of system conditions and fault conditions. BC Hydro/BCTC has used model power system tests for all its 500 kV transmission line applications and has discovered improvements in proposed settings to reduce the failure rate in all cases.
  • For general applications, testing proposed protection systems to verify correct protection operation for users' special application concerns (for instance, high fault resistance) that might not have been addressed in manufacturers' tests.

Such additional tests have given BCTC/BC Hydro sufficient confidence to apply identical main 1 and main 2 protection systems in the majority of its transmission line protection applications.

Historical Practice and Regulatory Issues: Historical practice is a significant factor in many cases. Utilities may have internal standards or practices that require the use of protection systems with some degree of differences. This is an important reason for use of a specific practice with respect to differences between main 1 and main 2 systems. If specific practices have been found to give required performance, strong arguments will be required to change those practices. The authors of this article do not recommend changing historical practices that have been found to be acceptable.

Two regulatory issues impact the question of degree of similarity between main 1 and main 2 systems.

Common Mode Failure: It is possible that in some cases, a regulatory body may mandate the use of different protection systems to minimize the probability of a common mode failure. Such mandate is based on the lesser degree of interdependence (or similarity) of different systems. The authors are not aware of specific regulatory mandates for specific differences in redundant protection systems, but in some cases regulators may recommend different manufacturers or principles for main 1 and main 2 protection systems.

False Tripping Protection Misoperations: Regulatory standards may require protection systems that have been determined to have misoperated to be taken out of service within a short time of the misoperation. If there is a high degree of interdependence on false tripping, then there will be increased probability that main 1 and main 2 will both misoperate if one of the two protections misoperates. If both main 1 and main 2 misoperate, there will be increased pressure to repair the systems in a short time. This is a negative side of increasing similarity of main 1 and main 2. However on the positive side, the total number of misoperations will also be reduced because of the interdependence.

Protection Asset Life-Cycle Issues: Several asset management issues are relevant to the question of how many differences there are between the main 1 and main 2 systems.

Standardization: Identical main 1 and main 2 will halve the number of standard designs for protection systems. Use of standard designs is a powerful tool that improves consistency of product and reduces the cost. Consistency is an important benefit for protection systems because it reduces the probability of mistakes in design, operation and maintenance. Modern protection systems are very complex thus creating a significant potential for mistakes in the life cycle use of the product. Cost savings are achieved in several areas by the use of a minimal number of standards:

  • Reduced engineering time in setting, design and support of the installations
  • Reduced manufacturing and construction time
  • Reduced commissioning and maintenance time
  • Reduced spare parts
  • Reduced training costs

Implementation: As the number of differences between main 1 and main 2 increases, the implementation costs also increase. The minimal difference noted in the introduction is to have identical equipment with different settings. In some cases, settings need to be different to cater to different power system contingencies. In these cases, there is no way to avoid setting differences to meet performance requirements. In many cases however, setting similarity can be maintained with no degradation in performance.

Additional differences, such as differences in model, manufacturer and principle, will of course result in more significant implementation costs, such as costs due to differences in design, manufacture, installation, and commissioning.

Sustainability: Issues of sustainability include operation and maintenance of the protection systems. Generally speaking, the greater the differences between main 1 and main 2, the more difficult it will be to sustain the assets. Differences in hardware and/or principle will result in a need for extra spares, maintenance plans and training.

Human Factors:

Understanding: Differences between main 1 and main 2 present both dependability and security challenges to the human factor. The complexity of modern protection systems results in hundreds or thousands of settings that need to be applied to a given product. In addition to formal product training, increasing use of the same equipment will increase the familiarity of engineers and technicians and increase their productivity and accuracy in use.

Human factors are a possible common mode failure point even for applications with completely different hardware and principle. Complete familiarity with the equipment is an important contributor to overall reliability of applications.

Setting: Significant savings in time and cost can be achieved by the use of option 1 with identical settings for main 1 and main 2 that can be used in either relay (perhaps with just a difference in the device name). However, this double use of a single settings file means that settings that are wrong in main 1 will also be wrong in main 2. This tends to increase the interdependence and consequently has a negative impact on dependability. Mistakes in settings can be minimized by two processes:

a) Careful and independent review of numerical measurement settings

b) Standardization of logic settings that have been found by experience to be reliable.

However an alternative process is facilitated by the ability to perform electronic comparison of two settings files. This process consists of independent preparation of settings that are supposed to be identical for main 1 and main 2 (option 1) by different engineers. Once the two files are independently prepared, they can be compared with each other and any differences discovered and resolved.
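The electronic comparison step can be sketched as follows. The settings file format (one NAME = VALUE pair per line), the setting names and the values are all hypothetical; real relay settings files are vendor specific and would need their own parser. The device name is excluded from the comparison because it is expected to differ between main 1 and main 2.

```python
def parse_settings(text):
    """Parse 'NAME = VALUE' lines into a dict; blank lines and '#' comments are skipped."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        name, _, value = line.partition("=")
        settings[name.strip()] = value.strip()
    return settings


def compare_settings(main1, main2, ignore=("DEVICE_NAME",)):
    """Return {name: (main 1 value, main 2 value)} for every setting that differs.

    A setting present in only one file is reported with None for the other side.
    """
    differences = {}
    for name in sorted(set(main1) | set(main2)):
        if name in ignore:
            continue
        v1, v2 = main1.get(name), main2.get(name)
        if v1 != v2:
            differences[name] = (v1, v2)
    return differences


# Hypothetical settings prepared independently by two engineers.
MAIN1_FILE = """
DEVICE_NAME = MAIN1
Z1_REACH_OHMS = 12.4
Z2_REACH_OHMS = 18.6
Z2_DELAY_CYCLES = 20
"""
MAIN2_FILE = """
DEVICE_NAME = MAIN2
Z1_REACH_OHMS = 12.4
Z2_REACH_OHMS = 16.8
"""

diffs = compare_settings(parse_settings(MAIN1_FILE), parse_settings(MAIN2_FILE))
for name, (v1, v2) in diffs.items():
    print(f"{name}: main 1 = {v1}, main 2 = {v2}")
```

In this example the comparison flags a mismatched zone 2 reach and a zone 2 delay that is missing from the second file. Each flagged difference would then be resolved by the two engineers before either file is issued.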

Experience

The authors believe that worldwide, there is a difference in probability of at least an order of magnitude between overall protection system dependability and security, which is in favor of dependability. That is, protection systems are at least 10 times more likely to false trip than to fail to trip. This is illustrated by the comparisons of dependability and security using assumed numbers in Tables 2-5. BCTC/BC Hydro experience shows more than three orders of magnitude difference overall (in combined main 1 and main 2 performance), with the probability of failure to trip in the last 10 years being essentially zero and the probability of false tripping being more than 1%. It should be noted that BC Hydro uses unusually sensitive ground fault protection due to a significant number of resistive single line to ground faults. High sensitivity will also decrease the security of protection systems. As noted above, BCTC/BC Hydro has experienced many more security problems than dependability problems. A need to improve security has been an important driver for BCTC/BC Hydro to maximize the similarity of main 1 and main 2 systems.

Although the probability of failure to trip is stated as being essentially zero in the last 10 years, some incidents of failure to operate, at least in the required time, are reported here.

a) A generator fault more than 15 years ago was not cleared by primary or standby protection systems due to both main protections being out of service. This was a common mode failure due to human error. The fault was eventually cleared by other protection systems, but the generator was severely damaged. This incident was in no way related to differences or similarities in primary and standby protection.

b) A 60 kV transmission line fault more than 30 years ago was not cleared due to both main line protections having been blocked. The fault was eventually cleared by overreaching protections on other lines. Again, differences or similarities between protections were not a factor.

c) There has been one recorded incident on the BC Hydro system where a principle failure with identical main 1 and main 2 resulted in a large disturbance that caused loss of supply to a large part of central Vancouver. In that case, an unusual fault due to a broken disconnect switch resulted in a single line to ground fault on one side of an open circuit. On the other side, a tapped transformer backfed ground current to the fault through the remote terminal, which made the fault appear to be behind the remote terminal. Thus the permissive overreaching transfer trip logic did not trip in either the main 1 or the main 2 protection. The fault was eventually cleared by backup, communications-independent ground time overcurrent protection, but due to some miscoordination and another hidden protection failure, the outage cascaded beyond the protected line.

Conclusion

In general, BC Hydro/BCTC's experience with identical main 1 and main 2 protection systems has been satisfactory.