Impact of Hardware Design on Failure Mode of Protective Devices

Authors: Paul Myrda and Charles Perry, Electric Power Research Institute, USA

This article presents a case study of a protective device failure: the design characteristics and assumptions at the substation and in the protection device, field testing precautions, factory testing considerations, and other factors relevant to the failure.  As the utility industry migrates to smart grid technology and expands its use of microprocessor-based products, design considerations and device failure modes become even more critical.  In this case, mitigating the root cause required both device design changes and field upgrades to installed devices.

Background
In the fall of 2009, personnel at a utility 12kV substation were performing routine testing of a 125V DC transfer switch (Figure 1).  During this testing, after the 125V DC transfer switch for the bus section was switched from Normal to Emergency, all circuit breakers on the bus section tripped open automatically and then closed sequentially several times.  In total, six circuit breakers operated.  To prevent further operations, the circuit breakers' Local/Remote switches were turned to Local.  One of the circuit breakers was then closed locally, but it tripped when it was switched back to Remote.

Upon investigation it was discovered that the bus section control unit had failed in a state where all outputs were high.  When personnel tried to interrogate the unit, it failed to communicate through either the front or the rear port.  When the control unit was powered down, all output relays were observed to de-energize.  All output relays picked up again instantaneously when the control unit was repowered.  The DC transfer switch that was exercised also supplies 125V DC control power to this control unit.

This failure mode is not acceptable since it can cause customer outages.  Several control units of this type are installed at the utility's substations, so there was an urgent need to find the root cause of the failure and have it corrected.
The control unit was left out of service, awaiting replacement.  As a result, the utility did not have remote indications or controls for this bus section.  All circuit breakers in this bus section had to remain in Local for isolation.  The metering for the transformer was also out of service because of the defective control unit, which in turn prevented the station Voltage VAR Control (VVC) from functioning.

Preliminary Findings
The failed control unit was removed from the field and shipped to the supplier to determine the root cause of the failure.  The supplier's initial comments follow:

  • The most significant finding is that extensive damage occurred around Vcc and Vss pins, and the devices were exposed to electrical overstress (over-voltage or over-current) originating at those pins
  • No specific signature or chip failure mechanism, like latch-up conditions on parasitic bipolar transistors within the MOS structures, was recognized and acknowledged by the lab
  • Our further investigation is focused on simulating the cause of the device failure corresponding to the transfer switch operation as documented in materials obtained from the customer
  • Effects on control unit power supply during battery bank switching are of primary interest
  • Control unit grounding requirements: the grounding method described in the control unit user's manual is very clear about the function of each grounding terminal on the back panel

Proper grounding practices must be followed.  The two back panel grounding terminals (the power supply ground terminal and the chassis protective earth stud) may be at an equivalent DC resistance; however, at high frequencies these two connections do not provide equivalent EMI immunity.

Initial Inspection at Vendor Facility
The failed control unit was removed from the test rack and its front panel was removed.  With the front panel off, a burn mark was visible on an electrically programmable logic device (EPLD).  This chip, IC2, is responsible for controlling the output relays on the control unit.  Figure 2a shows the burn mark on the EPLD.

After a second board was removed, a chip similar in appearance to IC2 and named IC1 was found to have a similar brown spot on it.  This chip controls general logic (Figure 2b). Inspection of the main board revealed no other visible problems.  Voltage was applied to the unit to perform a temperature analysis.  Below are the results:

  • Temperature was measured off center, at the lower right corner of the label
  • At startup, the temperature was 26 °C
  • IC1 reached 51 °C and held steady after 5 minutes
  • IC2 reached 60 °C and was still rising after 5 minutes
  • After 10 minutes, IC1 remained at 51 °C
  • IC2 increased to 65 °C
  • At the center of IC2, the temperature was 110 °C
  • At the center of IC1 (on the dark mark), the temperature was 73 °C
  • According to manufacturer specs, the maximum operating temperature is 85 °C

As shown in Figure 3, IC2 temperature rose above the manufacturer's max operating temperature.  This further indicates a failure on this chip. At this point, the remaining boards were removed from the control unit, cross-referenced with a parts list, and inspected for failures.  No other visible failures were noticed on the remaining boards.

 

 

Failure Analysis Final Report

Based on previous analysis through CSAM and X-ray, the devices were decapsulated for surface inspection under a high-power optical microscope.  Carbonized mold compound was found near the Vcc and Vss pins, and blisters were observed near the Vss pins, on both units (Figures 5 and 6).  This damage suggests an electrical overstress failure.  It is believed that the Vcc and Vss pins of the devices were exposed to an overvoltage or over-current condition on the customer's board, resulting in an electrical overstress incident.

Latch-up studies by Altera's Reliability Department show that this product meets or exceeds JEDEC standard 78. This should lead the customer to review their system setup for possible sources of electrical overstress.

Specific Findings From The Investigation
Microscopic failure analysis of the damaged Altera components (designated IC1 and IC2), taken from the control unit, showed carbonized deposits and blistering associated with the Vcc and Vss power supply pins on both devices.  This suggests voltage and/or current overstress on the device power supply pins, i.e., exceeding the 7V maximum rating specified for the device.
The control unit failure coincided with a 125V DC station battery supply throw-over transfer switch event (between the Normal and Emergency buses), which is being investigated as the primary event causing the failure.

Re-Creation of Event:
Tests were developed to simulate the effect of transfer switch dropout and recovery.  First, 145V DC was applied to the control unit power supply input with a dropout/recovery of approximately 20msec.  Loss of communication with the unit and a temperature rise of the component showed that IC1 was damaged.  Then 152V DC was applied to the power supply input with a dropout/recovery of approximately 20msec, and IC2 was damaged, as indicated by mis-operations and a temperature rise of the component.  Measurement of the power supply +5V DC Vcc output during input dropout/recovery revealed a peak of 8V lasting approximately 5-10msec.  A longer input dropout/recovery time and a lower input voltage resulted in a power supply output peak below 7.5V, without any evidence of damage to components.

 

 

Preliminary Investigation / Root Cause Analysis Progress

During the preliminary root cause analysis, the following information was uncovered:

  • The transient response of the control unit switch-mode power supply on the +5V Vcc output exceeded 7V with a short-duration dropout/recovery on an input greater than 145V DC and with the ±12V DC bias supply lightly loaded (the loading depends on the number of modules equipped inside the control unit)
  • There was no transient voltage suppression protection on +5V Vcc
  • Overvoltage damage to the EPLD left its input/output pins in a high impedance electrical state
  • High impedance EPLD output signals result in indeterminate, floating inputs to the analog switch component interface for control points, causing fleeting random mis-operations
  • Mis-operation continued after the damaged IC1 was replaced and reprogrammed.  Correction is to be confirmed when IC2 is replaced and reprogrammed.  It remains to be ascertained whether any other failed components exist
  • Under steady-state power supply input conditions with typical substation transient phenomena, the condition was found to be minimal to non-existent, as evidenced by a zero-incident failure rate across the total installed base of 15,000 units over their continuous in-service lifetime
  • The probability increases with very fast power supply input dropout/recovery and higher input transient recovery voltage, as evidenced by only a single incident over multiple, periodic transfer switch events, over time and across the number of units installed

Final Root Cause Analysis Outcome And Enhancements
Two modifications were required to mitigate the root cause of this failure.  Both were minor circuit modifications: one to the power supply card and one to the control output card used in control units at this utility.

1) Control Unit Power Supply Modification - In the course of the root cause analysis (RCA) following the control unit failure at the substation in the fall of 2009, the power supply unit was identified as the most likely cause of failure of the two EPLD chips (IC2, IC1).
Input switching conditions similar to those occurring with substation DC battery transfer switches were simulated at a test laboratory, and very similar failure modes were reproduced on other test units in controlled experiments.  The transient overvoltage observed on the 5V power output used for the IC2 and IC1 chips was quite significant and, in certain situations, exceeded 7.5V for short periods of time (single milliseconds).  As stated in the MAX7000 family data sheets, the absolute maximum Vcc rating is 7V.  The magnitude of the spike, for comparable power restoration instances, also depended on the exact loads present on the +12V, -12V, and 24VISO outputs; the overvoltage increased when those loads were removed.

Many of the power supply waveforms were captured on the control unit with hardware configured similar to the unit from the substation but, during experiments, an equivalent resistive load was also used to produce the following currents on the outputs:

  Output     Load current
  +5V        2A
  +12V       100mA
  -12V       80mA
  24VISO     no load

Later verification confirmed that the transient behavior of the power supply in a working control unit system did not differ from that of the power supply loaded with the normal resistive load described.
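
As a rough illustration of what the equivalent resistive load implies, the sketch below converts the quoted load currents into resistor values and dissipated power using Ohm's law. The rail voltages and currents come from the text; the assumption that each rail sits at its nominal voltage, and the names `LOADS`, `equivalent_load`, and `load_table`, are illustrative only and do not describe the actual test fixture.

```python
# Load currents quoted in the text, as (nominal rail voltage in V, current in mA).
# 24VISO was left unloaded, so it has no equivalent resistor.
# Assumption: each rail is at its nominal voltage.
LOADS = {
    "+5V": (5.0, 2000),
    "+12V": (12.0, 100),
    "-12V": (12.0, 80),   # magnitude of the rail voltage
}


def equivalent_load(volts, milliamps):
    """Return (resistance in ohms, power in watts) for a resistive load."""
    resistance = volts * 1000 / milliamps   # R = V / I
    power = volts * milliamps / 1000        # P = V * I
    return resistance, power


def load_table():
    """Equivalent resistor and dissipation for every loaded rail."""
    return {rail: equivalent_load(v, ma) for rail, (v, ma) in LOADS.items()}


if __name__ == "__main__":
    for rail, (r, p) in load_table().items():
        print(f"{rail}: {r:g} ohm, {p:g} W")
```

Under these assumptions the +5V rail carries by far the largest load (about 10W), while the two 12V rails together dissipate roughly 2W.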

Most of the testing was performed at the nominal input of 130V DC, but the lower limit of 60V DC and the upper limit of 150V DC were also investigated, as well as many intermediate levels.

To provide a reference for later comparison, Figure 7 shows typical +5V and +12V output response when a non-modified power supply is subjected to relatively fast power cycling on its input. Callouts identify specific areas of interest.

A simple power supply modification (removal of Zener diode D13, see Figure 4) was determined to significantly reduce the magnitude of the transients appearing on the 5V output when input power is interrupted and restored in rapid succession, as happens during transfer switch operation.
D13 was originally intended to protect one of the inputs (negative feedback) to the error amplifier in the main regulation loop by limiting the voltage to 3.6V (as shown in the relevant portion of the schematic).  Under steady-state conditions all internal voltages were within their design limits.  Fast input power cycling and power restoration change the restart conditions, since the internal voltages do not completely decay to zero.  As was observed during experiments, the state of the decaying 5V output at the instant when input power is restored is the most critical factor affecting how hard the negative feedback line is driven.  Not allowing the line to reach levels above 3.6V renders the feedback ineffective for a short period of time and produces significant overshoot on the 5V output.
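
The clamping effect described above can be pictured with a deliberately simplified model. The sketch below is not the regulator circuit: it only shows that once the raw feedback level exceeds the 3.6V clamp, the error amplifier input stops tracking it. The raw feedback levels and the function name `feedback_seen` are hypothetical and chosen for illustration.

```python
ZENER_CLAMP_V = 3.6   # clamp level set by D13, from the text


def feedback_seen(raw_v, clamp_v=None):
    """Voltage the error amplifier input sees for a given raw feedback level.

    With clamp_v=None the Zener is treated as removed.
    """
    if clamp_v is None:
        return raw_v
    return min(raw_v, clamp_v)


# Hypothetical raw feedback levels (V) during a fast restart; illustrative only.
raw_levels = [3.0, 3.6, 4.2, 5.0]

with_d13 = [feedback_seen(v, ZENER_CLAMP_V) for v in raw_levels]
without_d13 = [feedback_seen(v) for v in raw_levels]

if __name__ == "__main__":
    for raw, clamped, free in zip(raw_levels, with_d13, without_d13):
        print(f"raw {raw}V -> with D13 {clamped}V, without D13 {free}V")
```

In this toy model every level above 3.6V looks identical to the amplifier while D13 is fitted, which is one way to read the statement that the feedback becomes ineffective for a short time.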

As can be seen from the schematic diagram, U2 pin 1 is inherently protected from being driven to higher voltages.  The divider formed by the optocoupler transistor, R16 and R28 would not produce levels above 5V, and the remaining inputs to the input amplifiers do not have any external Zener limiters either.
Removing Zener diode D13 from the circuitry improves the response time of the regulation circuitry and, under normal load, produces an overshoot well below 0.5V above the nominal 5V.  Figure 8a illustrates the reduction in the overshoot.
Under the most severe conditions (no load on the +12V, -12V, and 24VISO outputs, Vin at 150V DC, and multiple rapid power cycles), the overshoot peaked at 6V DC, as shown in Figure 8b, which is well below the absolute maximum rating (7V) for the EPLDs.

As shown, with the removal of Zener diode D13 there is significant improvement in the +5V output transient performance, with no evidence of detrimental effect on other power supply regulation feedback characteristics. The excessive overshoot is minimized when rapid input power cycling is applied to the control unit power supply.
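
The margins involved can be worked out directly from the figures quoted in this article. The sketch below compares the 8V peak measured during the re-creation tests with the 6V worst-case peak after D13 was removed, against the 5V nominal output and the 7V absolute maximum rating. The function name `overshoot_report` and the dictionary layout are illustrative, not part of any test procedure.

```python
NOMINAL_VCC = 5.0   # V, nominal supply output
ABS_MAX_VCC = 7.0   # V, absolute maximum rating quoted for the EPLDs


def overshoot_report(peak_v):
    """Overshoot above nominal and remaining margin to the absolute maximum."""
    overshoot = peak_v - NOMINAL_VCC
    return {
        "overshoot_v": round(overshoot, 2),
        "overshoot_pct": round(100 * overshoot / NOMINAL_VCC, 1),
        "margin_to_abs_max_v": round(ABS_MAX_VCC - peak_v, 2),
    }


# Peak values taken from the text.
cases = {
    "unmodified supply, re-creation test": 8.0,
    "D13 removed, worst case (no load, 150V DC input)": 6.0,
}

reports = {name: overshoot_report(peak) for name, peak in cases.items()}

if __name__ == "__main__":
    for name, report in reports.items():
        print(name, report)
```

A negative margin means the rating was exceeded: the unmodified supply overshoots by 3V (60% of nominal) and ends up 1V above the rating, while the modified supply overshoots by 1V (20%) and keeps 1V of margin.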

2) Control Card Modification - Independent of the power supply modification, line termination enhancements for the IC2 EPLD signals were added to the control unit Control Card.  The affected signals are:

  • Individual Control point lines (C1 to C16)
  • Control Master Trip, Control Master Close lines (CMT, CMC)
  • Control enable line (C0)

Each of these lines is provided with a 10k pull-down resistor to GND, as shown in Figure 9, so that the inputs to the corresponding analog switches on the control card, normally driven by IC2 outputs, are not left floating if IC2 is damaged while the control unit is energized.  The benefit of pulling down the control lines from IC2 was verified in the following scenarios, in which no outputs operated when the control unit was energized:

  • IC2 EPLD removed from control unit Mainboard
  • Damaged IC2 EPLD installed on control unit Mainboard
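
The logic behind the pull-down resistors can be sketched as a small truth-table model. It treats a missing or damaged IC2 as leaving every affected line in a high impedance state, and shows that only a pull-down then gives the analog switch input a defined level. The function names and the string encoding of driver states are assumptions made for illustration; the model ignores leakage currents and resistor values.

```python
from typing import List, Optional


def control_line_level(driver: str, pull_down: bool) -> Optional[int]:
    """Logic level seen by the analog switch input.

    driver: "high", "low", or "hi-z" (damaged or missing EPLD output).
    Returns 1, 0, or None when the level is indeterminate (floating).
    """
    if driver == "high":
        return 1
    if driver == "low":
        return 0
    if driver == "hi-z":
        # Nothing drives the line, so only a pull-down defines its level.
        return 0 if pull_down else None
    raise ValueError(f"unknown driver state: {driver!r}")


# Affected signals listed in the text: C0, CMT, CMC, and C1 to C16.
LINES = ["C0", "CMT", "CMC"] + [f"C{i}" for i in range(1, 17)]


def floating_lines(pull_down: bool) -> List[str]:
    """Lines left floating when every IC2 output is high impedance."""
    return [name for name in LINES
            if control_line_level("hi-z", pull_down) is None]


if __name__ == "__main__":
    print("floating without pull-downs:", len(floating_lines(False)))
    print("floating with pull-downs:", len(floating_lines(True)))
```

In this model all 19 affected lines float without pull-downs and none float with them, which matches the two verification scenarios in which no outputs operated.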

 

 

 

Lessons Learned

This case study highlights key system-level interactions that can take place between devices installed in substation automation and control schemes.  Here, a high speed DC power supply transfer switch and an overly sensitive, lightly loaded power supply design combined to produce over-voltages on key control processors.  In addition, a hardware design flaw at the manufacturer level allowed control outputs to go high and automatically trip equipment.  Because this occurred during a maintenance procedure, no power disruptions ensued.

In the future, as more automation is considered for installation in substations and out along feeders, greater attention must be paid to design details and device failure modes.  Design engineers cannot take the failure modes of modular devices for granted; they need to ensure that a modular design performs its intended function and that its failure modes are well understood.

 

Biography

Paul Myrda is a Technical Executive with the Electric Power Research Institute, working in the Power Delivery and Utilization Sector.  He is responsible for grid operation, planning and integration of large scale renewables.  He is also leading the development of next-generation monitoring as it relates to synchrophasors and Transmission Smart Grid projects.  In addition, he is the Data and Network Task Team Leader for the Department of Energy, North American Synchrophasor Initiative.  Previously, he led the asset management program at EPRI.  Paul holds BS and MS degrees in electrical engineering from Illinois Institute of Technology and an MBA from the J.L. Kellogg Graduate School of Management.  He is a licensed Professional Engineer in Illinois.

 

Charles Perry is Senior Manager, Corporate Infrastructure, with the Electric Power Research Institute.  In this role, he is responsible for EPRI's physical plant, research related capital budgets, laboratory policies and procedures, and laboratory safety.  He performs research and consulting in the areas of power system reliability, power system metering and monitoring, and power quality.  Previously he managed the Power Delivery and Utilization Laboratories.  Charles holds a BS degree in electrical engineering from West Virginia University and an MS degree in engineering from Marshall University Graduate College.  He is a licensed Professional Engineer in West Virginia.
