Common Substation Platform

by Paul Myrda, EPRI, USA and Herb Falk, OTB Consulting, USA

Substations are becoming more complex, both in their power delivery aspects and in the need to handle wide-ranging data services and communication between devices in the substation, devices in the field, and the corporate data and control centers.

Historically, a simple remote terminal unit was able to provide the needed functionality. However, with the continued emergence of inverter-based generation, microgrids and intelligent energy networks, the need for robust, secure communications will expand, and the integration of IED software directly in a local infrastructure is becoming more essential. This project identified the requirements for a Common Substation Platform (CSP) to address these changing needs.

The CSP concept hypothesizes the following benefits:

Simplifies life cycle maintenance by leveraging proven IT-based tools to maintain the CSP environment, increasing life cycle agility compared with typical hardware-based Operational Technology (OT) boxes
Provides the ability to standardize configuration at substations, enabling remote upgrades
Improves functionality of the Basic Substation Gateways to meet grid modernization needs
Eases NERC CIP compliance where applicable
Minimizes the number of OT boxes while providing at least the same reliability and performance
Provides an opportunity for advanced algorithms and software such as, but not limited to, prediction and distributed decision making

This article summarizes the results of a project that identified use cases used to develop and derive requirements for the CSP.

Centralization and Virtualization

The project looked at the platform and software requirements to decrease the number of hardware devices required to implement substations. The sample Single Line Diagram (SLD) of a substation shown in Figure 1 demonstrates the typical distribution of functions and Intelligent Electronic Devices (IEDs).

The aggregation of the functions of multiple IEDs into a single computational platform is considered “centralization”. In many instances, one computational platform can be used to execute virtual machines where each machine is the equivalent of a computer. The use of a single hardware platform to host multiple virtual computational environments is known as “virtualization”.

Virtual machines represent a type of virtualization which may be generalized to the use of software containers as might be utilized in microservices architectures. Whereas the SLD shows the electrical connectivity and ancillary connections of the substation, the protection and automation functions are not typically shown in an SLD.

Figure 1 shows the distribution of functions and devices required to automate and protect the electrical network within the substation. The figure shows that there is a distribution of signals, including IEC 61850 Sampled Values, between various hardware devices (e.g. Merging Units, Protection Relays, etc.).

There are other “devices” typically deployed within a substation that are not shown in the figure. These “devices” are used to provide cyber security, substation monitoring, condition-based maintenance functionality, and others.

This project studied the use cases and requirements for consolidating the required hardware devices into a single set of common hardware platforms. This is the concept of centralization.

It is unlikely that a single vendor can provide all the services and applications required for centralization. It is more likely that functions and applications will be virtualized/containerized in such a manner that a utility can mix and match implementations and applications from various vendors. Additionally, to achieve some diversity, it may not be advisable for a single vendor to provide all aspects of the CSP. However, diversity does bring issues regarding testing and support.

Current technology allows this to be accomplished using containers where the resources required for the specific application can be packaged and deployed as a single unit. Containers vary in size from small (e.g. a single application) to a package that represents a single computer (a.k.a. Virtual Machine).

At the core of the container architecture is a layer that gives the software within the container access to the resources of the hardware on which the container is being executed. This abstraction of hardware is often referred to as a hypervisor.

Figure 2 illustrates the two types of hypervisors: Type 1 and Type 2.

A Type 1 hypervisor is implemented directly for specific hardware platforms and provides the equivalent of a custom operating system through which the Virtual Machines access the resources.

A Type 2 hypervisor utilizes an Off-The-Shelf (OTS) Operating System (OS) to provide the resources of the hardware and some ancillary services to the Virtual Machines. The Virtual Machines are the equivalent of virtualized PCs and represent large containers.

Both Virtual Machines and light-weight containers allow the package to be moved from one platform to another. Proper planning to use this capability can decrease the effort to perform upgrades and increase resiliency, which will be discussed in other parts of this paper.

Use Case Introduction

To develop requirements for the Common Substation Platform (CSP), this report uses use cases to derive the requirements and to identify interaction patterns. The use cases to be addressed are shown in Figure 3.

Figure 3 shows the relationships of the various use cases that will be addressed in this project. The ones colored in blue are not within the current scope. The primary use cases are:

Automation: including, but not limited to, Wide Area Monitoring and Protection, automation logic execution and protection
SCADA: Supervisory Control and Data Acquisition services
Asset Management: the management of physical and computational assets. The support needs to include, but not be limited to, patch management and Condition Based Maintenance (CBM)

There are at least three use cases that span several of the primary use cases:

Security: providing services of firewall, access control, intrusion detection, intrusion prevention, and configuration of Role Based Access Control (RBAC)
Time Synchronization: develops the requirements for time services
Monitoring: each of the other use cases has specific requirements for the monitoring, display, and reaction to actionable information supplied. Each primary use case develops its own monitoring requirements

Testing and Advanced Applications (e.g., Advanced App) use cases are reserved for future study.

The ability to acquire information from the field is a functional requirement for:

Condition Based Maintenance, which is a type of Asset Management
SCADA Systems inherently perform or use SCADA functions
Human Machine Interface receives information from the SCADA System or may have its own SCADA function
Security functions also need to acquire information from the field, although this information would typically not be routed through a SCADA. IT monitoring systems acquire and interact with information and for the purposes of this document utilize one or more SCADA functions
Automation functions can also utilize SCADA functions as well as other protocols. Protection is a type of Automation and can therefore make use of SCADA functions, which can provide information for Event Recording, which in turn allows for Fault Analysis
Currently, Wide Area Monitoring, Protection, and Control (WAMPAC) is used within utilities for monitoring only. However, the use of synchrophasors will provide control and decision-making capabilities in the future. Thus, it is shown as a type of automation

Resiliency and Redundancy

For the purposes of this document, the following definitions apply:

Resiliency: The ability to recover from a problem
Hardware Redundancy: The inclusion of extra components which are not strictly necessary for functioning, so that if another component fails, functioning can continue regardless of a single fault
Software Container Redundancy: The inclusion of extra resources provisioned to support execution of container functions in the instance that a container has an error. There are two types of redundancy considered: active-active and active-standby

There are several different failure modes that can occur in a virtual environment. Each failure mode should be addressed by designing either a resilient or a redundant solution.

Container Content Restoration

Regardless of the hardware selections, it is inevitable that a hardware platform will need to be replaced. Fast and easy restoration to an operational state requires an easily accessible, restorable image of the Container. There are typically three different versions of a container and its contents: initial, configured, and operational. Restoration of the operational container image needs to be considered, including utility-specific requirements.
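The three image versions can be illustrated with a minimal sketch that picks the most advanced stored image when a platform has to be rebuilt. The names `ContainerImage`, `STAGE_RANK`, and `select_restore_image`, and the idea of ranking the versions, are assumptions made for illustration and do not come from the EPRI report.

```python
from dataclasses import dataclass
from typing import Iterable, Optional

# Assumed ordering: an operational image supersedes a configured one,
# which supersedes the initial (as-delivered) one.
STAGE_RANK = {"initial": 0, "configured": 1, "operational": 2}


@dataclass(frozen=True)
class ContainerImage:
    container: str  # hypothetical container name, e.g. "protection-A"
    stage: str      # "initial", "configured", or "operational"
    location: str   # hypothetical storage location of the image


def select_restore_image(
    images: Iterable[ContainerImage], container: str
) -> Optional[ContainerImage]:
    """Return the most advanced stored image for a container, or None."""
    candidates = [
        img for img in images
        if img.container == container and img.stage in STAGE_RANK
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda img: STAGE_RANK[img.stage])
```

A utility-specific policy could replace the simple ranking, for example by refusing to restore an operational image older than a given age.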

Redundancy

IEEE C37.100 defines redundancy as: “The quality of a relaying system that allows a function to operate correctly, without degradation, irrespective of the failure or state of one portion, since another portion performs the same function (not to be confused with backup).”

At the core of redundancy is the principle that there should not be a single point of failure and that all functions are duplicated. Figure 4 demonstrates this concept by duplicating containers and their software (e.g., A and A’) on different hardware platforms. The functions within the duplicated containers (e.g., A and A’) would have different configuration parameters (e.g., IP addresses, security credentials, etc.). To avoid a single point of failure overall, the Local Area Networks (LANs) must not contain a single point of failure either.

However, due to wiring constraints, the two different MUs may need to share the same Ethernet LAN. Should this occur, steps will need to be taken to provide redundancy at the LAN level, and therefore either the Parallel Redundancy Protocol (PRP) or High-availability Seamless Redundancy (HSR) should be utilized.
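In PRP, each frame is sent over two independent LANs and the receiver keeps the first copy and discards the duplicate. The sketch below illustrates only that discard idea under simplifying assumptions: the class name `DuplicateDiscard`, the bounded window, and the `(source, sequence)` key are illustrative, and details such as sequence number wraparound and time-based aging are omitted.

```python
from collections import OrderedDict


class DuplicateDiscard:
    """Accept the first copy of a frame and drop the copy from the other LAN."""

    def __init__(self, window: int = 1024) -> None:
        self._seen = OrderedDict()  # remembers recently accepted frames
        self._window = window       # assumed bound on remembered frames

    def accept(self, source: str, sequence: int) -> bool:
        """Return True for a first copy, False for a duplicate."""
        key = (source, sequence)
        if key in self._seen:
            return False
        self._seen[key] = True
        if len(self._seen) > self._window:
            self._seen.popitem(last=False)  # forget the oldest entry
        return True
```

With this arrangement the loss of one LAN does not interrupt delivery, because the copy arriving over the other LAN is simply accepted as the first copy.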

There are a variety of approaches to create the needed redundancy. Only one is presented here.

Different Platform

Figure 5 depicts the concept of a container that is executing on system/platform A and then fails or experiences performance issues. An external monitor detects the issue, decides to move a copy of the container to System B, and starts its execution there. This process would include terminating the execution of the container being moved on System A.

The advantages of this approach are:

The System B platform is capable of having containers from different hardware platforms moved to it and as such is not required to be a twin of any particular other hardware platform/container combination. This is a typical architecture for cloud computing
Since the restarted container image is the same in both cases, no additional configuration, staging, or testing of the container functions is required

The disadvantages of this approach are:

Requires external monitoring and software to manage the movement of the container. These resources need to be managed and possibly made redundant
If the secondary platform is to be used for more than a single platform’s containers, sizing of the required resources may be difficult
The delay in detecting the need to move the container, plus the actual movement of the container, may be too long to provide protection functionality
A common disk (e.g., SAN) will be required
The movement of the containers must be tested
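The move performed by the external monitor can be sketched as follows. This is a toy model, not an orchestration tool: `Platform`, `failover`, and the `healthy` callback are hypothetical names, and stopping the container on System A before starting it on System B is an assumed ordering chosen so that two instances with the same configuration never run at once.

```python
from dataclasses import dataclass, field
from typing import Callable, Set


@dataclass
class Platform:
    name: str
    running: Set[str] = field(default_factory=set)  # containers executing here


def failover(
    container: str,
    primary: Platform,
    standby: Platform,
    healthy: Callable[[Platform, str], bool],
) -> str:
    """Move the container to the standby platform if the health check fails.

    Returns the name of the platform on which the container now runs.
    """
    if healthy(primary, container):
        return primary.name
    primary.running.discard(container)  # terminate execution on System A
    standby.running.add(container)      # start the same image on System B
    return standby.name
```

In a real deployment the time spent in the health check and in the move itself is the delay discussed above, which is why this pattern may be too slow for protection functions.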

Monitoring

There are several requirements related to monitoring that are implied in the previous sections:

Monitoring of the state of hardware resources and their failures. This type of monitoring relates to CPUs, cooling systems, RAM, and disk drives, including Storage Area Network (SAN) and Network-Attached Storage (NAS), as well as others
Monitoring of the state of redundancy. This type of monitoring relates to the status of redundancy including, but not limited to, the designation of primary and secondary

Maintenance, Upgrades, and Patching

Utilities must plan for the maintenance of the software and hardware of the CSP. The EPRI report looks at potential solutions, including the use of IEC/TR 61850-90-16.

Cyber Security

The implementation of Cyber Security is predicated upon a degree of physical security and upon communication, or flow, controls on the information exchanged between security zones.

Many countries have governmental regulations similar to the NERC CIP requirements. These requirements apply to substations and generation power plants. However, good security practices should be employed at distribution level substations and DERs as well. The EPRI report looked at several different aspects of Cyber Security, and some of the developed information follows.

Security Architecture

There is a model, based upon industrial concerns, that has been used to establish best security practices, known as the Purdue Model. The concepts of the Purdue Model have been adopted and enhanced within several ISO/IEC standards which are directly or indirectly referenced by other standards. One such standard is IEC 62443.

The Purdue Model utilizes several zones of protection as shown in Figure 6.

A single CSP may provide information exchanges between several different levels and security zones. For each such exchange, risk analysis should be performed in order to determine if mitigations are needed (a.k.a. security conduits). A CSP may provide exchanges utilized for maintenance (Level 3), HMI and Automation (Level 2), protection and control (Level 1), and station bus/process bus (Level 0). The EPRI report details the network connectivity and internal micro-segmentation to allow secure information exchanges.

Virtual environments offer the ability to create virtual network security zones, a technique known as micro-segmentation. The creation of these zones allows the implementation of an IEC 62443 or other network isolation without the need to have physical networks.
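One way to picture zones and conduits is as an allow-list of permitted level-to-level exchanges. The sketch below uses the level numbers from the text, but the specific pairs in `ALLOWED_CONDUITS` and the function name `exchange_permitted` are hypothetical; an actual list would be the outcome of the risk analysis for each exchange.

```python
# Hypothetical conduits, written as (source level, destination level).
# Levels follow the text: 3 maintenance, 2 HMI and Automation,
# 1 protection and control, 0 station bus/process bus.
ALLOWED_CONDUITS = frozenset({(3, 2), (2, 1), (1, 2), (1, 0), (0, 1)})


def exchange_permitted(src_level: int, dst_level: int,
                       conduits=ALLOWED_CONDUITS) -> bool:
    """Allow traffic within a zone, or across zones only via a listed conduit."""
    if src_level == dst_level:
        return True
    return (src_level, dst_level) in conduits
```

In this illustrative list a maintenance function at Level 3 can reach Level 2 but has no direct path to Level 0, which is the kind of isolation micro-segmentation is meant to enforce without separate physical networks.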

Introduction to Virtual Machine Security

The use of Virtual Machines provides the promise of being able to isolate security issues to within a single container. However, the degree of cyber security protection achieved also depends on other factors, such as the normal software vulnerabilities (including viruses and trojans) of the hypervisor, virtual drivers, and shared resources. The promise of containers providing isolation was proven incorrect in 2017, when a Chinese hacking team “…escape a virtual machine VMware to boot in only 90 seconds.” The vulnerability was mitigated once it was reported. This hack and other literature show that segmentation of resources needs to be designed into the deployment.

The draft of CIP-005-8 provides some potential guidance and some questionable exceptions. The current draft provides exceptions from the requirements found in CIP-005 that could be utilized by the micro-segments. It is not recommended that CSP deployments claim these exceptions.

Automation and Protection Use Cases

Automation and protection algorithms require information in order to execute properly. A virtual environment is best suited to utilize network-based communications. Examples of required information are: time synchronization, CT/PT, and state information.

Time Synchronization: As digital transformation occurs in substations and utilities, there needs to be a coherently designed usage of timestamps, timestamp accuracy requirements, and timestamp references.

IEC 61850-5 specifies several different timestamp accuracy requirements all of which can be achieved through the use of Precision Time Protocol (PTP) and the power profile specified in IEC 61850-9-3.

Further research needs to be performed relating to the amount of inaccuracy introduced by virtualization and the best mechanism for synchronization distribution in a virtual environment.
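As background on why virtualization could add inaccuracy, the sketch below shows the standard offset calculation from a two-way timestamp exchange as used by PTP's delay request-response mechanism. The calculation assumes the path delay is the same in both directions, so any asymmetry (for example, one added by a virtual switch) appears as an offset error of half the asymmetry. The function name and the numbers in the usage note are illustrative, and profiles may use a peer-to-peer variant of the delay measurement rather than this exchange.

```python
def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float):
    """Estimate clock offset and mean path delay from four timestamps.

    t1: master sends Sync            (master clock)
    t2: slave receives Sync          (slave clock)
    t3: slave sends Delay_Req        (slave clock)
    t4: master receives Delay_Req    (master clock)
    """
    forward = t2 - t1  # master-to-slave delay plus offset
    reverse = t4 - t3  # slave-to-master delay minus offset
    offset = (forward - reverse) / 2
    delay = (forward + reverse) / 2
    return offset, delay
```

For a slave clock that is 5 µs ahead with a symmetric 2 µs path, `ptp_offset_and_delay(0, 7, 20, 17)` returns `(5.0, 2.0)`. If the forward path takes 4 µs instead, `ptp_offset_and_delay(0, 9, 20, 17)` returns `(6.0, 3.0)`: the estimated offset is wrong by 1 µs, half of the 2 µs asymmetry.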

CT/PT: When moving protection to a virtual environment, digitized CT/PT information needs to be provided to the protection functions. The research indicated that use of IEC 61869-9 (e.g. merging units) communication was the preferred mechanism to provide this information. The use of this type of technology must address failure modes that prevent the protection functions from receiving the information needed for protection.

Failure modes include, but are not limited to:

Local Area Network Usage
Time Synchronization to the merging unit fails
I/O limitations on the CSP

The use of Ethernet to receive CT/PT information, and virtualization, have several hardware considerations. Three considerations that need to be analyzed are:

Sizing of the Ethernet Receive Buffers
Interrupt rates
Latency of delivery

All of these metrics are intended to be researched in an upcoming phase.
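Until measured results are available, the three considerations can at least be related by simple arithmetic. The sketch below is a back-of-the-envelope estimate, not a result from the project: the function names are hypothetical, and the sample rate, samples per frame, stream count, and stall time used in the usage note are illustrative assumptions only.

```python
def frames_per_second(samples_per_second: int, samples_per_frame: int) -> int:
    """Frames per second for one stream, rounded up to a whole frame."""
    return -(-samples_per_second // samples_per_frame)


def worst_case_interrupts_per_second(fps: int, streams: int) -> int:
    """Interrupt rate if every received frame raises its own interrupt."""
    return fps * streams


def min_receive_buffer_frames(fps: int, streams: int, max_stall_us: int) -> int:
    """Frames arriving while the receiver is stalled for max_stall_us microseconds."""
    return -(-(fps * streams * max_stall_us) // 1_000_000)
```

Assuming, for illustration, 4800 samples per second with 2 samples per frame, each stream produces 2400 frames per second. Six such streams would generate up to 14400 interrupts per second without interrupt coalescing, and a receiver that can stall for 1 ms would need room for at least 15 frames.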

State Information: IEC 61850 GOOSE is a multicast protocol that is the preferred mechanism for exchange of substation digital state information. However, metrics and issues similar to those related to CT/PT consumption need to be evaluated. These metrics are intended to be researched in an upcoming phase.

Using CSP Based Software Algorithms

The virtualized platform offers the potential opportunity to use protection algorithms supplied by multiple suppliers. This means that there must be a standardized interface that allows inputs to be provided and outputs to be produced so that they can, in turn, be consumed as inputs.

This requires consistent semantics and referential integrity for values, their quality, and their timestamps. The report is focused on the CIM/IEC 61850 harmonization activities within IEC TC57.

There is also a requirement for a standardized interface to provide the semantic exchange. At least one interface/ESB has reached the maturity to be standardized; it can be found in ISO/IEC 20922. Even with the interface being standardized, the information to be exchanged and how to subscribe for produced information still need to be agreed upon or standardized. This may be an item for future CSP work.
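The combination of a common payload (value, quality, timestamp) and a publish/subscribe interface can be sketched in a few lines. This is an in-process stand-in for illustration only, not a client for any real ESB or for ISO/IEC 20922; the names `Measurement` and `Broker`, the field names, and the topic strings are all assumptions.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable, DefaultDict, List


@dataclass(frozen=True)
class Measurement:
    reference: str     # hypothetical data reference agreed between vendors
    value: float
    quality: str       # e.g. "good" or "invalid"; the vocabulary is assumed
    timestamp_ns: int  # time of measurement in nanoseconds


class Broker:
    """Minimal in-process publish/subscribe exchange."""

    def __init__(self) -> None:
        self._subscribers: DefaultDict[
            str, List[Callable[[Measurement], None]]
        ] = defaultdict(list)

    def subscribe(self, topic: str, handler: Callable[[Measurement], None]) -> None:
        self._subscribers[topic].append(handler)

    def publish(self, topic: str, measurement: Measurement) -> int:
        """Deliver to every subscriber of the topic; return how many received it."""
        handlers = self._subscribers.get(topic, [])
        for handler in handlers:
            handler(measurement)
        return len(handlers)
```

One vendor's algorithm would publish its outputs as `Measurement` objects, and another vendor's algorithm would subscribe to the same topic to consume them as inputs. What the sketch cannot settle is the point made above: the topic names and payload semantics still have to be agreed upon or standardized.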

Conclusion

The CSP utility requirements project identified a comprehensive set of requirements for the CSP. These requirements were developed based on a primary set of use cases that included Automation (including protection), SCADA and Asset Management. At least three use cases span several of the primary use cases: Security, Time Synchronization and Monitoring.

The project also identified requirements for resiliency and redundancy, maintenance, upgrades and patching. The project participants were very interested in the protection use case, and factors such as merging unit usage and hardware considerations were also investigated. Other aspects critical to the robust operation of platforms such as the CSP are the use of CSP-based software algorithms and measured values along with their quality and time stamp requirements. Also considered was the use of messaging such as IEC 61850 GOOSE.

Overall, the project accomplished its objective to define requirements for the CSP. The next step is to perform a series of laboratory demonstrations to verify and validate the requirements and also their implementation in terms of reliability and resiliency.

Biographies:

Paul Myrda is the Grid Operations and Planning Senior Program Manager with the Electric Power Research Institute working in the Power Delivery and Utilization Sector. Paul has been with EPRI for 14 years. Previously, Paul was the Information and Communications Technology for Transmission program manager. Paul is an active member of Technical Committee 57 – WG 10 for the development of the IEC 61850 standard. He is also active in the IEEE Power System Relaying Committee. Paul has over 40 years of experience leading technology implementations. His diverse background includes planning, engineering, information systems and project management. He has an MBA from Kellogg Graduate School of Management and MSEE and BSEE from Illinois Institute of Technology. He is a licensed professional engineer in Illinois, a member of CIGRE and a Senior Member of the IEEE.

Herbert Falk is the Vice President of Testing for the UCA International User Group (UCAIug) and is responsible for managing the Conformance and Interoperability Test Programs within UCAIug. He has managed IEC 61850 Interoperability Tests bi-annually since 2011. He is an editor of IEC 61850 and the US Technical Advisory Group (TAG) lead on Cyber Security to IEC TC57 WG15. He is an active member of the IEEE Power Systems Communication Committee (PSCC), IEEE Power System Relaying Committee (PSRC), and the DNP Security Task Force. Herb graduated with an MSEE degree from Northwestern University in 1979.
