Cyber Security and Resilience Guidelines for the Smart Energy Operational Environment

Author: Frances Cleveland, Xanthus Consulting International, USA

While the energy business environment is experiencing these paradigm shifts, the energy industry has accelerated its evolution toward digitization and is becoming increasingly reliant on cyber assets (systems, controllers, intelligent devices) to manage the delivery of electrical energy. These cyber assets are crucial to the safety, efficiency, and reliability of electrical energy.
However, these cyber assets also present serious challenges: businesses must determine how to cope with the reality of deliberate cyber-attacks, such as the successful cyber-attack against the Ukrainian SCADA system, and how to remain resilient to the more mundane but equally critical inadvertent cyber threats arising from personnel mistakes, the complexity of systems, the multitude of new participants in the energy market, equipment failures, and natural disasters. Energy businesses that used to address only the system engineering process (design, deployment, integration, procedures, and maintenance) must now also incorporate cyber security services and technologies into these engineering processes. As a result, the new systems could be significantly different in configurations, capabilities, and constraints.
In the energy operational environment, there are five critical concepts for cyber security that should be understood as these energy businesses struggle to implement the necessary cyber security policies, procedures, and technologies. These five concepts are captured below and briefly discussed in the following sections.

Five Critical Concepts on Cyber Security and Resilience for the Smart Grid
Concept #1. Resilience should be the overall strategy for ensuring business continuity: When focusing on resilience in general, organizations must consider safety, security, and reliability of the processes and the delivery of their services. For "cyber resilience", organizations must similarly consider safety, security, and reliability for cyber assets, including resilience before security incidents (identify & prevent), during such incidents (detect & respond), and after incidents have been resolved (recover).  Cyber resilience thus involves a continuous improvement process to support business continuity: it is not just a technical issue but must involve an overall business approach that combines cyber security techniques with engineering strategies and operations to prepare for and adapt to changing conditions, and to withstand and recover rapidly from disruptions. Information sharing within and across organizations is also becoming crucial as a part of resilience.

Concept #2. Security by Design is the most cost-effective approach to security: Security is vital for all critical infrastructures and should be designed into systems and operations from the beginning, rather than being applied after the systems have been implemented, like a surface coat of paint. This means that the products, the systems, the processes, and the organization should be designed or set up from the beginning with security in mind. However, security cannot quickly be added to existing systems, particularly since power system components may have different life cycles, so it is crucial that, even for these existing systems, transitions to security-based configurations are designed into system retrofits and upgrades. Security by Design is not just the addition of technologies but must combine business organizational policies with continuing risk assessments and security procedures. Organizational policies include security regulations, personnel training, and segregation of duties, while security procedures include CERT information sharing, backup and recovery plans, and secure operations. Security technologies include physical and virtual techniques, such as physical site access locks, access control, authentication and authorization for all communications, and security logs.

Concept #3. IT and OT are similar but different: Operational technology environments (called OT in this document) have security constraints and requirements that are in some respects similar to, and in others different from, those of information technology (IT) environments. The primary reason is that power systems are cyber-physical systems, so security incidents can cause physical safety and/or electrical incidents, while such physical consequences are not usually a problem in corporate environments. For IT environments, confidentiality of sensitive business and customer information is generally the most important requirement, but for OT environments, availability, authentication, authorization, and data integrity are usually the more critical requirements, since power data is typically not confidential. In both IT and OT environments, well-known and ever-evolving IoT technologies are increasingly used. This interconnection of IT and OT, together with the growing dependence on IoT technology, introduces additional vulnerabilities and makes it harder to ensure adequate security in an energy environment that used to be very isolated.

Concept #4. Risk assessment, risk mitigation, and continuous update of processes are fundamental to improving security: Based on an organization's business requirements, its security risk exposure must be determined (human safety, physical, functional, environmental, financial, societal, reputational) for all its business processes. Risk assessment identifies the vulnerabilities of systems and procedures to deliberate or inadvertent threats, determines the potential impacts, and estimates the likelihood that the incident scenarios could actually happen. The strategy for risk mitigation must take into account operational constraints and look to engineering designs and operational procedures for improving resilience, while also evaluating the costs of implementing such risk mitigation strategies and the degree to which they mitigate the risk. Risk assessment also requires that mitigation processes be reevaluated during regular security reviews or when triggered by actual security incidents.

Concept #5. Cyber security standards and best practice guidelines for OT environments should be used to establish security programs and policies: Cyber security procedures should not be re-invented. Key cyber security standards and best practice guidelines have already been developed for different areas and purposes of security.  Cyber security planning should use these cyber security standards and guidelines to improve resilience and security of the OT environment, using the right standards, guidelines, and procedures for the right purposes at the right time.

Cyber Resilience as the Overall Strategy for Ensuring Business Continuity

Cyber security is far more than preventing attacks by malicious hackers. In particular, cyber security for the Smart Grid must also improve the cyber resilience of the power system by mitigating the threats from security “incidents” that affect cyber assets which could then disrupt operations. Resilience covers measures that can mitigate impacts from safety, security, and reliability incidents, not only before such incidents (identify & prevent), but also during incidents (detect & respond) and after incidents have been resolved (recover) as described in the NIST Cyber Security Framework (see Figure 1).

Mitigation of threats to resilience combines cyber security techniques (such as access control, detection of anomalous behavior, and incident logging) with organizational and engineering strategies, which allow the organization to prepare for and adapt to changing conditions and to withstand and recover rapidly from disruptions. These engineering strategies would include traditional power system reliability measures, such as redundant equipment, contingency analysis, and backup systems, but would also include strategies focused on addressing cyber asset vulnerabilities, such as planning for the loss of multiple cyber assets, isolation capability to limit cascading cyber-attacks, and even training personnel in manual operations typically performed automatically.
Mistakes are the most common “cyber incident,” so checks on data entry or control commands would be included in resilience support. Since persons with detailed knowledge of power system operations are the most dangerous attackers, additional engineering strategies may need to be deployed to mitigate this type of vulnerability, such as two-factor authentication and continuous monitoring of networks for anomalous traffic. Storms can affect not only the power system but also its cyber assets, so backup generators, communication networks, and spare cyber equipment should be located in secure sites, yet be easily accessible when needed.
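
As a minimal sketch of the checks on control commands mentioned above, the following Python fragment validates an operator setpoint against engineering limits and a maximum step change before the command is issued. The names (SetpointLimits, check_setpoint) and all numeric values are hypothetical and chosen only for illustration; real limits would come from equipment ratings and operating procedures.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SetpointLimits:
    """Hypothetical engineering limits for one controllable point."""
    minimum: float
    maximum: float
    max_step: float  # largest allowed change in a single command


def check_setpoint(current: float, requested: float, limits: SetpointLimits) -> list:
    """Return a list of reasons to reject the command (empty if acceptable)."""
    problems = []
    if not (limits.minimum <= requested <= limits.maximum):
        problems.append(
            f"requested value {requested} is outside "
            f"[{limits.minimum}, {limits.maximum}]"
        )
    if abs(requested - current) > limits.max_step:
        problems.append(
            f"change from {current} to {requested} exceeds "
            f"maximum step {limits.max_step}"
        )
    return problems


# Illustrative values only: a voltage setpoint expressed in per-unit.
voltage_limits = SetpointLimits(minimum=0.95, maximum=1.05, max_step=0.02)
accepted = check_setpoint(current=1.00, requested=1.01, limits=voltage_limits)
rejected = check_setpoint(current=1.00, requested=1.10, limits=voltage_limits)
```

Returning the reasons rather than a bare true/false lets the operator interface explain why a command was held back, which helps distinguish a typing mistake from something more suspicious.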

Security by Design as the Most Cost-Effective Approach
Designing security into cyber systems from the beginning is the most cost-effective approach to cyber security, since it minimizes risk and financial expenditures. Effective security cannot just be “patched” on to existing power system operational processes but should be an intrinsic part of system designs and configurations, operational procedures, and information technologies. Inserting security procedures and technologies afterwards is also costly because often they are “ad hoc” and require major modifications to system configurations as well as significant retraining of personnel. If designed in from the beginning, security becomes a normal part of the life cycles of power system cyber assets and operational procedures.

Nonetheless, it is well recognized that security cannot quickly be designed into existing systems, particularly since power system components may have vastly different life cycles. So, even for existing systems, it is crucial that security be designed into operational procedures and that there be a well-defined methodology for system upgrades. Many of the benefits of Security by Design can be realized even if systems are just being upgraded or slowly replaced, since having a well-thought-out security plan is critical for including security at each upgrade or replacement step.

The term “Security by Design” covers many aspects, such as system configurations, network configurations, planning procedures, and data management. For example, if the most critical systems can be located within a well-defined electronic security zone, then access to these critical systems can be designed as limited, protected, and carefully monitored (see Figure 2).  Such a design reduces “attack surfaces” that could be exploited by malicious entities or simply misused by accident.
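
The idea of limiting access to an electronic security zone can be illustrated with a small default-deny check. This is only a sketch under stated assumptions: the address ranges, the single allowed jump host, and the function name permit are hypothetical, and a real deployment would enforce such rules in firewalls and network equipment rather than in application code.

```python
import ipaddress

# Hypothetical addressing: the critical zone and the only outside host allowed in.
CRITICAL_ZONE = ipaddress.ip_network("10.10.0.0/24")
ALLOWED_SOURCES = {ipaddress.ip_address("10.20.0.5")}  # e.g. a jump host in a DMZ


def permit(source: str, destination: str) -> bool:
    """Permit traffic into the critical zone only from explicitly listed sources."""
    src = ipaddress.ip_address(source)
    dst = ipaddress.ip_address(destination)
    if dst not in CRITICAL_ZONE:
        return True  # this sketch only governs traffic entering the critical zone
    return src in CRITICAL_ZONE or src in ALLOWED_SOURCES
```

Because every source not explicitly listed is refused, the set of paths into the zone stays small and can be monitored closely.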

Security by design can improve possible security mitigation actions since planning for the inevitable “successful” security breach (failure scenarios) allows training and contingency actions to be discussed, evaluated, and strategies developed to mitigate these potential breaches. For instance, the Ukrainian personnel watched the attack on their power system unfold, curiously asking whether they should call the IT department and speculating that maybe it was the IT department trying to trip the breakers. They did not have the training to comprehend that a security breach was happening right in front of their eyes.
In security by design, access control can be implemented down to the individual application and data levels, not just the system levels, which allows true end-to-end security between users and their access to data, thus limiting very precisely who can monitor and/or control what data. The same access control can also be applied to the data flows between software applications.
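
A minimal sketch of access control at the data level, assuming a simple role-based table: each role is granted specific (data point, action) pairs, and anything not granted is denied. The role names, the data point identifier, and the function is_allowed are hypothetical and serve only to illustrate the idea.

```python
# Hypothetical role-to-permission table: (data point, action) pairs per role.
PERMISSIONS = {
    "operator": {
        ("feeder_12/breaker", "monitor"),
        ("feeder_12/breaker", "control"),
    },
    "planner": {
        ("feeder_12/breaker", "monitor"),
    },
}


def is_allowed(role: str, data_point: str, action: str) -> bool:
    """Deny by default; allow only explicitly granted (data point, action) pairs."""
    return (data_point, action) in PERMISSIONS.get(role, set())
```

In this sketch a planner can monitor the breaker but cannot control it, and an unknown role is denied everything. The same lookup could be applied to a software application acting as the requester instead of a human user.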

IT vs. OT: Differing Security Requirements in the Information Technology (IT) Environment and the Operational Technology (OT) Environment

In traditional business environments, the IT department is considered the expert in all things termed "cyber security". For most corporate cyber assets, this IT expertise is well placed to understand and address the threats, and to design methods to minimize vulnerabilities and respond to attacks. In general, corporate cyber assets are mostly concerned with the confidentiality of the information contained within computer systems, so most IT security focuses on preventing access to this sensitive data.

However, in the power system operational environment (OT), deliberate cyber security incidents or inadvertent mistakes and failures of cyber assets can also have physical repercussions since power systems are "cyber-physical systems". The repercussion with the greatest consequence is safety: the deliberate or inadvertent misoperation of a cyber asset could cause harm or even death. The second most important repercussion is the reliability of the power system. Although power systems have always been built with reliability of their physical assets (breakers, transformers, power lines) as the most critical design requirement, the reliability of the supporting cyber assets must nowadays also be designed to the same degree.

Therefore, for OT, as illustrated in Figure 3, availability and data integrity (authentication, authorization, and validity) are the most critical requirements. With their experience in reliability, it is often the experts in power system operations who best understand what responses to cyber asset incidents may or may not be appropriate, and combined with IT cyber expertise, how best to utilize engineering strategies and operation of the "physical" electrical system to minimize the impacts of such cyber asset incidents.

Operational environments have some very specific security challenges. For instance, high availability of both physical and cyber assets requires engineering designs that focus on redundancy, high reliability, and high performance of these assets. These security requirements may necessitate changes in network configurations and information flows, such as the use of security perimeters, demilitarized zones, and firewalls. In addition, very high speed, real-time processes, involving peer-to-peer interactions, autonomous actions, time sensitivity, and other characteristics, require security solutions different from those typically used in IT.
At the same time, operational constraints must be taken into account in these designs. For instance, constraints on equipment resources (timing, bandwidth, network access) can impact the cyber security procedures and technologies that could be used. In particular, heavy encryption techniques or on-line access to certificate authorities are generally not possible for operational assets. Additionally, the timing for system maintenance and equipment updates or upgrades is constrained by power system operational requirements, such as only having short windows during the spring or fall for taking equipment out of service for such updates.

Another constraint on applying cyber security is the large amount of legacy equipment with long life cycles that cannot easily be upgraded to include cyber security techniques. In addition, given the criticality of power system operations, security should not prevent operational actions, particularly emergency actions, so "break the glass" scenarios must also be built into security procedures.
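
One possible shape for a "break the glass" procedure is sketched below: an action that would normally be refused is allowed when the user states an emergency reason, and the override is recorded so that it can be reviewed afterwards. The function authorize, the in-memory audit_log, and the field names are hypothetical; in practice the log would be a protected security log and the override would likely raise an alarm.

```python
from datetime import datetime, timezone
from typing import Optional

audit_log = []  # hypothetical stand-in for a tamper-resistant security log


def authorize(user: str, action: str, permitted: bool,
              emergency_reason: Optional[str] = None) -> bool:
    """Allow permitted actions; allow others only with a stated emergency reason.

    Every decision is logged, and overrides are flagged for later review.
    """
    override = (not permitted) and bool(emergency_reason)
    allowed = permitted or override
    audit_log.append({
        "time": datetime.now(timezone.utc).isoformat(),
        "user": user,
        "action": action,
        "allowed": allowed,
        "break_glass": override,
        "reason": emergency_reason if override else None,
    })
    return allowed
```

The design choice here is that security never blocks an emergency action outright, but the override leaves a trace that a regular security review can examine.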

Risk Assessment, Risk Mitigation, and Lifecycle Processes

Risk assessment, risk mitigation, and continuous lifecycle update of processes are fundamental to improving security. Starting from their business requirements (financial, brand, operational, societal), organizations can determine their security risk exposure by applying the methodologies defined in international standards for OT environments.
The strategy for risk mitigation must take operational constraints into account and, in particular, must closely integrate all communication networks. Network constraints often include the protection of physical assets and personal safety, as well as constraints related to the performance and architecture of these networks.
The challenge is how to apply these concepts to develop a cyber security plan for the electrical operational environment. No single process can meet all requirements, but some general guidelines can show how to apply these cyber security standards and guidelines to improve resilience and security of the OT environment, using the right standards, guidelines, and procedures for the right purposes at the right time. The key steps are the following:

1. Collect the high-level business and regulatory requirements that apply to the OT environment, and identify the impacts (safety, economics, operational) if the requirements are not met.
2. Identify risks associated with each aspect of the NIST Cyber Security Framework, including inventories and management of cyber assets, so that information can be collected about the devices in the field and other systems, and about which assets are associated with which business or regulatory requirements.
3. Perform risk assessments on the areas of interest according to the risk assessment guidelines in the standards and based on the impacts from the business and regulatory requirements: e.g. the scope of specific projects, but also the project's interfaces with other OT and non-OT systems.
4. Identify the financial requirements for mitigating the risk and assess the balancing of risk (impact times likelihood of event) against the mitigation costs. Risk assessments can be done at different levels, e.g. for a whole substation or for parts of a substation or for one small DER site or a large PV plant. Different cyber security control standards may be of more use in different environments.
5. Establish what are acceptable risks, then identify security control solutions (procedures and/or technologies) to match unacceptable risks.
6. Apply security controls to the unacceptable risks that were identified. Some typical or selected control solutions may not be able to be applied in specific projects as initially defined, particularly for legacy systems. These controls may be provided by the utility or specified for vendors to provide. Depending upon the project needs, perform another risk assessment to determine if the controls have been applied correctly and have actually mitigated the risk adequately.
7. Monitor the cyber assets in a completed project to confirm that controls continue to be effective and to detect whether possible attacks have overcome them. Possible security events identified by this monitoring should be sent to a central CERT site.
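
Steps 4 and 5 above can be sketched numerically, taking risk as impact times likelihood as stated in step 4. The class RiskScenario, the functions triage and mitigation_is_worthwhile, the acceptable-risk threshold, and every figure below are hypothetical. Reducing all impacts to a single monetary value is a simplifying assumption; safety impacts in particular would normally be assessed separately.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RiskScenario:
    name: str
    impact: float               # estimated loss if the incident occurs
    likelihood: float           # estimated probability per year (0 to 1)
    mitigation_cost: float      # annualized cost of the proposed control
    residual_likelihood: float  # estimated likelihood with the control in place

    @property
    def risk(self) -> float:
        return self.impact * self.likelihood

    @property
    def residual_risk(self) -> float:
        return self.impact * self.residual_likelihood


def triage(scenarios, acceptable_risk):
    """Step 5: split scenarios into accepted risks and risks needing controls."""
    accepted = [s for s in scenarios if s.risk <= acceptable_risk]
    to_mitigate = [s for s in scenarios if s.risk > acceptable_risk]
    return accepted, to_mitigate


def mitigation_is_worthwhile(s: RiskScenario) -> bool:
    """Step 4: the expected risk reduction should exceed the cost of the control."""
    return (s.risk - s.residual_risk) > s.mitigation_cost


# Illustrative figures only.
scenarios = [
    RiskScenario("substation HMI compromise", impact=2_000_000, likelihood=0.05,
                 mitigation_cost=30_000, residual_likelihood=0.01),
    RiskScenario("DER site meter outage", impact=10_000, likelihood=0.1,
                 mitigation_cost=5_000, residual_likelihood=0.05),
]
```

With an acceptable-risk threshold of 5,000 in the same units, the meter outage would be accepted and the HMI compromise would be passed on for control selection. The same calculation can be repeated at different levels, for example for a whole substation or for one small DER site.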

Figure 4 illustrates the cyber resilience lifecycle which includes risk assessment, risk mitigations, and continuous feedback with reassessments of the security requirements.

Cyber Security Standards and Best Practices

Given the complexity of business processes and the wide variety of cyber assets used in the Smart Grid environment, no single existing cyber security standard can address all security requirements, security controls, resilience strategies, and technologies. Some standards and guidelines focus on the high-level organizational security requirements and more detailed recommended controls (What), while other standards focus on the technologies that can be used to supply these cyber security controls (How).

While many additional documents are available from national organizations, the key NIST, IEC, IEEE, NERC, and IETF cyber security standards and best practices are illustrated in Figure 5. 

The most relevant organizational standards include:

  • NIST Cyber Security Framework
  • ISO/IEC 27001 Security Audit (General)
  • ISO/IEC 27002 and ISO/IEC 27019 focused on the Smart Grid
  • NISTIR 7628 Smart Grid Security Guidelines
  • NERC Critical Infrastructure Protection (CIPs)
  • IEC 62443-2-1, 2-2, 2-3, 2-4, and 4-1 Security Programs

The functional requirement standards also identify "What" aspects of cyber assets should be evaluated through risk assessment but cover more detailed aspects of the Smart Grid.
These standards include:

  • IEC 62351-12 Resilience of the power system with DER
  • IEC 62443-3-3 System security requirements
  • IEC 62443-4-2 Security for products
  • IEEE 1686 Security for substations

The technical requirement standards provide methods and technologies on "How" these cyber assets could be made more secure and resilient. These standards include many IETF RFCs and the specific IEC 62351 security standards for different communication protocols, as well as for role-based access control, network and system monitoring, key management, logging, and deep packet inspection. Most of these standards have (or are planning to have) conformance testing and certification standards associated with them.
The cyber security and resilience culture must start from the top of an organization. These cyber security standards and best practices can only be effective if security and resilience are seen as critical by top management and are promulgated down to all levels.

Biography

Frances Cleveland has a B.Sc. degree in Applied Physics and Electrical Engineering from Harvard University, an M.Sc. degree in Electrical Engineering and Computer Science from the University of California at Berkeley, and an MBA from San Jose State University. She is President of Xanthus Consulting International and has consulted on Smart Grid information and control system projects in the electric power industry for over 36 years.
She is currently consulting to NIST as a Technical Champion for the Smart Grid Interoperability Panel (SGIP) on DER and cybersecurity, and to EPRI on the National Electric Sector Cybersecurity Organization Research (NESCOR). In the IEC, she is convenor of IEC TC57 WG15 for IEC 62351 cybersecurity standards for power system operations and is the editor for IEC TC57 WG17 for IEC 61850-7-420 information standards for DER, EV, and DA. In the IEEE, she is past chair of the IEEE PES PSCC.

BeijingSifang June 2016