Automated Vulnerability Management for the Power Grid

by Andreas Klien, OMICRON electronics GmbH, Austria

This article addresses the challenges of vulnerability management in the power grid, specifically the inadequate detection of security vulnerabilities in protection, automation, and control (PAC) devices in power plants, substations, and control centers. It discusses the difficulties in correlating security advisories with PAC devices and the effort required to implement effective vulnerability management. The article presents an automated solution that leverages the capabilities of IEC 61850 and the Common Security Advisory Framework (CSAF). The proposed solution uses a vulnerability database developed specifically for PAC systems that incorporates expert knowledge about these devices and systems.

The Information Security Officer asks the responsible PAC engineer: “Are we affected by the critical vulnerability in firmware version 7.5 of the ACME A63xx family of protection relays?”

After two days of analysis, the PAC engineer replies: “Well, it depends.”

Cyber Assets

In the power grid context, assets are typically primary equipment like transformers, switchgear, or whole substations. In the remainder of this article, however, the term “asset” refers to cyber assets. There are several possible definitions of what a cyber asset covers. In short, a cyber asset is anything that needs to be protected against cyber risks. Cyber assets include hardware, software, and other elements that facilitate the communication, processing, storage, and management of digital information.

Cyber assets can be categorized into these types:

Hardware: Physical devices such as protection, automation, and control devices, servers, routers, switches, firewalls, computers, smartphones, and IoT devices. In short, if it has firmware or software running on it, it is a cyber asset
Software: Software components that run on hardware devices, such as applications and operating systems. Energy management systems distributed over multiple servers are another example
Data: Information stored, processed, or transmitted by an organization, including both sensitive and non-sensitive data

All of these cyber assets need to be tracked and monitored to be protected against cyber risks. Therefore, creating and maintaining a cyber asset inventory is critical for effective cybersecurity management, risk assessment, and incident response. It is the only way to maintain a clear understanding of assets, their locations, and their configurations.

Here are a few steps for creating a cyber asset inventory:

Identification: Identify all cyber assets within the organization, including hardware, software, data, and services
Classification: Categorize assets based on their criticality, sensitivity, and function within the organization to prioritize security efforts and effectively allocate resources
Documentation: Record critical information about each asset, such as its owner, location, version, configuration, and associated vulnerabilities
Monitoring: Continuously monitor assets for changes, including software updates, hardware upgrades, new vulnerabilities, and the addition or removal of assets from the network
Review and update: Periodically review and update the inventory to ensure it remains accurate and current
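The documentation and monitoring steps above can be illustrated with a minimal Python sketch. The record fields, the asset names, and the model designation "ACME A6310" are assumptions made for illustration and do not describe any particular inventory tool. The sketch compares two inventory snapshots and reports which assets were added, removed, or changed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CyberAsset:
    """One inventory record; the field names are illustrative assumptions."""
    asset_id: str
    owner: str
    location: str
    model: str
    firmware_version: str
    criticality: str  # for example "high", "medium", or "low"


def diff_inventories(old, new):
    """Compare two inventory snapshots keyed by asset_id.

    Returns three sorted lists of asset ids: added, removed, and changed.
    """
    old_by_id = {asset.asset_id: asset for asset in old}
    new_by_id = {asset.asset_id: asset for asset in new}
    added = sorted(new_by_id.keys() - old_by_id.keys())
    removed = sorted(old_by_id.keys() - new_by_id.keys())
    changed = sorted(
        asset_id
        for asset_id in old_by_id.keys() & new_by_id.keys()
        if old_by_id[asset_id] != new_by_id[asset_id]
    )
    return added, removed, changed


# Hypothetical snapshots of the same substation at two points in time.
last_month = [
    CyberAsset("relay-01", "Protection team", "Substation A, Bay 1", "ACME A6310", "7.4", "high"),
    CyberAsset("relay-02", "Protection team", "Substation A, Bay 2", "ACME A6310", "7.4", "high"),
]
today = [
    CyberAsset("relay-01", "Protection team", "Substation A, Bay 1", "ACME A6310", "7.5", "high"),
    CyberAsset("switch-07", "Network team", "Substation A, Control room", "Example switch", "1.2", "medium"),
]

added, removed, changed = diff_inventories(last_month, today)
print("added:", added, "removed:", removed, "changed:", changed)
```

In this sketch, a firmware update on relay-01 shows up as a changed record, which is the kind of event the periodic review step is meant to catch.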

A well-maintained cyber asset inventory enables organizations to proactively manage risk, ensure regulatory compliance, and effectively respond to security incidents. For OT engineers, it is essential to understand and manage the cyber assets involved in the operation of the power grid, including the security vulnerabilities that affect them. But what exactly are security vulnerabilities, and how can they be identified?

From Vulnerabilities to Security Advisories

Security vulnerabilities are bugs that an attacker can exploit to achieve their goal. When looking at a product, vulnerabilities can be found either in the source code of the product itself or in third-party components used in the product. Today, vendors use hundreds, if not thousands, of third-party libraries in their products. Therefore, it is much more common to find vulnerabilities in a third-party component than in the product itself. But not every bug in a third-party component leads to a vulnerability in the device that uses it. When the device manufacturer learns of the vulnerability, they must first assess the impact. The same vulnerability in a third-party component may lead to a severe vulnerability in product A, a moderate-severity vulnerability in product B, and no vulnerability at all in product C.

The process is usually as follows: The manufacturer maintains a Software Bill of Materials (SBOM) for their product. This is a list of all the external libraries and components used and their version numbers. All of these components must be continuously monitored for vulnerabilities. Databases such as CVE (Common Vulnerabilities and Exposures, https://cve.mitre.org) or the NVD (National Vulnerability Database, https://nvd.nist.gov) contain most of the vulnerabilities discovered in software products and libraries.
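The SBOM monitoring step can be sketched as a simple lookup. The component names, version numbers, and vulnerability ids below are invented placeholders, and the data layout is an assumption rather than a real SBOM or CVE/NVD format. The sketch only shows the principle: a match does not mean the product is vulnerable, it only means the component needs an impact assessment.

```python
# Hypothetical SBOM: component name -> version shipped in the product.
sbom = {
    "example-tls-lib": "1.4.2",
    "example-xml-parser": "2.9.0",
    "example-rtos": "10.3",
}

# Hypothetical vulnerability feed; the ids are placeholders, not real CVEs.
vulnerability_feed = [
    {"id": "EXAMPLE-0001", "component": "example-tls-lib", "affected_versions": {"1.4.1", "1.4.2"}},
    {"id": "EXAMPLE-0002", "component": "example-xml-parser", "affected_versions": {"2.8.0"}},
    {"id": "EXAMPLE-0003", "component": "unused-library", "affected_versions": {"0.1"}},
]


def components_needing_assessment(sbom, feed):
    """Return (vulnerability id, component, version) for every feed entry
    that matches a component version listed in the SBOM."""
    matches = []
    for entry in feed:
        version = sbom.get(entry["component"])
        if version is not None and version in entry["affected_versions"]:
            matches.append((entry["id"], entry["component"], version))
    return matches


for vuln_id, component, version in components_needing_assessment(sbom, vulnerability_feed):
    print(f"{vuln_id}: assess impact of {component} {version} on the product")
```

Here only the first feed entry matches: the XML parser is used in a version that is not affected, and the third library is not part of the product at all.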

We use automated solutions to maintain our SBOM and scan our code base for known vulnerabilities. When a new vulnerability is found in one of our third-party components, our developers assess the impact on our solutions. They consider how attackers might use this bug for their purposes. This assessment uses criteria such as the ease of exploitation, the impact on the confidentiality, integrity, and availability of our product, and the likelihood of actual exploitation. We also consider the impact on our customers, such as financial loss and reputational or brand damage. Because our products are used in the power grid, we must consider the potential impact on the power supply and the risk to life and limb. If either of these is possible as a direct result of exploiting a vulnerability, it is considered critical and immediate action must be taken. For example, if a vulnerability allows an attacker to gain TCP/IP network access, then the attacker could use this to send MMS or IEC-104 control commands to trip breakers and interrupt the power supply. If the vulnerability allows elevated privileges, and thus raw socket network access, the attacker could send GOOSE messages to disrupt the interlock.

This could cause damage to primary equipment in substations and endanger the lives of personnel working in the switchyard. Such scenarios are unlikely, but possible, and must be considered. This process must be ongoing as new vulnerabilities are discovered and new updates to external components are released. The goal is to minimize the risk to our customers and help them secure their systems.
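The criticality rule described above can be written down as a small decision function. This is a simplified sketch of the reasoning, not our actual assessment procedure: the two flags and the returned labels are assumptions chosen to mirror the two examples in the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ExploitOutcome:
    """What a successful exploit gives the attacker (illustrative flags)."""
    tcp_ip_network_access: bool = False
    elevated_privileges: bool = False


def possible_grid_actions(outcome):
    """List the grid-relevant actions an attacker could take afterwards."""
    actions = []
    if outcome.tcp_ip_network_access:
        actions.append("send MMS or IEC-104 control commands to trip breakers")
    if outcome.elevated_privileges:
        # In the example from the text, elevated privileges imply raw socket access.
        actions.append("send GOOSE messages to disrupt the interlock")
    return actions


def severity(outcome):
    """Critical if any action could affect the power supply or personnel safety."""
    return "critical" if possible_grid_actions(outcome) else "assess further"


print(severity(ExploitOutcome(tcp_ip_network_access=True)))
print(severity(ExploitOutcome()))
```

A real assessment would weigh many more criteria, such as ease of exploitation and likelihood, but the direct path to a power supply or safety impact is what drives the "critical" rating.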

After a vulnerability is discovered in a component listed in the SBOM, it is critical to create a security advisory. A security advisory is a formal document that discloses the vulnerability and provides steps to mitigate the threat. The advisory should include the severity level, the impact on the product, and the steps required to resolve the vulnerability. It is important to release the advisory to the product users in a timely and transparent manner. This is usually done once a patch or fix for the vulnerability is available. Properly managing security advisories is critical to protecting user data and devices from attackers.

Vulnerability Management Process

However, identifying vulnerabilities is not enough to protect your systems. It is just one step in a cyber vulnerability assessment, which helps OT engineers identify and evaluate security risks in their organization’s ICT infrastructure. A vulnerability management process should include the following steps:

Define Scope: Determine the systems, networks, applications, or ICT infrastructure to be assessed and the objectives of the assessment
Identify Assets: Create an inventory of hardware, software, network infrastructure, and data to understand what is at risk
Collect Data: Gather information about the ICT environment, including network topology, system configurations, software versions, patch levels, and access controls
Identify Vulnerabilities: Analyze the data to identify potential vulnerabilities by comparing software and firmware versions and system configurations to known vulnerabilities
Analyze Vulnerabilities: Evaluate identified vulnerabilities based on their potential impact, likelihood of exploitation, and the effectiveness of existing security controls
Report Findings: Document the results of the assessment, including identified vulnerabilities and recommended remediation steps
Remediate: Address vulnerabilities by implementing appropriate security controls, such as patching software or improving access controls
Review and Improve: Periodically review and update the assessment process to ensure it remains effective

A cyber vulnerability assessment is essential to protect critical cyber assets like protection and control devices, and to minimize the risk of security breaches. By regularly assessing and addressing vulnerabilities, it is possible to know which patches need to be installed and which can be deferred.

Why are PAC Devices Rarely Patched?

A typical utility has thousands of protection, automation, and control devices installed, all of which are used to perform highly critical tasks. However, security vulnerabilities in PAC devices are often poorly identified, and operators are often unaware of the vulnerabilities affecting their devices. “Why are there still protection and control devices out there with firmware that is years old, with vulnerabilities that are years old, some with known exploits?” Just as you regularly update your operating system and browser, shouldn’t you be able to do the same with devices that are responsible for critical infrastructure? Such questions are often heard from new colleagues with IT backgrounds who are unaware of the challenges and effort involved in patching substation automation devices.

Reason 1: Shutdowns. It is well known that firmware updates to cyber-physical systems require shutting down the controlled physical process. In manufacturing facilities, finding a time when the system is idle long enough to install new software and perform a test run is a challenge. It is even worse in the power grid, where it can take several months for the desired shutdown to be scheduled by the grid planning department. In the meantime, more security updates would become available. When the shutdown is scheduled, the maintenance crew can perform firmware updates on one feeder. A complete shutdown of all feeders is rarely possible. Updating all devices in a system requires multiple visits, and mixed operation with different firmware versions is inevitable. As a result, it is common for a utility to have many firmware versions of the same device type in use, increasing the complexity of assessing whether one is affected by a vulnerability in a particular firmware version.

Reason 2: Testability. Many PAC engineers have told me stories about how new firmware versions behaved differently than the previous one, and how bugs crept into the most critical protection functions even in minor updates. Before a utility deploys a new version of firmware for a critical protection relay or control device, it must be extensively tested. Some utilities invest in specially equipped lab setups that replicate their substation design as closely as possible. The problem, however, is that each substation has a slightly different design and there is usually little commonality and standardization. Therefore, it is also necessary to test in the field after updates have been applied. The challenge is to estimate how in-depth the testing needs to be. Checking all protection functions, control sequences, switching logic and communication with the control center is very time-consuming and requires automated test plans to be created beforehand (recommissioning). As a result, there is often an uncomfortable feeling after a firmware update that it has not been sufficiently tested.

Risk Management vs. Blind Patching

Because of the immense effort involved, you should carefully consider whether you really need to apply a patch. Moreover, the risk of applying the patch may be greater than the risk of not applying it. To manage this risk and make an informed decision about whether or not to deploy a patch, a vulnerability management process must be established. The goal of a vulnerability management process for the power grid is not just to have a reminder for each patch that needs to be installed. The goal is to have 100% confidence that I know which vulnerabilities I am exposed to, and to be able to decide if that risk is acceptable or if mitigations need to be put in place.

To set up a vulnerability management process, I first need to obtain the security advisories from my device manufacturers. These are usually available on the customer portal of the vendor’s website and are also emailed. The most active PAC technology vendors release about 200 security advisories per year, with each advisory covering a handful of different vulnerabilities (CVEs) and about a dozen device types. The ICS-CERT of the U.S. Cybersecurity & Infrastructure Security Agency (CISA) also distributes these advisories, providing a common source for many vendors.

How do I know if I am affected by a security vulnerability? I need to compare the security advisory with my asset inventory. For this reason, most security standards and frameworks, such as IEC 62443-3-3 SR 7.8, require keeping a current list of installed components and their properties. Similar wording can also be found in ISO/IEC 27001, Annex A.8.1.1, and NIST SP 800-53.

Unfortunately, knowing the exact make and model of your device and its firmware version is not enough. Often the vulnerability is not in the firmware of the device itself, but in the separate firmware running on the expansion modules inside the device, which has its own version number (Figure 1). A common example of this is the Ethernet communication module used in different types of devices from the same manufacturer, and this card may even have different firmware variants. However, the asset inventory typically contains only the device type and the firmware version used, not the firmware version numbers used on the device’s modules. Worse, in many cases, there is no global asset inventory, and the information is scattered in various project files and spreadsheets and is not up to date.
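The following sketch illustrates why a device-level inventory is not enough. The data model, the module type name "ETH-COMM", and all version numbers are invented for illustration. Two relays share the same device type and device firmware, yet only one is affected, because the advisory targets the firmware of the Ethernet communication module.

```python
from dataclasses import dataclass, field


@dataclass
class Module:
    module_type: str
    firmware_version: str


@dataclass
class Device:
    name: str
    device_type: str
    firmware_version: str
    modules: list = field(default_factory=list)


def affected_components(device, advisory):
    """Check the device firmware and every module firmware against one advisory.

    The advisory is a dict with a component type and a set of affected versions.
    """
    candidates = [("device", device.device_type, device.firmware_version)]
    candidates += [("module", m.module_type, m.firmware_version) for m in device.modules]
    hits = []
    for level, component_type, version in candidates:
        if (component_type == advisory["component_type"]
                and version in advisory["affected_versions"]):
            hits.append((device.name, level, component_type, version))
    return hits


# Hypothetical advisory for the communication module, not the relay itself.
advisory = {"component_type": "ETH-COMM", "affected_versions": {"2.0", "2.1"}}

relay_a = Device("Relay A", "A63xx", "7.5", [Module("ETH-COMM", "2.1")])
relay_b = Device("Relay B", "A63xx", "7.5", [Module("ETH-COMM", "2.3")])

for relay in (relay_a, relay_b):
    print(relay.name, affected_components(relay, advisory))
```

An inventory that records only "A63xx, firmware 7.5" cannot tell these two relays apart, which is exactly the situation that leads to the answer "it depends".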

Creating an Asset Inventory

Ideally, of course, this asset inventory is created automatically. This is where grid automation engineers have an advantage over industrial automation engineers: IEC 61850 provides several machine-readable ways to obtain asset information. At the end of the IEC 61850 engineering process, regardless of whether the process was top-down or bottom-up, there is an SCD file documenting the instantiated data models of all IEDs. By now, almost all engineering tool vendors put information about the type and model of each IED into the SCD file in an interoperable way. The nameplate data in the high-level logical nodes (LPHD, LLN0) hold the model information, serial number, hardware revision, and software revision. When properly populated, the configuration files contain these data for offline use. In addition, the actual data can be read from the IEDs themselves, so an undocumented firmware upgrade would be reflected in the software revision obtained online through MMS. These data, together with other parameters of the communication infrastructure and vulnerability databases, can then be used to determine which vulnerabilities are relevant for a substation.

Thus, cybersecurity tools can read the System Configuration Description (SCD) file and extract the nameplate information. Similarly, the nameplate can be actively queried via the IEC 61850 MMS protocol.
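As a minimal sketch of the offline path, the following Python code extracts nameplate information from an SCD file using only the standard library. The embedded sample is heavily trimmed and not schema-complete, and its values are invented. The sketch assumes that the nameplate values are instantiated as DOI/DAI elements under the PhyNam data object of LPHD. Real SCD files may keep some values in the data type templates instead, which this sketch does not handle.

```python
import xml.etree.ElementTree as ET

SCL_NS = {"scl": "http://www.iec.ch/61850/2003/SCL"}

# Trimmed, hypothetical SCD excerpt for illustration only.
SAMPLE_SCD = """<SCL xmlns="http://www.iec.ch/61850/2003/SCL">
  <IED name="Relay1" manufacturer="ACME" type="A63xx">
    <AccessPoint name="AP1">
      <Server>
        <LDevice inst="LD0">
          <LN0 lnClass="LLN0" inst="" lnType="LLN0_T"/>
          <LN lnClass="LPHD" inst="1" lnType="LPHD_T">
            <DOI name="PhyNam">
              <DAI name="vendor"><Val>ACME</Val></DAI>
              <DAI name="model"><Val>A63xx</Val></DAI>
              <DAI name="swRev"><Val>7.5</Val></DAI>
              <DAI name="serNum"><Val>SN-0001</Val></DAI>
            </DOI>
          </LN>
        </LDevice>
      </Server>
    </AccessPoint>
  </IED>
</SCL>"""


def read_nameplates(scd_text):
    """Return one dict per IED with its attributes and LPHD.PhyNam values."""
    root = ET.fromstring(scd_text)
    inventory = []
    for ied in root.findall("scl:IED", SCL_NS):
        record = {
            "name": ied.get("name"),
            "manufacturer": ied.get("manufacturer"),
            "type": ied.get("type"),
        }
        for ln in ied.iterfind(".//scl:LN[@lnClass='LPHD']", SCL_NS):
            for dai in ln.iterfind("scl:DOI[@name='PhyNam']/scl:DAI", SCL_NS):
                val = dai.find("scl:Val", SCL_NS)
                if val is not None and val.text:
                    record[dai.get("name")] = val.text.strip()
        inventory.append(record)
    return inventory


for record in read_nameplates(SAMPLE_SCD):
    print(record)
```

The same attribute names (vendor, model, swRev, serNum) would be expected when the nameplate is queried online via MMS, so the offline and online results can be compared to spot undocumented firmware changes.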

Our solution provides these options for creating a global asset inventory:

Passive detection via network traffic
Active query of the nameplate via the IEC 61850 MMS protocol
Import of the project file (IEC 61850 SCD)
Import of plant documentation in spreadsheet format

Figure 2 shows an example of an automatically generated inventory list with detailed information about each asset.

Vulnerability Identification

What remains is the time-consuming process of comparing the manufacturer’s security advisories with the asset inventory. The question “Are we affected?” still requires experts who are familiar with these devices and their composition, because it depends…

To provide an automated solution to this problem, without having to interview a PAC engineer each time, we created a vulnerability database that also contains meta-information about device types, modules, and their firmware versions. In other words, we encoded PAC expertise into the database (Figure 3). For example, based on the product ordering code of certain vendors, our solution can figure out the module configuration of the device and use this information for the vulnerability assessment. This product ordering code can be found in the nameplate information that we retrieve via the IEC 61850 MMS self-description feature.
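To illustrate the idea of deriving the module configuration from an ordering code, here is a sketch with an entirely hypothetical code layout. Real ordering codes are vendor-specific, and neither the layout "family-slot-slot-..." nor the slot codes below correspond to any actual product.

```python
# Hypothetical slot codes; None marks an empty slot.
SLOT_CODES = {
    "E1": "Ethernet module, copper",
    "E2": "Ethernet module, fiber",
    "B1": "Binary I/O module",
    "X0": None,
}


def decode_ordering_code(code):
    """Split a hypothetical ordering code into the device family and a list
    of (slot position, module description) for all populated slots."""
    family, *slots = code.split("-")
    modules = []
    for position, slot_code in enumerate(slots, start=1):
        if slot_code not in SLOT_CODES:
            raise ValueError(f"unknown slot code {slot_code!r} in {code!r}")
        description = SLOT_CODES[slot_code]
        if description is not None:
            modules.append((position, description))
    return family, modules


family, modules = decode_ordering_code("A63-E2-X0-B1")
print(family, modules)
```

Once the modules are known, each one can be checked against the vulnerability database separately, in addition to the main device firmware.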

The process of building this vulnerability database is described in Figure 3. Our web crawlers scan the security advisory pages of protection and control device vendors daily for new disclosures or changes to existing publications. All vulnerability texts and PDFs and their changes are captured in an archive, analyzed, enriched with device meta-information, and then added to our power grid vulnerability database. Before these database updates are delivered to our customers, all additions and changes are reviewed by our experts.

This process has been greatly improved by the OASIS Consortium’s Common Security Advisory Framework (CSAF). The CSAF format allows vendors to publish security advisories in a machine-readable format. The benefit is that information such as which products and firmware versions are affected is described in a standardized format. Several PAC technology vendors are already publishing advisories in this format, including Siemens, Schneider Electric, and Hitachi Energy. Information about the CSAF standards can be found at www.csaf.io. Unfortunately, not all of the vendors publish their security advisories in this format. Some vendors have only published their most recent advisories in this format, and older advisories are only available in PDF format. Since we also use CSAF as our internal format for describing all vulnerabilities in our vulnerability database, our security analysts have created CSAF files for vendors that don’t provide CSAF. Over time, our analysts have created hundreds of CSAF descriptions of vulnerabilities in PAC devices. Only with this level of detail and metadata is it ultimately possible to automatically display only those vulnerabilities that apply to the device and its composition.
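The following sketch shows how a machine-readable advisory can be matched against an asset inventory. The advisory is a heavily trimmed document that follows the CSAF 2.0 JSON structure (product_tree, branches, product_status), with an invented vendor, product, and a placeholder CVE id. The inventory format and the function names are assumptions. The sketch handles exact product versions only, whereas real advisories often describe version ranges, which require additional logic.

```python
import json

# Trimmed, hypothetical advisory in CSAF 2.0 structure; not a complete document.
ADVISORY_JSON = """
{
  "document": {"category": "csaf_security_advisory", "csaf_version": "2.0",
               "title": "Example advisory"},
  "product_tree": {"branches": [
    {"category": "vendor", "name": "ACME", "branches": [
      {"category": "product_name", "name": "A63xx", "branches": [
        {"category": "product_version", "name": "7.5",
         "product": {"product_id": "CSAFPID-0001", "name": "ACME A63xx 7.5"}},
        {"category": "product_version", "name": "7.6",
         "product": {"product_id": "CSAFPID-0002", "name": "ACME A63xx 7.6"}}
      ]}
    ]}
  ]},
  "vulnerabilities": [
    {"cve": "CVE-0000-0000",
     "product_status": {"known_affected": ["CSAFPID-0001"],
                        "fixed": ["CSAFPID-0002"]}}
  ]
}
"""


def index_products(branches, path=()):
    """Map each product_id to its branch path, e.g. vendor/product_name/product_version."""
    products = {}
    for branch in branches:
        here = path + ((branch["category"], branch["name"]),)
        if "product" in branch:
            products[branch["product"]["product_id"]] = dict(here)
        products.update(index_products(branch.get("branches", []), here))
    return products


def affected_assets(advisory, inventory):
    """Return (asset name, CVE id) for every asset listed as known_affected."""
    products = index_products(advisory["product_tree"]["branches"])
    findings = []
    for vuln in advisory.get("vulnerabilities", []):
        for product_id in vuln.get("product_status", {}).get("known_affected", []):
            product = products.get(product_id, {})
            key = (product.get("vendor"), product.get("product_name"),
                   product.get("product_version"))
            for asset in inventory:
                if (asset["vendor"], asset["model"], asset["firmware"]) == key:
                    findings.append((asset["name"], vuln.get("cve")))
    return findings


# Hypothetical inventory records, for example derived from nameplate data.
inventory = [
    {"name": "Relay1", "vendor": "ACME", "model": "A63xx", "firmware": "7.5"},
    {"name": "Relay2", "vendor": "ACME", "model": "A63xx", "firmware": "7.6"},
]

advisory = json.loads(ADVISORY_JSON)
print(affected_assets(advisory, inventory))
```

Because the product tree names vendor, product, and version explicitly, no free-text parsing of the advisory is needed, which is the main advantage over PDF advisories.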

Vulnerability management in the power grid is a complex and challenging task. Automating these processes through the use of machine-readable descriptions such as SCL and CSAF, together with a purpose-built vulnerability database, offers a promising solution to effectively assist utilities in identifying their vulnerabilities. This will finally make it possible to perform risk management and to decide with confidence whether or not to apply a patch.

Biography:

Andreas Klien received his M.Sc. degree in Computer Engineering from the Vienna University of Technology. He joined OMICRON in 2005 and has worked with IEC 61850 ever since. Since 2018, Andreas has been responsible for the Power Utility Communication business of OMICRON. His fields of experience are substation communication, SCADA, and power systems cyber security. As a member of WG10 in TC57 of the IEC, he participates in the development of the IEC 61850 standard series.
