Diagramming ICS Security

In a blog post, Sarah Fluchs made a very important point: we have diagrams and abstractions for virtually everything in an industrial control system, but for some reason we don't do this for industrial control system network security. I think she has put her finger on the heart of the problem with industrial control system security.

To properly discuss why diagrams have not been used, we need to look back at the current models of security. Most security designs are loosely modeled after the Purdue Enterprise Reference Architecture (commonly referred to as the "Purdue Model").

The Purdue Model was not designed to be a security architecture. It was a model designed to preserve real-time performance at various parts of a process. The methods involved segmenting the various elements with switches or routers so as to limit traffic.

Keep in mind that networks in the early 1990s were expected to have only about 3 Mbps throughput (10 Mbps on a bus with 30% utilization expected due to CSMA/CD). Many IT protocols at the time were flat, with significant amounts of broadcast traffic. Common office traffic could easily overwhelm a control system. This led to mandates for segmenting networks to keep the office traffic at bay.

If one replaces those switches or routers with firewalls, the Purdue Model provides an introductory starting point for security. But that's all the Purdue Model ever was. Anyone trying to sell the Purdue Model as something more than real-time performance is making stuff up.

Ms. Fluchs is not the first person to have noticed that if the security systems detect something, we don’t have many ways to act upon it quickly. But she may be the first to have identified some very important missing elements that most plant staff would expect: namely, the diagrams in context with the process. Without the diagrams, we can’t easily automate anything. For example, with a diesel generator, if we have an engine fire, there are alarms. Usually the automation will shut off all fuel to the engine to keep things from getting worse. This is usually found in the control wiring diagrams.

The assumption most people had was that if the firewall between Purdue Model levels blocked traffic, something reasonable would happen. This assumption further presumes that all controls are digital and that the authors of the controller thought of everything.

While engineers are very good at what they do, they're not perfect. Engineers learn from their mistakes and the mistakes of others. However, an attacker is usually quite creative at lining up unlikely permutations of events that will destroy the infrastructure and possibly kill people. The assumptions that most engineers make when designing controls typically address random failure modes, not a malicious attack.

For example, it is routine to provide excess capacity for water pumping stations because when a pipeline breaks we need to keep the water distribution system pressure up. But what if the water distribution system were fine? What would happen if the pressure went too high? Something would probably break.

An engineer's answer is simple: if the pressure goes too high, shut off a pump! It's the same as what happens if you cross over the double yellow line on a highway: go back to the side where you belong (depending on which country you're driving in).
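That reflex is easy to write down as an interlock. Here is a minimal sketch; the 150 psi trip point and the pump tags are invented for illustration, not taken from any real design:

```python
# Hypothetical high-discharge-pressure interlock for a pumping station.
# The trip point and pump names below are illustrative only.
HIGH_PRESSURE_TRIP_PSI = 150.0

def high_pressure_interlock(pressure_psi, running_pumps):
    """Return the list of pumps to stop when discharge pressure is too high."""
    if pressure_psi > HIGH_PRESSURE_TRIP_PSI and running_pumps:
        # Shed the most recently started pump to relieve pressure.
        return [running_pumps[-1]]
    return []

print(high_pressure_interlock(162.0, ["P-1", "P-2"]))  # sheds one pump
print(high_pressure_interlock(120.0, ["P-1", "P-2"]))  # normal: no action
```

This is exactly the kind of single-fault response engineers design for: one measurement drifts out of range, one corrective action brings it back.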

However, an attacker doesn’t think that way. The attacker wants you to be on the wrong side of the highway. The attacker wants to push pressures in the pipeline too high. The attacker wants the reactor vessel to be overheated.

So how do we stop this? This is where most of the OT Security expertise has thinned out. Most are not interested in dealing with a design, but we must.

What we need can start as an overlay to the Process and Instrumentation (P&I) diagrams. We need to identify the cells of automation and show what those controllers are supposed to connect to. Historically, most PLC I/O has been designed based upon ease of getting the I/O into a local marshaling cabinet. Considerations regarding I/O function were secondary.

We need security expertise with the engineering expertise to develop true zones of functionality. This might mean that a feed valve from a tank which is actually part of a tank farm system needs to be shut down directly from the process, without going through other controllers. A secondary set of valve status and control contacts coming from another I/O panel might be appropriate. That way, if the tank farm PLC were to fail, it would still be possible to open or close a valve.
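That fallback path can be expressed as a small selection function. This is only a sketch with invented signal names, assuming a simple heartbeat check on the tank farm PLC:

```python
def valve_command(plc_heartbeat_ok: bool, plc_cmd: str, hardwired_cmd: str) -> str:
    """Select the command source for the feed valve.

    Normally the tank-farm PLC drives the valve; if its heartbeat is lost,
    the secondary hardwired contacts take over so the valve can still be
    opened or closed. All names here are illustrative.
    """
    return plc_cmd if plc_heartbeat_ok else hardwired_cmd

# PLC healthy: its command wins.
print(valve_command(True, "CLOSE", "OPEN"))   # CLOSE
# PLC failed: the hardwired contacts still work.
print(valve_command(False, "CLOSE", "OPEN"))  # OPEN
```

The point of the design is that the selection happens outside the controller that might fail; in practice it would be wired, not coded.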

With each of those connections we need to identify what will happen to the process if that connection is broken. If automation is required for quick action or for safety, we need to identify those functions. This might also bump a PLC installation from non-critical to a SIL 1 application.

To assist with the identification of what an appropriate zone/conduit model should look like, we need to generate a table of control permutations for each process state, and the various control input or output failure modes.
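One way to enumerate those permutations is a simple cross-product of process states, signals, and failure modes. The states, signals, and failure modes below are invented for illustration; a real table would come from the P&I diagrams and the control narrative:

```python
from itertools import product

# Hypothetical process states and I/O failure modes for one pumping station.
process_states = ["filling", "steady", "draining"]
signals = ["pressure_xmtr", "valve_cmd"]
failure_modes = ["stuck_low", "stuck_high", "loss_of_signal"]

# Every (state, signal, failure) combination becomes one row to review.
rows = list(product(process_states, signals, failure_modes))
for state, signal, failure in rows:
    print(f"{state:8s}  {signal:13s}  {failure}")

print(len(rows), "permutations to review")  # 3 * 2 * 3 = 18
```

The table grows quickly, which is precisely why it should be generated rather than written by hand; the review effort then goes into marking which rows are dangerous.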

The Purdue Model says nothing about this feature because it is not security related. It is focused on real-time performance. Once the most dangerous permutations have been identified and reviewed, we can then draw an outline in the P&I diagrams to identify the zones and the conduits of security. It may even be possible to design the I/O so that, with control power interlocks, it physically cannot be set in those deadly combinations.

As for the network itself, the P&I drawings do have the notion of a "transmitter" with media built into them. They have many transmission connection lines. It would be trivial to add a few firewall, switch, and router symbols so that we'd at least know approximately where the lower-layer field components are, and what they connect to.

Having analyzed a process to determine zones and conduits, we would then draw a topological foundation to indicate what those switches, routers, and firewalls do and what functionality they can block. This could include software defined networks; though to be honest, I have a hard time imagining why anyone would need such features at the process, HMI, or even the ERP levels.

The design processes that Ms. Fluchs proposed are good as far as they go. Like all design processes, they’re conceptual and they’re never going to be more than that. It is a useful exercise to ensure that you haven’t overlooked something monumental. However, 90% of this is the following:

  • Identifying zones and conduits (ideally during the process design)
  • Discussing failure modes with a permutation table
  • Identifying defensive measures against dangerous permutations
  • Drawing zones, conduits, and network hardware on the P&I diagrams
  • Drawing network diagrams with the zones and conduits identified on the topology
  • Writing Standard Operating Procedures around various expected attack scenarios

In all, I commend Sarah Fluchs for noticing that most of the OT security efforts are a virtual castle in the sky with very little foundation in the actual process or the engineering. This discussion is long overdue and it needs to continue.


With more than 30 years' experience at a large water/wastewater utility and extensive experience with control systems, substation design, SCADA, RF and microwave telecommunications, and work with various standards committees, Jake still feels like one of those proverbial blind men discovering an elephant. Jake is a Registered Professional Engineer of Control Systems. Note that this blog is Jake's opinion ONLY. No employers, past or present, were ever consulted with regard to these posts. These are Jake's notions. Don't blame anyone else for them.