I should have included a diagram on the SANS blog to illustrate the concepts a bit better. I’ll work on one shortly.
The main point behind the blog is that it takes time to recognize an ongoing hack. The example I cited is actually quite optimistic: many operators might not make the connections that a well-rested senior operator where I work would notice. By the time OT finally does get the call, the attack has probably been underway for hours, and possibly much longer. The notorious Stuxnet case flummoxed staff for months, perhaps a year or more, before they figured out that something was actually not right with the control system.
That delay is something everyone needs to think about very carefully. The time it takes to recognize a compromised system must be part of the process design. We need to design things in such a way that causing real mayhem would at least require a concerted, coordinated effort. One way to do that is to build orthogonal measurement systems with different reporting paths, so that any result on the process can be cross-checked.
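A minimal sketch of that cross-check idea, assuming two independent channels measuring the same process variable (the function names, tolerance, and alarm text here are invented for illustration, not taken from any real system):

```python
# Hypothetical sketch: cross-check two independent measurements of the
# same process variable. If the channels disagree, neither reading can
# be trusted on its own, so escalate to an operator instead of acting.

def channels_agree(primary: float, secondary: float, tolerance: float) -> bool:
    """Return True when two independent readings agree within tolerance."""
    return abs(primary - secondary) <= tolerance

def evaluate(primary: float, secondary: float, tolerance: float = 2.0) -> str:
    if channels_agree(primary, secondary, tolerance):
        return "OK"
    return "ALARM: channels disagree - manual verification required"

print(evaluate(100.4, 100.9))  # independent channels agree
print(evaluate(100.4, 0.0))    # one channel failed, or was spoofed
```

The point of the sketch is the escalation path: a disagreement doesn't tell you which channel is lying, so the safe response is to alarm and involve a human rather than keep running on either reading.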
And what’s the payback? The payback is not having accidents like the DC Metro disaster on the Red Line on June 22, 2009. In that instance the system depended upon a single track sensor to indicate the presence of trains (there was no Positive Train Control in service at the time). That one track sensor failed, and as a result a second train running at normal cruise speed rounded a corner and could not brake quickly enough to avoid that terrible accident.
An alternative measurement/tracking scheme to confirm the reliability of the sensors would have saved lives. It would also help us discover when the control system’s sensor readings aren’t adding up. We need to design things like this in from the start, and stop relying upon complex IT tools that weren’t designed for control systems to alert us to intrusions.