When most OT staff discuss ICS security, they usually begin with some networking gewgaws and tweaks. This sort of stuff is interesting the first few times through the exercise. However, it doesn’t take long to realize that network security alone is a multi-headed hydra of a problem. The more we try to fight this problem, the harder it is to manage the network and keep it available for the control system. There is a point of diminishing returns where the network becomes fragile and availability suffers from all the security measures. These measures get centralized and introduce new points of failure and complexity to a process.
Remember, the goal is to maintain availability and rapid recovery. Network security with domain controllers for everything can itself become a significant failure mode and an impediment to rapid recovery.
There is another way to detect problems in an ICS process. It’s about the process.
Look at the Purdue Model of Control Systems. Notice that Layers 0, 1, and 2 have features that allow you to segment within the layer as well as upward toward Layer 3.
Critical instruments that affect each other should be attached to different processors. For example, at a waste-water treatment plant, the influent pumping station has the run statuses of several pumps. There is also a wet-well level indication, an influent flow meter, and a flow meter toward the rest of the plant.
The wet-well level should go up and down with the pump statuses. The influent and plant flow meters should total up reasonably well. The wet-well level should rise at a certain rate when the pumps are off. The HIHI float switch in the wet-well should not trip and start a pump. The LOLO wet-well float switch should not trip a pump offline, either. So there is a certain behavior that you should see in that part of the process. The flow that leads to the aeration basins should also total up with the flow leaving the aeration basins over a certain time delay.
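To make those cross checks concrete, here is a minimal sketch in Python of what such logic might look like. Everything in it is an assumption for illustration: the `Scan` record, the function names, the 60-second scan time, the rise-rate band, and the 10% flow tolerance are hypothetical and would have to come from the actual plant. In practice this sort of logic would more likely live in the PLC or the historian than in a script.

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass
class Scan:
    """One scan of the influent pumping station (hypothetical record layout)."""
    pumps_running: int         # number of pumps reporting a run status
    level_m: float             # wet-well level, metres
    influent_m3: float         # volume through the influent meter this scan
    plant_m3: float            # volume through the plant-side meter this scan
    hihi_float: bool = False   # HIHI float switch tripped
    lolo_float: bool = False   # LOLO float switch tripped


def cross_check(scans: Sequence[Scan],
                scan_s: float = 60.0,
                rise_band_m_per_s: Tuple[float, float] = (0.0005, 0.005),
                flow_tolerance: float = 0.10) -> List[str]:
    """Return plain-language findings; an empty list means the story adds up."""
    findings: List[str] = []
    lo, hi = rise_band_m_per_s

    # With every pump off, the level should rise at a plausible rate.
    for prev, cur in zip(scans, scans[1:]):
        if prev.pumps_running == 0 and cur.pumps_running == 0:
            rate = (cur.level_m - prev.level_m) / scan_s
            if not lo <= rate <= hi:
                findings.append(
                    f"Wet-well level changed {rate:+.4f} m/s with all pumps off")

    # Over the whole window, the two flow meters should total up reasonably well.
    influent_total = sum(s.influent_m3 for s in scans)
    plant_total = sum(s.plant_m3 for s in scans)
    if influent_total > 0:
        imbalance = abs(influent_total - plant_total) / influent_total
        if imbalance > flow_tolerance:
            findings.append(
                f"Influent and plant flow totals differ by {imbalance:.0%}")

    # The float switches are backups; normal control should never reach them.
    if any(s.hihi_float for s in scans):
        findings.append("HIHI float tripped in the wet-well")
    if any(s.lolo_float for s in scans):
        findings.append("LOLO float tripped in the wet-well")
    return findings


def delayed_imbalance(inflow: Sequence[float], outflow: Sequence[float],
                      delay_scans: int) -> float:
    """Relative difference between inflow and the outflow seen delay_scans later.

    Assumes both series cover the same scans, so outflow is at least as long
    as inflow.
    """
    n = len(inflow) - delay_scans
    if n <= 0:
        raise ValueError("window is shorter than the delay")
    total_in = sum(inflow[:n])
    total_out = sum(outflow[delay_scans:delay_scans + n])
    return abs(total_in - total_out) / total_in if total_in else 0.0
```

The findings are deliberately written as sentences an operator could read off an alarm banner. The window has to be long enough that wet-well storage averages out, otherwise the flow totals will disagree for perfectly innocent reasons.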
Why do things that way? Do you want to be a fifth wheel in the operations room? Why run an ICS SOC 24/7 along with the operators to ensure that everything is working as it is supposed to? It is better to have the operators focused on how the system is supposed to run. If they start reporting anomalies, OT Security or a plant Superintendent can instruct them where to break network connections apart. Now the OT security problem has become an on-call effort instead of a constant 24/7 operation.
Most people specifying OT security don’t think about the fact that they’re adding some very expensive people to the front line operations. We do not automate so that we can add additional people to the plant. We automate so that we won’t need an expensive security officer looking over our shoulders to keep us safe.
Use what you have. Present comprehensible alarms to the operator. Cross check the operation, and then when something weird doesn’t add up, be ready to come in and do something useful.