It is going to happen sooner or later. Someone raises the question: Have we been hacked?
It seems like a simple question. However, before we can get to the "it must be a hack" conclusion, we need to eliminate every other likely failure mode, and some of them are subtle and difficult to diagnose. The cause could be environmental. It could be electrical, especially a grounding problem. It could be a flaw in the process design itself. It could be a bug in the control system's firmware or application software. There is a long list of possibilities to rule out before concluding that someone has actually breached the system.
But let's assume we are now fairly certain that something is going on that routine failure can't explain. Whom do you call in, and what do they need from you to get started? Do you have documentation, perhaps a Cyber Security Evaluation Tool (CSET) assessment or a recent risk assessment? Do you have a list of what is supposed to be on the network? Do you have backups of everything, including the PLC programs? Do you have version numbers for all software and firmware? If you don't, get it together NOW.
Meanwhile, your system is clearly not playing nice with itself. Do you have standard operating procedures for splitting the plant control system into sections so that each can be safely checked and worked on while the process keeps running? Can you run things at a lower level of automation? Do you have standalone PID controllers and manual setpoints that you can access? Do you have local Operator Interface Terminals (OITs) that can record alarms and display status without depending on a tag server?
How about network diagrams? Is all the equipment documented? Do you have VLAN lists? Lists of trunked traffic? Do you have physical layouts for each part of the network?
Finally, whom will you call, and how soon can they be on site? Have they been familiarized with your plant, your application, and your staff? If you haven't introduced them, do it NOW.
We can't know in advance everything we will need for a security breach. But we can save a lot of time by bringing people up to speed before it happens, so that recovery isn't so expensive.