SCADA Over WAN

People who build SCADA systems over local and wide area networks seem to have this notion that bandwidth and latency are not limiting factors, and security is a problem for someone else. Oh, if only that were true.

The first thing everyone should do when working with a new RTU is to disable the services that aren’t being used. Most RTUs have a web server. Some have an FTP server. Some have the ability to respond to SNMP and send traps to a server. Some may even confess their sins to a Syslog server.

Thus the first rule: If the techs or operators really don’t need it or know what it is, turn it off! The web server is usually a really cute selling point for an RTU. However, when people actually put the RTU into service, very few use it for more than occasional diagnostics. The truth is that most of those diagnostics are things one would already know or could figure out in other ways. Unfortunately, there are people who know a hell of a lot about web servers: hackers. That web feature could be used as a launching pad for other attacks. Unless there is a well-defined purpose for the web server, disable it.

Is anyone using the SNMP services on that RTU? Really? I mean, it’s already an RTU and it is monitored 24/7. Why does anything need to see this thing with SNMP? Why does the IT department’s Network Management System (NMS) need to see the RTU? If the operators know that the RTU is working properly, shouldn’t that be enough?

Some RTUs include a TFTP or FTP server. This is typically used to upload new configurations or new firmware. When placing the RTU into service, disable it. Leaving this feature enabled is an open invitation to hackers to do terrible things.

If there are other SCADA protocols that nothing uses, turn those off too. For example, if the remote is configured to run Modbus TCP and the unit also speaks EtherNet/IP, turn off EtherNet/IP. Do not leave unconfigured protocols out there to be exploited.
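One way to confirm that unused services really are off is to try connecting to their well-known TCP ports after commissioning. The Python sketch below assumes the port list shown (adjust it to the vendor’s documentation) and checks only TCP services. UDP services such as SNMP (161) and TFTP (69) cannot be confirmed this way, because an unanswered UDP datagram looks the same whether the service is off or merely silent. The function name is hypothetical.

```python
import socket

# Well-known TCP ports for services an RTU commonly ships with.
# This list is illustrative; match it to the vendor's documentation.
COMMON_TCP_SERVICES = {
    21: "FTP",
    23: "Telnet",
    80: "HTTP",
    443: "HTTPS",
    502: "Modbus TCP",
    20000: "DNP3",
    44818: "EtherNet/IP (explicit messaging)",
}


def open_tcp_services(host: str, services: dict[int, str] = COMMON_TCP_SERVICES,
                      timeout_s: float = 1.0) -> dict[int, str]:
    """Return the subset of `services` whose TCP port accepted a connection."""
    found = {}
    for port, name in sorted(services.items()):
        try:
            # create_connection raises OSError on refusal, timeout, or no route.
            with socket.create_connection((host, port), timeout=timeout_s):
                found[port] = name
        except OSError:
            pass  # closed, filtered, or unreachable: treat as not offered
    return found
```

For example, calling open_tcp_services("192.0.2.10") against a hypothetical RTU address returns only the services that are still listening; anything in that result that nobody uses is a candidate to disable.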

On the other hand, does the RTU need the time? If so, plan on enabling access to a reliable time source. The Simple Network Time Protocol (SNTP) is okay, the Network Time Protocol (NTP) is pretty good, but the Precision Time Protocol (PTP, IEEE Std 1588) is better. Make sure there is a reliable time server on that network with fairly stable latency. If the time comes over potentially hostile networks, such as an Internet connection, consider using authentication; NTP supports it. However, NTP takes around a day to settle on a reasonable estimate of each clock’s latency, drift, and accuracy, so plan on periods of hours where the time may be off by more than 50 milliseconds. Note that in North America an AC cycle lasts a bit less than 17 milliseconds (1/60 of a second), so determining the sequence of events across multiple RTUs may be an issue with NTP.
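The arithmetic behind that concern is simple: at 60 Hz one cycle lasts about 16.7 ms, so a 50 ms clock error spans three full cycles. The Python sketch below treats each RTU timestamp as uncertain by plus or minus its clock error and reports whether two events from different RTUs can be placed in a definite order. The function names and error figures are illustrative assumptions, not measured values.

```python
AC_FREQUENCY_HZ = 60.0  # North American power system frequency


def cycles_of_error(clock_error_ms: float, freq_hz: float = AC_FREQUENCY_HZ) -> float:
    """How many AC cycles a given timestamp error spans."""
    cycle_ms = 1000.0 / freq_hz
    return clock_error_ms / cycle_ms


def can_order(t_a_ms: float, err_a_ms: float, t_b_ms: float, err_b_ms: float) -> bool:
    """True if two events from different clocks can be put in a definite order.

    Each timestamp is assumed to be off by at most +/- its clock error, so the
    order is known only when the two uncertainty windows do not overlap.
    """
    gap = abs(t_b_ms - t_a_ms)
    return gap > err_a_ms + err_b_ms


print(round(cycles_of_error(50.0), 2))        # 3.0 cycles of ambiguity
print(can_order(1000.0, 50.0, 1040.0, 50.0))  # False: 40 ms apart, 100 ms combined error
print(can_order(1000.0, 1.0, 1040.0, 1.0))    # True: clocks good to 1 ms resolve it
```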

What is the Maximum Transmission Unit (MTU) size all the way back to the SCADA Master? Don’t assume that it is 1500 octets. Use ping with the Don’t Fragment (DF) bit set to confirm what it really is. Furthermore, if the link runs over a VPN on unknown infrastructure, the latency may not remain stable even when the MTU in and out of the VPN is consistent.
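The ping payload to test with follows from the header overhead. For IPv4 without options, the IP header is 20 octets and the ICMP echo header is 8, so probing a 1500-octet path takes a 1472-octet payload (for example, ping -f -l 1472 on Windows or ping -M do -s 1472 on Linux). The Python helpers below do that arithmetic and also show how much application data fits in one UDP datagram or TCP segment at a given MTU. They assume no IP or TCP options and no VPN encapsulation, each of which reduces the usable space further.

```python
IPV4_HEADER = 20  # octets, assuming no IP options
ICMP_HEADER = 8   # ICMP echo request header
UDP_HEADER = 8
TCP_HEADER = 20   # assuming no TCP options (timestamps, etc. add more)


def ping_payload_for_mtu(mtu: int) -> int:
    """Largest ping payload that fits in one unfragmented IPv4 packet."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def max_udp_payload(mtu: int) -> int:
    """Application bytes that fit in one unfragmented UDP datagram."""
    return mtu - IPV4_HEADER - UDP_HEADER


def max_tcp_payload(mtu: int) -> int:
    """Application bytes that fit in one unfragmented TCP segment."""
    return mtu - IPV4_HEADER - TCP_HEADER


for mtu in (1500, 576):
    print(mtu, ping_payload_for_mtu(mtu), max_udp_payload(mtu), max_tcp_payload(mtu))
# 1500 1472 1472 1460
# 576 548 548 536
```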

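As an aside: the Internet standard (RFC 791) only guarantees that every host can receive a datagram of 576 octets (and that every internet module can forward a 68-octet datagram without further fragmentation), so an MTU as low as 576 octets is entirely standard.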

Thus, my second rule, just as I pointed out for Serial SCADA: unless you know the MTU is not going to change, KEEP MESSAGES SHORT!
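One way to apply this rule when polling is to size each request so its response stays within a chosen byte budget, rather than asking for everything in one read. The sketch below assumes Modbus TCP Read Holding Registers (function 03), whose response is a 7-octet MBAP header, a function code, a byte count, and two octets per register, and it respects the protocol limit of 125 registers per read. The function name is hypothetical, and the budget is a placeholder you would derive from the measured path MTU.

```python
MBAP_HEADER = 7                       # Modbus TCP application header
RESPONSE_OVERHEAD = MBAP_HEADER + 2   # plus function code and byte count
MODBUS_MAX_REGISTERS = 125            # protocol limit for function 03


def plan_reads(start: int, count: int, max_response_octets: int) -> list[tuple[int, int]]:
    """Split a register range into (start, quantity) reads whose responses fit the budget."""
    per_read = min(MODBUS_MAX_REGISTERS, (max_response_octets - RESPONSE_OVERHEAD) // 2)
    if per_read < 1:
        raise ValueError("budget too small for even one register")
    reads = []
    address, remaining = start, count
    while remaining > 0:
        quantity = min(per_read, remaining)
        reads.append((address, quantity))
        address += quantity
        remaining -= quantity
    return reads


print(plan_reads(0, 300, 536))  # [(0, 125), (125, 125), (250, 50)]
print(plan_reads(0, 100, 128))  # [(0, 59), (59, 41)]
```

With 536 octets (a 576-octet MTU minus IPv4 and TCP headers) the register limit is what binds; a deliberately small budget such as 128 octets yields more, shorter exchanges.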

If the system is running a SCADA protocol over an IP-based radio, be aware that these radios are usually half-duplex devices. The turnaround latency for a radio is not instantaneous. If TCP is used, the SYN, SYN-ACK, and ACK handshake adds overhead before any traffic can be sent. Many choose to avoid that overhead and rely upon UDP instead.
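A rough way to see why the handshake matters on a half-duplex radio is to count one-way trips, since each time the other end answers the radios pay a turnaround delay on top of the airtime. The sketch below assumes one turnaround per one-way trip and a fixed per-trip delay (the figures are hypothetical, not those of any particular radio), and it ignores retransmissions and connection teardown.

```python
def exchange_time_ms(one_way_trips: int, turnaround_ms: float, airtime_ms: float) -> float:
    """Rough time for an exchange on a half-duplex link.

    Every one-way trip is assumed to cost one radio turnaround plus airtime.
    """
    return one_way_trips * (turnaround_ms + airtime_ms)


# One-way trips per poll (direction changes, not packet counts):
UDP_POLL = 2            # request, response
TCP_NEW_CONNECTION = 4  # SYN, SYN-ACK, ACK + request, response

turnaround, airtime = 30.0, 20.0  # hypothetical figures for illustration only
print(exchange_time_ms(UDP_POLL, turnaround, airtime))            # 100.0
print(exchange_time_ms(TCP_NEW_CONNECTION, turnaround, airtime))  # 200.0
```

Keeping a TCP connection open between polls avoids the handshake on every poll, but the connection then has to be re-established after each radio dropout.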

I am this concerned about MTU because many SCADA systems depend upon consistent timing and latency. It is also critical in many protocols that messages not be split up or reordered. TCP numbers the bytes it sends so that segments that arrive out of order can be reassembled in order, and it retransmits anything that is lost. That is what makes a TCP link somewhat more agnostic about MTU size. It is generally not a bad approach as long as the data link is reliable. Radio links are not like that; they usually have a significant failure rate, and every retransmission adds delay. So UDP with a known MTU tends to be favored in that application.
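A minimal sketch of the UDP approach: the master sends a request no larger than an assumed datagram budget, waits a fixed time for the reply, and retries a limited number of times. The 548-octet budget is 576 octets minus 28 octets of IPv4 and UDP headers. The function name, timeout, and retry count are illustrative assumptions, and the payload format is left to whatever SCADA protocol runs on top.

```python
import socket
from typing import Optional, Tuple

MAX_DATAGRAM = 576 - 20 - 8  # 548 octets: 576-octet datagram minus IPv4 and UDP headers


def udp_poll(request: bytes, addr: Tuple[str, int], timeout_s: float = 1.0,
             retries: int = 2) -> Optional[bytes]:
    """Send one request and wait for a single reply, retrying on timeout.

    Returns the reply, or None if no reply arrived after all attempts.
    """
    if len(request) > MAX_DATAGRAM:
        raise ValueError(f"request is {len(request)} octets; budget is {MAX_DATAGRAM}")
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout_s)
        for _attempt in range(retries + 1):
            sock.sendto(request, addr)
            try:
                # Large receive buffer so an oversized reply is not silently truncated.
                reply, _source = sock.recvfrom(65535)
                return reply
            except socket.timeout:
                continue  # no reply within the timeout: try again
    return None
```

A real implementation should also match each reply to its request (for example with a transaction or sequence number in the payload), so that a late reply to an earlier attempt is not mistaken for the current one.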

Rule number three: pay attention to network architecture. This is why it may be a good idea to install a master at a relatively stable, topologically central node of the network, where latency is short and links are reliable. In that case it is possible to run UDP, keep traffic levels low, and get consistent data.

If that’s not practical (for example, when using multiple ISP services with VPN circuits to carry SCADA), then grouping the networks according to infrastructure makes a big difference. This is where TCP links tend to make more sense. In that example, using a separate master software instance to poll through each ISP’s infrastructure is a good idea.
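Grouping by infrastructure can be as simple as partitioning the RTU list by the carrier or VPN path each site uses, then giving each group its own polling instance so that a slow or failed path only stalls its own group. The Python sketch below shows only the partitioning step; the RTU names, addresses, and the carrier field are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rtu:
    name: str
    address: str
    carrier: str  # which ISP / VPN path reaches this site (hypothetical field)


def poll_groups(rtus: list[Rtu]) -> dict[str, list[Rtu]]:
    """Partition RTUs so each carrier path gets its own master/poller instance."""
    groups: dict[str, list[Rtu]] = defaultdict(list)
    for rtu in rtus:
        groups[rtu.carrier].append(rtu)
    return dict(groups)


sites = [
    Rtu("Lift Station 1", "10.1.0.10", "ISP-A"),
    Rtu("Reservoir 3", "10.2.0.10", "ISP-B"),
    Rtu("Lift Station 7", "10.1.0.70", "ISP-A"),
]
for carrier, members in poll_groups(sites).items():
    print(carrier, [r.name for r in members])
# ISP-A ['Lift Station 1', 'Lift Station 7']
# ISP-B ['Reservoir 3']
```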

Regarding timeouts, the best thing one can do is run an extended ping routine with packet sizes typical of the traffic from the remote. See what the minimum, maximum, and average ping times are after a day or so of running. This should provide some guidance as to what a reasonable timeout might be.
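Once a day or so of round-trip samples has been collected, the statistics can be turned into a starting timeout. The Python sketch below reports the minimum, maximum, and mean, and proposes a timeout equal to the worst observed round trip times a safety margin. The 1.5 margin, the function name, and the sample values are assumptions to adjust, not recommendations.

```python
from statistics import mean


def suggest_timeout_ms(rtt_samples_ms: list[float], margin: float = 1.5) -> dict[str, float]:
    """Summarize ping round-trip times and propose a poll timeout.

    The proposal is simply the worst observed round trip scaled by a margin.
    """
    if not rtt_samples_ms:
        raise ValueError("no samples collected")
    worst = max(rtt_samples_ms)
    return {
        "min": min(rtt_samples_ms),
        "max": worst,
        "avg": mean(rtt_samples_ms),
        "timeout": worst * margin,
    }


samples = [42.0, 45.5, 40.2, 120.0, 44.1]  # illustrative round-trip times in ms
print(suggest_timeout_ms(samples))
```

Lost pings matter as much as slow ones: a link that drops a noticeable fraction of probes argues for retries rather than simply a longer timeout.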

Turn off auto-configuration features on your switch and on your RTU. The speed auto-negotiation features of most switches can take most of a minute to resolve. If you’re recovering from a power outage and the switch went down too, you could be waiting a long time to get telemetry on what happened and when.

As a side note: the RTU is meant to go online and stay there. This is a good reason to use static addressing. While most vendors do have DHCP- or BOOTP-compatible TCP/IP stacks, using those features only makes the RTU dependent on a server to set its address. Furthermore, after the switch comes up it takes time to reach the DHCP server and obtain an address. During that period the RTU cannot report anything.

By the way, this is an excellent reason to prefer event-oriented protocols over real-time protocols. In the former case, the RTU records time-stamped events and reports them once communications return, so the delays of bringing up the switch won’t matter. In the latter case, by the time everything has stabilized and connections are re-established, there won’t be any forensics to explain why things went down in the first place.
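The advantage of the event-oriented approach comes from the RTU keeping a time-stamped record of changes until the master confirms it has them. A minimal Python sketch of that idea follows. It is not the buffering scheme of any particular protocol (DNP3, for instance, defines its own event classes and confirmation rules), and the class and method names are hypothetical.

```python
from collections import deque
from dataclasses import dataclass
from itertools import islice


@dataclass(frozen=True)
class Event:
    timestamp_ms: int  # taken from the RTU's own clock when the change occurred
    point: str
    value: float


class EventBuffer:
    """Holds time-stamped events until the master acknowledges them."""

    def __init__(self, capacity: int = 1000) -> None:
        # When full, the oldest event is discarded; size it for the longest expected outage.
        self._events: deque[Event] = deque(maxlen=capacity)

    def record(self, event: Event) -> None:
        self._events.append(event)

    def pending(self, limit: int) -> list[Event]:
        """Oldest unacknowledged events, at most `limit` of them (keeps messages short)."""
        return list(islice(self._events, limit))

    def acknowledge(self, count: int) -> None:
        """Drop the `count` oldest events once the master confirms receipt."""
        for _ in range(min(count, len(self._events))):
            self._events.popleft()
```

After the switch and link come back, the master drains the buffer in small batches with pending() and acknowledge(), so the sequence of events leading up to the outage is still available.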

It might be a good idea to show this blog to the IT network people you’re working with. They have habits, carefully honed for office work, that are toxic in control systems applications. If they do not understand what this blog is about, it is time to find someone else to do the network configuration.

Like the previous blogs, this one is an overview of many of the corners I have stubbed my toes on in the past. Clearly, every situation is different, and you should consult people who understand these issues.

http://www.infracritical.com

With more than 30 years of experience at a large water/wastewater utility and extensive experience with control systems, substation design, SCADA, RF and microwave telecommunications, and work with various standards committees, Jake still feels like one of those proverbial blind men discovering an elephant. Jake is a Registered Professional Engineer of Control Systems. Note that this blog is Jake's opinion ONLY. No employers, past or present, were ever consulted with regard to these posts. These are Jake's notions. Don't blame anyone else for them.