I rant a lot about I/O testing and design. This is a discussion (and more ranting) of some of the tests and practices we follow after construction, and later during the maintenance cycle, to ensure that the controls and alarms will work as expected.
Before I begin, one might wonder whether embedded controllers or RTUs would be an improvement. The answer is no, not really. Common wiring mistakes may not enter into the picture, but there will still be problems with switches that don’t work, fuses that fail, analog values that aren’t scaled correctly or properly isolated, and so on. Furthermore, consider who fixes such things. On the one hand, there is an expensive but simple system that anyone with a voltmeter can diagnose. On the other hand, you have software, networks, and controller work to navigate. I’ll take the former. Keep it simple, stupid!
At the tail end of a construction project, when all the conduit and electrical gear are in place, contractors begin wiring the I/O to a cabinet. Different industries have different ways of terminating I/O, but we have grown very fond of DIN rail and all the wonderful accessories one can snap on to it. It’s like Lego for instrumentation. Our panels always have a grounded backplate to which we mount one or more rails on one side of the cabinet to terminate most of the I/O. (Some I/O, such as the cabinet door switch, is wired directly to the PLC.) We try to avoid building big, complicated cabinets with lots of I/O in them, because a big cabinet is also a single point of failure.
One of those failures can be caused by overheating. Typically it happens because the engineer was too lazy to do even a back-of-the-envelope calculation of the heat load. We had one panel that used to get pretty hot. As a temporary measure, the operators regularly left that cabinet open so that a fan could blow air into it. This soon became habit, and it continued that way into late autumn. Naturally, it didn’t take long for some field mice to quietly set up shop in there. Then, before the winter snows had begun, we had a lot of inexplicable errors. The culprit? Mice were urinating on the PLC I/O rack, corroding the contacts between the rack connectors and the I/O cards. The initial failures were very random, and it took a while to figure out what was going on. By the time we finally discovered the root cause, it had gotten expensive and very annoying.
Thus, my rules of thumb for I/O cabinets:
Rule #1: Keep cabinets small. The more things stuffed into one place, the bigger a failure can be. Try to keep power sources diverse, too.
Rule #2: Keep a thermal budget for each cabinet and leave space to install a heat exchanger or HVAC of some sort, just in case something is added that makes more heat than expected. (A rough back-of-the-envelope sketch of that budget follows these rules.)
Rule #3: Condensation tends to flow through conduit. We have a rule that cabinets should be penetrated from the bottom, and from the sides as close to the bottom as possible, but never from the top. This is because we’ve had issues where water condensed, flowed through a conduit, and dripped onto the equipment from the top of the panel. We also leave drain holes or vents in the bottom of the cabinet so that the condensation has somewhere to go. We warn the electricians, inspectors, and project managers about this well ahead of the panel delivery. We even tape bright orange warnings to the top of the panel to ensure that people understand that we will reject their work if they screw this up.
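For Rule #2, the arithmetic does not need to be fancy. Here is a minimal Python sketch of the kind of back-of-the-envelope check I mean, assuming a painted steel enclosure and the commonly quoted ~5.5 W/(m²·K) surface heat-transfer coefficient; the cabinet dimensions, device wattages, and the 15 °C threshold are made-up numbers for illustration, not a standard.

```python
# Rough cabinet heat-load check (illustrative numbers only).
# Assumes a free-standing painted steel enclosure with a typical surface
# heat-transfer coefficient of ~5.5 W/(m^2*K) -- adjust for your hardware.

HEAT_TRANSFER_COEFF = 5.5  # W/(m^2*K), assumption for painted steel

def effective_surface_area(w_m, h_m, d_m):
    """Surface area of a free-standing enclosure, all six sides exposed."""
    return 2 * (w_m * h_m + w_m * d_m + h_m * d_m)

def internal_temp_rise(total_watts, area_m2, k=HEAT_TRANSFER_COEFF):
    """Steady-state temperature rise above ambient with no forced cooling."""
    return total_watts / (k * area_m2)

# Hypothetical cabinet: 0.6 m wide, 0.8 m high, 0.3 m deep
area = effective_surface_area(0.6, 0.8, 0.3)

# Hypothetical heat sources inside the cabinet, in watts
loads = {"PLC rack": 25, "power supplies": 40, "network switch": 10, "misc relays": 8}
total = sum(loads.values())

rise = internal_temp_rise(total, area)
print(f"Total dissipation: {total} W over {area:.2f} m^2 of enclosure")
print(f"Estimated rise above ambient: {rise:.1f} C")
if rise > 15:  # arbitrary planning threshold for this sketch
    print("Leave room for a heat exchanger or A/C unit.")
```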
At the I/O rail we install fuses, shield grounds, analog loop resistors, surge suppressors, loop power supplies, breakers, and all that miscellaneous stuff. The goal is to have a designated test zone where we can isolate the automation from the controls or instrumentation to confirm that it’s working correctly. Being able to break a loop right there is very useful for deciding what’s actually broken: the automation equipment, or the remote instrument?
Analog loop resistors? YES! We have discovered that the current loop sensors built into PLC I/O modules tend to be easier to destroy with a lightning strike than a 1/2 watt precision wire-wound resistor. The failure modes of internal current sensors can be confusing, too. We’ve seen such modules continue to work but read out of calibration. An external resistor generally doesn’t fail that way; and if it does, it is easy to check and replace. We scale our inputs for 1-5 volts across a 250 ohm resistor. We also use isolated analog inputs, and we do not ground our 4-20 mA current loops in our panels.
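To make that arithmetic concrete, here is a small Python sketch of how a 4-20 mA signal across a 250 ohm resistor maps to 1-5 V and then to engineering units. The 0-100 PSI range and the under-range threshold are just examples for illustration, not any particular instrument of ours.

```python
# 4-20 mA loop across a 250-ohm resistor gives 1-5 V at the analog input.
LOOP_RESISTOR_OHMS = 250.0

def loop_voltage(current_ma):
    """Voltage seen by the analog input for a given loop current."""
    return (current_ma / 1000.0) * LOOP_RESISTOR_OHMS

def to_engineering_units(volts, eu_min, eu_max, v_min=1.0, v_max=5.0):
    """Linear scaling from the 1-5 V window to engineering units.
    A reading well under 1 V usually means an open loop, a blown fuse,
    or a failed resistor."""
    if volts < v_min * 0.9:  # arbitrary under-range threshold for this sketch
        raise ValueError(f"Signal {volts:.2f} V is under range -- check the loop")
    frac = (volts - v_min) / (v_max - v_min)
    return eu_min + frac * (eu_max - eu_min)

# Example: a hypothetical 0-100 PSI transmitter
for ma in (4.0, 12.0, 20.0):
    v = loop_voltage(ma)
    print(f"{ma:4.1f} mA -> {v:.2f} V -> {to_engineering_units(v, 0.0, 100.0):.1f} PSI")
```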
In our business, we typically include a DC UPS so that we can properly detect and report power outages. We noticed them when Siemens came out with their DC UPS products about 16 years ago. It was an instant MUST HAVE. The beauty of this UPS is that it periodically tests the battery charge and discharge curves, so that when the batteries are no longer performing to specification, we get an alarm. Battery maintenance was a major problem for us in the field with the earlier generation of equipment, because we had been replacing batteries according to age while neglecting the environment they were working in. A lot of perfectly serviceable batteries were replaced, while others had open cells and couldn’t even power the equipment long enough to report that the power was out. We saved huge amounts of money by knowing, first, that the batteries were good; and second, precisely when a battery was no longer fit for service.
Since that time, others have gotten into the business of DC UPS gear. There are many to choose from, and the features are all interesting. Choose one that works best for you.
Above all, do not use office-grade UPS gear! It is carefully engineered to be just barely adequate for typical office purposes. The inverter in those devices can take hundreds of milliseconds to light up. It is not fit for service in any industrial environment.
In the field, we use lots of fiber-optic gear. When a data signal leaves a panel, it usually runs on fiber now. However, it is still not unusual to see RS-422/RS-485 connections in older panels. The equipment typically has 1500 volts of galvanic isolation from 485/422 wiring. Generally, unless there is a reason to replace that wiring, we leave it in service. It is not a significant cause of failures.
Regarding networks, we avoid large central switches, for two reasons. First, while it is easier and cheaper to do home runs back to a core switch in an office, it is neither cheaper nor easier in an industrial setting. Second, a big central switch is yet another single point of failure. Instead, we use small, remotely monitored switches in each cabinet, with trunks to other switches where they can be aggregated. Sometimes we even use proprietary rings so that we can take a physical link out of service for testing without interrupting the process. Remember, Spanning Tree Protocol is fine if a few seconds of outage won’t be noticed, but most I/O systems would fault in 200 milliseconds or less.
Regarding the optical fiber: we have an extensive testing regimen, because we’ve been burned before. Many contract electricians moan and groan about this too. Most quiet down when we explain that it is not just for our protection, BUT THEIRS. When a spool of fiber is delivered, we set up temporary connectors (if it doesn’t already have them) and test it to ensure the entire length is continuous and the loss is within specification. We have seen instances where hidden damage during shipping made the cable unusable. Once it is pulled into the conduit banks, it almost takes an act of Congress to get it removed; if it was damaged or incorrectly specified on delivery, we need to reject it right away.
At the end of the day, we are paying for a fiber connection that works, not some substandard length of God-knows-what that won’t work even at 1/10 of the rated speed, or in the environment it was pulled into. Then, once it is in place, we have it terminated and test it for power loss and length. We calculate a loss budget figuring in all the connectors, and we expect to see a system that meets those calculated specifications. Contractors who work with us often bitch and moan, but they’re not the people who will have to live with a failed fiber system at 2 AM. We tell them to read the damned contract next time.
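The loss budget itself is simple arithmetic. Below is a minimal Python sketch of the kind of calculation we expect the measured link to meet; the per-kilometer, per-connector, and per-splice loss figures are typical published values for single-mode fiber at 1310 nm, and the link details and measured value are invented for the example. Swap in the actual numbers from the cable and connector data sheets.

```python
# Rough fiber loss budget (typical single-mode figures at 1310 nm; replace
# with the numbers from the submittals for the actual cable and connectors).
FIBER_LOSS_DB_PER_KM = 0.35   # assumption: typical single-mode at 1310 nm
CONNECTOR_LOSS_DB    = 0.75   # assumption: per mated connector pair
SPLICE_LOSS_DB       = 0.30   # assumption: per fusion splice

def calculated_loss(length_km, connectors, splices):
    """Worst-case end-to-end loss we are willing to accept for the link."""
    return (length_km * FIBER_LOSS_DB_PER_KM
            + connectors * CONNECTOR_LOSS_DB
            + splices * SPLICE_LOSS_DB)

# Hypothetical link: 2.4 km of cable, 2 connector pairs, 1 splice
budget = calculated_loss(2.4, connectors=2, splices=1)
measured = 2.1   # dB, from the light source / power meter test (example value)

print(f"Calculated budget: {budget:.2f} dB, measured: {measured:.2f} dB")
print("PASS" if measured <= budget else "FAIL -- reject the work")
```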
When testing the I/O, we have a first step called Open-Loop Testing, where everything is tested right up to the I/O rail. It is done without the PLC and other ancillary gear installed; breakers will often pop and fuses will blow, and while things are that unstable, we don’t want our equipment damaged. Open-Loop Testing proves that the wire is labeled properly, hooked up correctly, and that we have continuity to the right places (make sure you get the correct side of a Form C contact).
When that is done, we close the breakers and fuses to the controller and conduct closed-loop testing all the way through the controller (with the program halted). That way, we aren’t trying to guess what the PLC will do in reaction to the various stimuli we are testing while we still don’t know whether the I/O works properly.
When we finally close things through to the controller, there may be grounding issues, particularly if the remote device is on a 4-20 mA analog loop. We have encountered vendors who saw nothing wrong with grounding the negative side of a 4-20 mA input inside their product, making it impossible for us to wire that control loop anywhere else. The last thing I needed was another damned loop isolation gadget that adds more complexity to the things we have to check when something stops working. Needless to say, we’re not so enthusiastic about that vendor any more.
Once all the I/O is properly debugged and documented, only then will we download the program and test the automation.
Given the late stage of the project, we get a lot of heat from the project managers and construction supervisors who just want the job to be over. These steps are more tedious than most project managers and contractors realize. But I don’t want to get stuck with a bunch of untested I/O that “meets spec” but doesn’t work. There is usually a lot of finger pointing and groaning over this process, but it is there for a reason. Just pray that your inspectors and project managers understand all this and know why we demand it. NO SHORTCUTS! They will cost everyone more in the long run. We learned this because we have had managers and inspectors try to skip it, and it always costs more time and money than would have been spent had everyone methodically tested everything properly the first time.
Okay, we have a working control system. But for how long? Periodically, it is a very good idea to test all the I/O to make sure it is working correctly.
Some techs will just take a jumper and light up various digital inputs. This is not a good idea. All it proves is that each digital input works; it does not prove out the wiring, fuses, surge protectors, or instrumentation. Things happen. Calibration factors change. Instruments are replaced with ones having different ranges. Thus it is important to check the field documentation against the documentation in the Master, and to confirm that the documentation matches the field devices and their functionality.
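One way to keep honest about that cross-check is to diff the point lists mechanically. Here is a hedged Python sketch that compares a field point list against the Master’s point list from CSV exports; the file names and column layout are made up for illustration, since export formats vary by vendor.

```python
import csv

# Hypothetical CSV exports, each with columns: tag, range_low, range_high, units.
# (File names and layout are illustrative -- adapt to your own export format.)

def load_points(path):
    """Load a point list keyed by tag name."""
    with open(path, newline="") as f:
        return {row["tag"]: (row["range_low"], row["range_high"], row["units"])
                for row in csv.DictReader(f)}

field = load_points("field_points.csv")
master = load_points("master_points.csv")

# Flag anything missing from one side or scaled differently on each side.
for tag in sorted(set(field) | set(master)):
    if tag not in master:
        print(f"{tag}: in the field list but not in the Master")
    elif tag not in field:
        print(f"{tag}: in the Master but not documented in the field")
    elif field[tag] != master[tag]:
        print(f"{tag}: mismatch -- field {field[tag]} vs Master {master[tag]}")
```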
This brings me to another point: where do you set your scale factors and the like? One very good place to do it is in the RTU. Why? Because you only have to change one thing in one place, and you don’t have to worry about version control and all that. Change things in the field, adjust the scale factors in the field, and then leave it. This is especially the case for change-of-custody meters. The changes made at the site are immediately reflected in all the Master stations that read that RTU, not just the ones that received the updates.
This is also a good reason to choose a SCADA protocol that can handle floating point numbers. Calculating flow totalization at the site is much nicer and more resilient than trying to do it at the master station.
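Here is a hedged sketch of why on-site totalization with floating point is the more resilient arrangement: the RTU integrates the flow rate over each scan and reports a running total, so a missed poll means a slightly stale number at the Master, not lost volume. The scan interval, units, and sample values are assumptions for illustration only.

```python
# Minimal on-site flow totalizer sketch. The RTU samples the flow rate each
# scan and accumulates volume; the Master only has to read the total.
SCAN_SECONDS = 1.0   # assumption: one-second scan for this example

class FlowTotalizer:
    def __init__(self):
        self.total = 0.0          # running total in gallons (units illustrative)

    def update(self, flow_rate_gpm):
        """Integrate flow over one scan (rate in gallons per minute)."""
        self.total += flow_rate_gpm * (SCAN_SECONDS / 60.0)
        return self.total

tot = FlowTotalizer()
for rate in (120.0, 118.5, 121.2, 0.0, 119.8):   # made-up flow samples
    tot.update(rate)

# The Master polls the floating-point total; a missed poll just means a
# slightly stale total, not missing volume.
print(f"Running total: {tot.total:.2f} gallons")
```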
Finally, one should manually trip switches and breakers, hit E-Stops, and the like, to ensure that no relays have gotten stuck, that no fuses have blown, that the logic and events are correctly configured, and so on.
If this sounds like a lot of work, consider what it is like when you don’t check any of these things. You’ll never know whether that alarm will work properly. It may not have tripped in years; will it trip or read properly tomorrow? Nobody wants to hear excuses when a chain of poorly checked SCADA I/O leads to a terrible disaster. If it mattered enough to install, it matters enough to check.