“You can’t handle the truth!” –Col. Jessup, played by Jack Nicholson in the 1992 movie “A Few Good Men”
Many think that if they could just get closer to the data, they would somehow discover something that would save the company millions and more than justify all the expense and hassle. Over a career spanning 40 years, I have been a technician, a software analyst, a professional engineer, and a cybersecurity specialist. I know what this data is and what it looks like, and it is not what most people believe it is.
First and foremost, the data is not clean. It has measurement limitations and artifacts. So the people involved say “just add that to the meta-data.” It’s not that simple.
For example, most people would like to think that flow is flow. But a flow meter has artifacts of the measurement technique. A flow meter based upon the Venturi Principle generally has an accuracy of around 1% of full scale and a turn-down ratio of 10 to 1 if it is installed correctly. What does that mean? It means that it will have that 1% accuracy, but only across a range from 100% of its maximum rated flow down to 10% of rated flow. Below the 10% mark you will still get a reading, but the accuracy will not be the 1% that you can apply to the rest of the meter range. Further, if the meter is for a pipe greater than 24” in diameter, the boundary layer presumptions of the Venturi become invalid as swirl currents wind around the pipe, causing additional pressure measurement errors. This introduces still more inaccuracies.
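To put numbers on the turn-down idea, here is a minimal Python sketch. The names `MeterSpec` and `assess` are hypothetical, made up for illustration, and the 1% and 10 to 1 figures are simply the typical values quoted above, not the specification of any particular meter. It shows that an error band fixed as a percentage of full scale becomes a much larger percentage of the reading at low flow, and that below the rated range the stated accuracy no longer applies at all.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class MeterSpec:
    """Hypothetical meter description; defaults are the typical figures from the text."""
    full_scale: float              # maximum rated flow, in any consistent unit
    accuracy_pct_fs: float = 1.0   # accuracy as a percentage of full scale
    turndown: float = 10.0         # ratio of maximum to minimum rated flow

    @property
    def min_rated_flow(self) -> float:
        # Lowest flow at which the stated accuracy still applies.
        return self.full_scale / self.turndown


def assess(spec: MeterSpec, reading: float) -> Optional[Tuple[float, float]]:
    """Return (absolute error, error as % of reading), or None outside the rated range."""
    if reading < spec.min_rated_flow or reading > spec.full_scale:
        return None  # the meter still reports a value, but the stated accuracy does not apply
    abs_err = spec.full_scale * spec.accuracy_pct_fs / 100.0
    return abs_err, 100.0 * abs_err / reading


if __name__ == "__main__":
    spec = MeterSpec(full_scale=1000.0)
    for flow in (1000.0, 500.0, 100.0, 50.0):
        print(flow, assess(spec, flow))
```

The same absolute band that is 1% of the reading at maximum flow is 10% of the reading at the bottom of the rated range. A tag name and an engineering unit in the meta-data tell you none of this.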
To make matters worse, many Venturi metering systems are not well sized or installed correctly. So that 10 to 1 turndown ratio? You might not see most of it. It may be only five to one because someone sized the pipes and the Venturi for an extreme situation. Further, the meter must have straight and level pipe for a distance of eight pipe diameters upstream of the throat and four pipe diameters downstream of the throat. Yet I have seen many places where the Venturi is bolted right to a 90 degree pipe bend or has a butterfly valve bolted right to it. This will clearly degrade the accuracy, and not by a small amount. The exact degradation depends upon the valve position at the very least.
Is your head spinning yet? That is just one kind of meter. There are many others. As a professional engineer of control systems, I have to know these things. People write books about this subject. There are handbooks about this with thousands of pages. It is a profession in its own right. This is why people who ask for meta-data for the tags on a plant do not understand what they’re getting into. Nevertheless, to data scientists who rarely put on a hardhat to see what the rest of the world looks like, flow is still flow. Am I right?
This is why the professions of Operations and Engineering exist. In our formal reports, we make corrections and estimates based upon less than perfect data. We do this using our intimate knowledge of how things really work. When people grab data directly from the field, they lose that experience and understanding. Then the numbers won’t total up, and they run in circles annoying the very people whose help they will need to understand what is going on. Further, if you spent any time talking to the Superintendent, Engineers, and Senior Operators, you’d discover a world of things that your data scientist would need many months or even years to figure out.
Yes, that process data is “your” data. But if you do not know what it means, it is not worth much.
But wait, it gets worse: The closer you try to get to this data, the more stress you put on the control system. Remember that the devices in a control system are not data servers that exist for the purpose of your amusement. They exist primarily to control a process and secondarily to let you know what they are doing. In a real-time cycle, you have maybe one or two milliseconds out of every ten to answer network requests. If your data gathering efforts get in the way of communications with other process elements or the operations HMI, there will be a lot of very angry people. Remember whose performance bonuses are on the line.
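A rough back-of-the-envelope sketch of that budget, in Python. The function name, the 0.25 ms service time per request, and the existing HMI load are all assumptions chosen for illustration; real figures depend entirely on the controller, the protocol, and the network, and have to come from the people who run the system.

```python
def spare_requests_per_second(
    cycle_ms: float,
    comms_window_ms: float,
    service_ms_per_request: float,
    existing_requests_per_second: float,
) -> float:
    """Estimate how many additional requests per second a controller could answer.

    All inputs are assumptions for illustration:
      cycle_ms                     -- length of one control cycle
      comms_window_ms              -- part of each cycle left for network requests
      service_ms_per_request       -- time to answer one request (hypothetical)
      existing_requests_per_second -- load already imposed by the HMI and peers
    """
    cycles_per_second = 1000.0 / cycle_ms
    # Only whole requests fit inside the communications window of a cycle.
    requests_per_cycle = int(comms_window_ms // service_ms_per_request)
    capacity = cycles_per_second * requests_per_cycle
    # Never report negative headroom; zero means the controller is already saturated.
    return max(0.0, capacity - existing_requests_per_second)


if __name__ == "__main__":
    # One millisecond out of every ten, 0.25 ms per request, 350 requests/s already in use.
    print(spare_requests_per_second(10.0, 1.0, 0.25, 350.0))
```

Under those assumed numbers the controller can answer 400 requests per second in total, and 350 of them are already spoken for. A historian or analytics job that polls a few hundred tags every second would eat the remaining headroom and then start competing with the HMI.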
Even worse: have you secured your connection to the control system properly? Are you sure? ARE YOU REALLY ABSOLUTELY CERTAIN THAT THIS WILL BE FAIL-SAFE? Remember, there will be physical consequences if you are wrong. Yes, security is a concern. If the process is not secure, it won’t be safe. You will not want to meet operators who have had a process upset because someone abused your data connection and reconfigured a controller to do something awful.
So to you data scientists, I say this: Come to the plant or the infrastructure. Talk to the Superintendent, Engineers, and Operators. Do not make policy from the data without consulting them. They can fill you in on when and where the instrumentation has changed, what the data issues are like, how they make corrections and so on. They can tell you where the instrumentation may not be ideally sized for an application or when the process anomalies were triggered and why. They can also tell you where the data traffic may be too close to the limit to support your data demands. And finally, they can probably give you better reports on the limitations of the data than anything you can learn from a data lake.
There are costs to your data demands. There are capacity demands. There are security demands. There are comprehension problems. I am not suggesting that the Operations and Engineering staff have all the answers, but if you start by asking questions instead of plowing into data that you do not understand, you might just discover answers to the very things you would otherwise have spent weeks or months hunting for in that data tar-pit you call a lake.
