I listened to a podcast a while ago where someone was speculating why programs processing data can't just be connected together like devices in a water system. Pumps, sinks, boilers, showers, etc. Or like electrical devices, just channel the data between them.
It occurred to me that data isn't like water, or electricity. Those are generic resources that can largely be easily standardised and treated consistently by any device. Data isn't like that.
Data formats and encoding vary hugely and can have crazy different properties. It's more like piping chemicals between devices in a chemical factory. You can't just take the output from a device producing acid and plug the pipe into the input of a device that's designed to process water, or ammonia, or petrol. They're all 'just' fluids and you can pipe them about, but their properties vary wildly. You can only feed one into a device as input if that device is specifically designed to process that specific material.
Yes, it is possible to define standardised data formats, up to a point: XML, JSON, CSV, etc. But even then you can't just feed arbitrary JSON into every program designed to ingest JSON and expect it to work, just because the data it expects is JSON-formatted.
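To make that concrete, here's a sketch in Python (the function and field names are invented for illustration): both payloads are perfectly valid JSON, but a consumer built for one schema chokes on the other.

```python
import json

# Hypothetical consumer that expects the shape {"user": {"name": ...}}
def greet(payload: str) -> str:
    doc = json.loads(payload)
    return "Hello, " + doc["user"]["name"]

ok = '{"user": {"name": "Ada"}}'
other = '{"customer": "Ada"}'   # equally valid JSON, different schema

print(greet(ok))        # works
try:
    greet(other)        # parses fine, then fails on the schema
except KeyError as e:
    print("valid JSON, wrong shape:", e)
```

Both inputs pass the "is it JSON?" test; only one matches what the program was designed to process.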
Yes Unix has pipes, but each tool in the pipe chain has to be told exactly how to process the specific input it gets from the previous tool. You can't just look in history for two arbitrary examples of using pipes, and cut and paste the first half of one pipe chain, and paste the second half of another arbitrary pipe chain on the end, and expect to get something useful out of the combination. Maybe you'll get lucky, but usually you'll get garbage.
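A small Python model of that point (the filter functions are invented stand-ins for Unix tools, each mapping text to text like a pipe stage): two chains work individually, but splicing the front of one onto the back of the other fails, because the second half was never designed for that input.

```python
# Invented stand-ins for Unix tools; each stage maps text to text.
def cut_csv_field2(text: str) -> str:   # like: cut -d, -f2
    return "\n".join(line.split(",")[1] for line in text.splitlines())

def sum_lines(text: str) -> str:        # like: awk '{s+=$1} END {print s}'
    return str(sum(int(line) for line in text.splitlines()))

def count_words(text: str) -> str:      # like: wc -w
    return str(len(text.split()))

csv = "a,1\nb,2\n"

print(sum_lines(cut_csv_field2(csv)))   # chain 1: extract a column, sum it
print(count_words(csv))                 # chain 2: count the words

# Splice chain 2's front half onto chain 1's back half: text still
# flows through the "pipe", but sum_lines can't digest raw CSV lines.
try:
    sum_lines(csv)
except ValueError as e:
    print("garbage in the pipe:", e)
```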
"It occurred to me that data isn't like water, or electricity. Those are generic resources that can largely be easily standardised and treated consistently by any device."
Well, electricity is not really easy either. There is a huge effort that goes into transforming the current into the needed shape (voltage, current, frequency, DC vs. AC, smoothing, ...). And water is also not just water: it can be clean drinking water, or sewage, or hot (but slightly dirty) water for the heating, or hot, high-pressure steam, ...
I don't think he was trying to imply that coding was the only thing that's complicated.
I think it comes back to "No Silver Bullet" and what it has to say about accidental and essential difficulties.
There may be impurities in water, yes, and it can be at several temperatures, but water is water is water. It's still chemically two hydrogen atoms bonded to a single oxygen. We can define what pure water is. Having that definition allows us to define tolerances for how much "not-water" is in the water, what kind of pipe is necessary to deal with the not-water, and so on. The essential difficulties of plumbing and water management are never really about the water; they're about how to deal with not-water.
Same with electricity: the way it moves may differ, but it's still just electrons. There's no special electricity that will conduct through rubber. Again, we can define what electricity is. And again, having those definitions, we've moved our essential difficulties to the not-electricity part of the problem.
There is no standard "data". We cannot define what data is, because it is, in a sense, everything. It's a nebulous, abstract concept that doesn't mean anything on its own. What we really want to do is process subsets of that data, and filtering down to the right subset is the essential difficulty. And then you have the issue that two consumers of data could want similar data, but not quite the same. So data's essential difficulties come sooner, and once you've transformed your data into "water", you still have more essential difficulties to handle.
Coding includes things like simulating water and electricity, hence it is more complicated.
For example, let's say you want to write code to check that your electrical setup is right. Then you implement the concepts of voltage, current, frequency, DC, AC, etc., and run it to see what the end result looks like. Coding this is of course more complicated than learning about the concepts in the first place. And as has been said many times before, coding this system is often the easiest part of the job; then you have to add all the helper systems, the UI, etc., and that is the really hard part that is often the reason projects fail.
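A toy sketch of the kind of check described, in Python (the function names and fuse scenario are invented for illustration, and it covers only the DC case via Ohm's law, not frequency or AC):

```python
# Ohm's law: V = I * R, so I = V / R. A hypothetical mini-check of a
# DC setup: given supply voltage and load resistance, verify the
# current draw stays within the fuse rating.
def current_amps(voltage_v: float, resistance_ohms: float) -> float:
    return voltage_v / resistance_ohms

def fuse_ok(voltage_v: float, resistance_ohms: float, fuse_a: float) -> bool:
    return current_amps(voltage_v, resistance_ohms) <= fuse_a

print(current_amps(12.0, 4.0))     # 12 V across 4 ohms draws 3 A
print(fuse_ok(12.0, 4.0, 5.0))     # 3 A draw on a 5 A fuse: fine
print(fuse_ok(230.0, 10.0, 16.0))  # 23 A on a 16 A fuse: not fine
```

The physics here fits in a dozen lines; the surrounding validation, UI, and plumbing are where the real work would go.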
Of course taking electrical engineering in college is way harder than software engineering or computer science, but that is because the software engineering and computer science tracks in college are usually a joke. If you had to be able to code systems like the one described above, it wouldn't be easier at all.
That's exactly what I was thinking. Electrical engineering is a complex field that may seem easy from the outside simply because of its maturity, but I would argue it is the hardest of the engineering disciplines to grasp, as well as one of the most important.
> why programs processing data can't just be connected together like devices in a water system. Pumps, sinks, boilers, showers, etc. Or like electrical devices, just channel the data between them.
I'm not a functional programmer, but don't monads aim to do something like that?
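Roughly, yes: a monad fixes a common "pipe shape" and a composition rule, so stages plug together cleanly, but only stages written for that shape. A minimal Maybe-style sketch in Python (the helper names are invented for illustration, with `None` standing in for `Nothing`):

```python
# bind is the composition rule: it threads a value through a stage,
# short-circuiting once any stage has failed.
def bind(value, fn):
    return None if value is None else fn(value)

def parse_int(s):
    try:
        return int(s)
    except ValueError:
        return None        # failure flows through the rest of the chain

def half_if_even(n):
    return n // 2 if n % 2 == 0 else None

print(bind(bind("42", parse_int), half_if_even))    # 21
print(bind(bind("oops", parse_int), half_if_even))  # None: short-circuits
```

The plumbing is standardised, but each stage still has to be written against the values the previous stage actually produces, which is the thread's point about data.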