That is a terrible idea: sometimes the app can take advantage of a constraint to minimize the work it does.
In your example, if we just wanted to filter for a particular user, dps would have to print out ALL of the information and then you could pick through it. That doesn't seem bad for ps (because there's a hard limit on the number of processes), but in many other cases the output can be much larger than what is actually needed. That's why having filtering and output flags is, in many cases, more efficient than generating everything.
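To make the difference concrete with a real tool, here is a rough sketch using plain `ps` as a stand-in for dps; the user name is just an example:

```sh
# Post-filtering: ps emits a line for every process, and grep discards most of them.
ps -e -o user,pid,comm | grep '^alice'

# Filtering at the source: ps only ever generates alice's processes.
ps -u alice -o pid,comm
```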
As a side note: to give a dramatic example, I tried timing two things:
- dumping NASDAQ feed data for an entire day, pretty-printing, and then using fgrep
- having the dumper do the search explicitly (new flags added to the program)
Both outputs were sent to /dev/null. The first ran in 35 minutes, the second in less than a minute.
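In shell terms the comparison looked roughly like this; the dumper's name and flags below are made up, the real program and its options were different:

```sh
# 1) Dump and pretty-print everything, then search the text stream:
time dumpfeed --date 2010-06-01 | fgrep MSFT > /dev/null

# 2) Tell the dumper what to search for, so it never formats the rest:
time dumpfeed --date 2010-06-01 --symbol MSFT > /dev/null
```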
Streams clamp everything that flows through them to O(n). That's a problem in some cases; for example, your NASDAQ feed dumper probably has some kind of database inside it that lets it run filters in massively sublinear time, and forcing it to be linear would be a significant performance hit.
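As a toy illustration of that gap, compare a linear scan over a flat dump with an indexed query. The database file, table, and column names here are invented, and SQLite merely stands in for whatever store the dumper actually uses:

```sh
# Linear: every record is formatted, pushed through the pipe, and inspected by grep.
dumpfeed --date 2010-06-01 | grep MSFT > /dev/null

# Sublinear: with an index on the symbol column, only the matching rows are touched.
sqlite3 feed.db "SELECT * FROM trades WHERE symbol = 'MSFT';" > /dev/null
```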
However, there are just as many tasks that are not sublinear, and some of them are very common and important sysadmin-y things: iterate through a directory applying some operation to every file; slurp a file and look for a particular chunk of bits; and so on. For those sysadmins, a little structure in their stream can make their job a lot easier. It'd be like the difference between assembly and C: all of a sudden things have names.
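Two everyday examples of that kind of inherently linear work (the paths are just placeholders):

```sh
# Apply an operation to every file under a directory: every file must be visited.
find /var/log -name '*.log' -exec wc -l {} +

# Slurp a file and look for a particular pattern: every byte must be read.
grep -c ERROR /var/log/syslog
```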
Obviously, in many cases avoiding output altogether is better than filtering it after the fact, and in those cases the originating process should do the filtering. However, in many practical situations the data sets are small enough not to matter, or the desired operation will not filter out most of the data anyway.
Basically, you're arguing that grep is a bad tool (it has the same issues), yet it's a very commonly used tool.
That's not true. In the case of `ps`, there is a known limit to the number of processes, and it is fairly small, so the performance hit is limited.
As another example in this context: if the original data source is gzip'd, it's faster to gunzip and pipe than to integrate the gunzip logic into the app itself.
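Roughly the comparison being made, with the app's built-in decompression flag invented for the sake of the example:

```sh
# Decompress in a separate process: gzip and the consumer can overlap on different cores.
gzip -dc feed.gz | ./process-feed > /dev/null

# Decompress inside the app itself (hypothetical --gunzip flag): everything in one process.
./process-feed --gunzip feed.gz > /dev/null
```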
I still disagree. I think you are arguing for the inclusion of, at the least, grep, cut, head and tail in cat.
I do not claim that is a bad idea (conceptually, pipes do not require multiple processes, and those tools could be dynamically linked in), but why stop at those tools? Some people would argue that sed and awk should also be included, others would mention perl, and so on.
I also do not see why it would be faster to use an external gzip tool through a pipe. If it is, the writer of the 'tool with built-in unzip' could always, in secret, start an external unzip process to do the work.
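In shell terms, that "secret" delegation is just a thin wrapper; a minimal sketch, with the tool names made up:

```sh
#!/bin/sh
# Hypothetical wrapper installed as the "tool with built-in unzip":
# it quietly starts an external gzip process and pipes the decompressed
# data into the real tool, so the caller never sees the pipe.
gzip -dc "$1" | real-tool
```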