That is a terrible idea: sometimes the app can take advantage of a constraint to minimize the work it does.
In your example, if we just wanted to filter for a particular user, dps would have to print out ALL of the information and then you could pick through it. That doesn't seem bad for ps (because there's a hard limit on the number of processes), but in many other cases the output can be much larger than what is actually needed. That's why having filtering and output flags is, in many cases, more efficient than generating everything.
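To make the difference concrete with a real tool, here is a rough sketch using plain `ps` as a stand-in for dps; the user name is just an example:

```sh
# Post-filtering: ps emits a line for every process, and grep discards most of them.
ps -e -o user,pid,comm | grep '^alice'

# Filtering at the source: ps only ever generates alice's processes.
ps -u alice -o pid,comm
```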
As a side note: to give a dramatic example, I tried timing two things:
- dumping NASDAQ feed data for an entire day, pretty-printing, and then using fgrep
- having the dumper do the search explicitly (new flags added to the program)
Both outputs were sent to /dev/null. The first ran in 35 minutes, the second in less than a minute.
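In shell terms the comparison looked roughly like this; the dumper's name and flags below are made up, the real program and its options were different:

```sh
# 1) Dump and pretty-print everything, then search the text stream:
time dumpfeed --date 2010-06-01 | fgrep MSFT > /dev/null

# 2) Tell the dumper what to search for, so it never formats the rest:
time dumpfeed --date 2010-06-01 --symbol MSFT > /dev/null
```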
Streams clamp everything that flows through them to O(n). That's a problem in some cases; for example, your NASDAQ feed dumper probably has some kind of database inside it that lets it run filters in massively sublinear time, and forcing it to be linear would be a significant performance hit.
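As a toy illustration of that gap, compare a linear scan over a flat dump with an indexed query. The database file, table, and column names here are invented, and SQLite merely stands in for whatever store the dumper actually uses:

```sh
# Linear: every record is formatted, pushed through the pipe, and inspected by grep.
dumpfeed --date 2010-06-01 | grep MSFT > /dev/null

# Sublinear: with an index on the symbol column, only the matching rows are touched.
sqlite3 feed.db "SELECT * FROM trades WHERE symbol = 'MSFT';" > /dev/null
```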
However, there are just as many tasks that are not sublinear, and some of them are very common and important sysadmin-y things: iterate through a directory applying some operation to every file; slurp a file and look for a particular chunk of bits; and so on. For those sysadmins, a little structure in their stream can make their job a lot easier. It'd be like the difference between assembly and C: all of a sudden things have names.
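Two everyday examples of that kind of inherently linear work (the paths are just placeholders):

```sh
# Apply an operation to every file under a directory: every file must be visited.
find /var/log -name '*.log' -exec wc -l {} +

# Slurp a file and look for a particular pattern: every byte must be read.
grep -c ERROR /var/log/syslog
```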
Obviously, in many cases avoiding output altogether is better than filtering it after the fact, and in those cases the originating process should do the filtering. However, in many practical situations the data sets are small enough not to matter, or the desired operation will not filter out most of the data anyway.
Basically, you're arguing that grep is a bad tool (it has the same issues), yet it's a very commonly used tool.
That's not true. In the case of `ps`, there is a known limit to the number of processes, and it is fairly small, so the performance hit is limited.
As another example in this context: if the original data source is gzip'd, it's faster to gunzip and pipe than to integrate the gunzip logic into the app itself.
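Roughly the comparison being made, with the app's built-in decompression flag invented for the sake of the example:

```sh
# Decompress in a separate process: gzip and the consumer can overlap on different cores.
gzip -dc feed.gz | ./process-feed > /dev/null

# Decompress inside the app itself (hypothetical --gunzip flag): everything in one process.
./process-feed --gunzip feed.gz > /dev/null
```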
I still disagree. I think you are arguing for the inclusion of, at the least, grep, cut, head and tail in cat.
I do not claim that is a bad idea (conceptually, pipes do not require multiple processes, and those tools could be dynamically linked in), but why stop at those tools? Some people would argue that sed and awk should also be included, others would mention perl, and so on.
I also do not see why it would be faster to use an external gzip tool through a pipe. If it is, the writer of the 'tool with built-in unzip' could always, in secret, start an external unzip process to do the work.
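In shell terms, that "secret" delegation is just a thin wrapper; a minimal sketch, with the tool names made up:

```sh
#!/bin/sh
# Hypothetical wrapper installed as the "tool with built-in unzip":
# it quietly starts an external gzip process and pipes the decompressed
# data into the real tool, so the caller never sees the pipe.
gzip -dc "$1" | real-tool
```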