I would allow that for some definition of "too-big".
I wrote a distributed grep implementation a few years back to grep my logs and collect the output on a central machine (a vague "how many machines had this error" job).
The central orchestration was easy in Python, but implementing
zgrep | awk | sort | uniq -c | wc -l
takes far more code in Python than in shell, and runs slower (zgrep is great for .gz logs).
On the other hand, writing the coordinator in shell with pdsh was so much harder that I reverted to paramiko and Python threadpools.
Unix tools are extremely composable and present on nearly every machine with standard behaviour.
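The split described above can be sketched roughly like this: fan the shell pipeline out over many hosts with a thread pool, then aggregate centrally in Python. This is a minimal illustration, not the original tool; the helper names, log path, and `runner` callback are hypothetical (in the real version the runner would wrap something like paramiko's `SSHClient.exec_command`).

```python
from concurrent.futures import ThreadPoolExecutor

# The per-host work stays in shell, where it is short and fast.
# Hypothetical log path; the pipeline shape mirrors the one in the text.
PIPELINE = "zgrep ERROR /var/log/app/*.gz | awk '{print $NF}' | sort | uniq -c | wc -l"

def run_on_host(host, runner):
    # runner(host, cmd) -> stdout string. In a real deployment this would
    # open an SSH session (e.g. via paramiko) and execute the pipeline.
    out = runner(host, PIPELINE)
    return host, int(out.strip() or 0)

def distributed_count(hosts, runner, workers=16):
    # The coordination stays in Python: a threadpool fans the command out
    # and collects per-host counts into one central dict.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        per_host = dict(pool.map(lambda h: run_on_host(h, runner), hosts))
    return per_host, sum(1 for c in per_host.values() if c > 0)
```

The return value answers the original question directly: per-host counts plus how many machines saw the error at all.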