
> "The arguments in the article indeed don't have much to say about Unix Philosophy per se - they're just a list of various fuckups and idiocies Unix accumulated for some reasons or others."

Right. The title should have reflected that, something like "Various idiocies Unix has accumulated to this day". But since the article invokes the Unix Philosophy, my point is that it should have criticised the philosophy and not the practice.

> "Passing text streams around is a horrible idea because now each program has to have its own, half-assed shotgun parser and generator, and you have to glue programs together with your own, user-provided, half-assed shotgun parsers, i.e. calls to awk, sed, etc."

But this has actually proved to be very useful as it provided a standard medium of communication between programs that is both human readable and computer understandable. And ahead of its time since it automatically takes advantage of multiprocessor systems, without having to rewrite the individual components to be multi-threaded.
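
For instance, in a pipeline like this (the file name is just an illustration), each stage is a separate process and the kernel schedules them across cores:

  # three independent processes connected by pipes; they run concurrently,
  # so a multi-core machine speeds this up with zero threading code
  gzip -dc huge.log.gz | grep ERROR | sort | uniq -c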

> "(3) makes you programming with a dynamic, completely untyped language which forces each function to accept and return a single parameter that's just a string blob. No other data structures allowed."

That may be a performance downside in some cases, but the benefit of having a standard, universally agreed-upon input and output format is the time it saves Unix operators, who can quickly pipe programs together. That saves more total human time than the potential performance gains would.




> And ahead of its time

It wasn't ahead of its time. By the time Unix was created, people were already aware of the benefits of structured data.

> it automatically takes advantage of multiprocessor systems, without having to rewrite the individual components to be multi-threaded.

That's orthogonal to the issue. The simple solution to Unix's problems would be to put a standard parser for JSON/SEXP/whatever into libc or the OS libraries and have people use it for stdin/stdout communication. This can still take advantage of multiprocessor systems and whatnot, with the added benefit that program authors no longer each have to write their own buggy parser.
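
A rough sketch of what that buys you, with jq standing in for the hypothetical standard parser (the JSON fields here are made up):

  # the producer emits one JSON object per record; the consumer filters
  # structurally instead of counting whitespace-separated columns
  printf '%s\n' '{"name":".git","size":4096}' '{"name":"hju","size":69337136}' \
    | jq -r 'select(.size > 1000000) | .name'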

> but the benefit of having a standard, universally agreed-upon input and output format is the time it saves Unix operators, who can quickly pipe programs together. That saves more total human time than the potential performance gains would.

I'd say it's exactly the opposite. Unstructured text is not a universally agreed-upon format. In fact, it's the opposite of one, since anyone can output anything however they like (and they do), and as a user you're forced to transform data from one program into another via more ad-hoc parsers, usually written in the form of sed, awk or Perl invocations. You lose time doing that, each of those parsing steps introduces vulnerabilities, and the whole thing will eventually fall apart anyway, because there are a million things that can fuck up the output of Unix commands, including your system distribution and your locale settings.
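
A tiny example of the kind of breakage I mean, no locales or distributions even needed:

  # the "just scrape ls with awk" approach dies on a two-word filename
  touch 'two words'
  ls -l | awk '{print $NF}'   # prints "words", not "two words"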

As an example of what I'm talking about, imagine that your "ls" invocation returned a list of records with named columns in some structured format, instead of an ASCII table. E.g.

  ((:columns :type :permissions :no-links :owner :group :size :modification-time :name)
   (:data
    (:directory 775 8 temporal temporal 4096 1488506415 ".git")
    (:file 664 1 temporal temporal 4 1488506415 ".gitignore")
      ...
    (:file 755 1 temporal temporal 69337136 1488506415 "hju")))
With such a format you could trivially issue commands like:

  ls | filter ':modification-time < 1 month ago' | cp --to '/home/otheruser/oldfiles/'
  find :name LIKE ".git%" | select (:name :permissions) | format-list > git_perms_audit.log
Hell, you could display the usual Unix "ls -la" table for the user trivially too, but you wouldn't have to parse it manually.

BTW. This is exactly what PowerShell does (except it sends .NET objects), which is why it's awesome.


There are no problems where you see them.

Most text formats are trivial to parse and space-separated or character-separated is the way to go. It really doesn't help if you enclose shit in parens. (Parens are sometimes a good way to encode trees, though).
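
E.g. pulling fields out of a character-separated file is a one-liner:

  # field 1 is the login, field 7 the shell, ':' is the separator
  cut -d: -f1,7 /etc/passwd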

    > (:columns :type :permissions :no-links :owner :group :size :modification-time :name)
That format doesn't solve any of the problems you mention. The problem is that it's hard to agree on what data should be inside, not on how you encode it.

    > ls | filter ':modification-time < 1 month ago' | cp --to '/home/otheruser/oldfiles/'
    find -mtime -30 | xargs cp -t /home/otheruser/oldfiles

    > find :name LIKE ".git%" | select (:name :permissions) | format-list > git_perms_audit.log
    find -name '.git*' -printf '%m %f\n' > git_perms_audit.log
Use 0-separated if you care that technically filenames can be anything (except / and NUL). Or say "crap in, crap out". Or assert that it's not crap before processing it.
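
Spelled out, the 0-separated version of the cp pipeline above (GNU find/xargs/cp):

  # -print0/-0 delimit names with NUL, so spaces and newlines in
  # filenames can't break anything
  find . -mtime -30 -print0 | xargs -0 cp -t /home/otheruser/oldfiles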

> Hell, you could display the usual Unix "ls -la" table for the user trivially too, but you wouldn't have to parse it manually.

You don't parse "ls -la". You just don't.
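
If a script needs those fields, ask for them directly instead, e.g. with GNU stat:

  # permissions, owner, size and name per file; nothing to scrape
  stat -c '%A %U %s %n' -- *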

> BTW. This is exactly what PowerShell does (except it sends .NET objects), which is why it's awesome.

PowerShell is an abomination, and because it encourages coupling of interacting programs it will never be as successful as the Unix model. There will never be the same variety of interacting programs, for very practical reasons.


> But this has actually proved to be very useful as it provided a standard medium of communication between programs that is both human readable and computer understandable. And ahead of its time since it automatically takes advantage of multiprocessor systems, without having to rewrite the individual components to be multi-threaded.

Except it is completely unusable for network applications because the error handling model is broken (exit status? stderr? signals? good luck figuring out which process errored out in a long pipe chain) and it is almost impossible to get the parsing, escaping, interpolation, and command line arguments right. People very quickly discovered that CGI Perl with system/backticks was a very insecure and fragile way to write web applications and moved to the AOLServer model of a single process that loads libraries.


It's true that clean error handling with (shell) pipes is not possible in general. In shell, the best you can do is probably "set -o pipefail", but that's not available in plain POSIX sh. Concurrency with IO on both sides is really hard to get right, even in theory.
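
A minimal bash sketch of how far that gets you; PIPESTATUS at least tells you which stage died (the missing file is just for illustration):

  set -o pipefail                 # pipeline now fails if any stage fails
  grep pattern /no/such/file | sort | uniq -c
  status=("${PIPESTATUS[@]}")     # capture before the next command clobbers it
  echo "stage exit codes: ${status[*]}" >&2   # prints "stage exit codes: 2 0 0"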

Text representation is a good idea regardless of whether you pipe or not.



