
What would be ideal to solve first is some sort of initial format negotiation on pipes. Otherwise you will end up with the wrong thing happening (e.g. having to reimplement every tool, spewing "rich" format to tools that don't know it, or sending regular text to tools that could do better).

We've already seen something like this: for example, ls produces column output when writing directly to a screen and one entry per line otherwise, and many tools output in colour where applicable. However, this relies on isatty(), which uses system calls, and on inspecting the terminal environment for colour support.
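As a rough illustration of the isatty() pattern, here is a minimal Python sketch. The function names choose_layout and wants_colour are made up for this example, and the TERM check is a deliberately simplified heuristic, not what ls or any particular tool actually does.

```python
import os

def choose_layout(fd: int) -> str:
    """Return "columns" when fd is a terminal, "one-per-line" otherwise."""
    return "columns" if os.isatty(fd) else "one-per-line"

def wants_colour(fd: int, env=os.environ) -> bool:
    """Simplified heuristic (an assumption for this sketch): colour only on a
    terminal whose TERM is set to something other than "dumb"."""
    term = env.get("TERM", "")
    return os.isatty(fd) and term not in ("", "dumb")
```

Calling choose_layout(sys.stdout.fileno()) gives a different answer depending on whether the output is a terminal or a pipe, which is the one-sided "negotiation" tools get today.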

Another example is telnet which does feature negotiations if the other end is a telnet daemon, otherwise just acts as a "dumb" network connection. (By default the server end initiates the negotiations.)

However the only way I can see this being possible with pipes is with kernel/syscall support. It would provide a way for either side to indicate support for richer formats, and let them know if that is mutually agreeable, otherwise default to compatible plain old text. For example an ioctl could list formats supported. A recipient would supply a list before the first read() call. The sender would then get that list and make a choice before the first write() call. (This is somewhat similar to how clipboards work.)
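The selection step could look something like the sketch below. No such ioctl exists, so this only models the decision the sender would make once it had the recipient's list; choose_format, FALLBACK and the format labels are hypothetical names for illustration.

```python
FALLBACK = "text/plain"  # hypothetical label for today's plain-text behaviour

def choose_format(sender_supports, receiver_accepts):
    """Pick the first format, in the receiver's preference order, that the
    sender can also produce. A receiver that never advertised anything
    (None or empty list) is treated as a legacy tool and gets plain text."""
    if not receiver_accepts:
        return FALLBACK
    for fmt in receiver_accepts:
        if fmt in sender_supports:
            return fmt
    return FALLBACK
```

For example, a sender supporting {"application/json", "text/plain"} paired with a receiver listing ["application/json", "text/plain"] would settle on JSON, while any pairing with a non-rich tool falls back to text.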

So the question becomes would we be happy with a new kernel call in order to support rich pipes, which automatically use current standard behaviour in its absence or when talking to non-rich enabled tools?

I would love it if grep/find/xargs automatically knew about null terminating.



man grep:

   -Z, --null
      Output a zero byte (the ASCII NUL character) instead  of  the  character  that  normally
      follows  a  file  name.   For example, grep -lZ outputs a zero byte after each file name
      instead of the usual newline.  This option makes the output  unambiguous,  even  in  the
      presence  of file names containing unusual characters like newlines.  This option can be
      used with commands like find -print0,  perl  -0,  sort  -z,  and  xargs  -0  to  process
      arbitrary file names, even those that contain newline characters.

   -z, --null-data
      Treat the input as a set of lines, each  terminated  by  a  zero  byte  (the  ASCII  NUL
      character)  instead of a newline.  Like the -Z or --null option, this option can be used
      with commands like sort -z to process arbitrary file names.

man xargs:

   --null
   -0     Input  items are terminated by a null character instead of by whitespace, and the quotes
      and backslash are not special (every character is taken literally).  Disables the end of
      file  string,  which  is treated like any other argument.  Useful when input items might
      contain white space, quote marks, or backslashes.  The GNU find -print0 option  produces
      input suitable for this mode.

man find:

   -print0
      True;  print  the  full  file  name on the standard output, followed by a null character
      (instead of the newline character that -print uses).  This allows file names  that  contain
      newlines or other types of white space to be correctly interpreted by programs that
      process the find output.  This option corresponds to the -0 option of xargs.
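
To make the ambiguity these options address concrete, here is a small Python sketch that splits the same two file names using a newline separator and a NUL separator. The helper split_records and the sample names are invented for the example.

```python
def split_records(data: bytes, sep: bytes) -> list:
    """Split a byte stream into records, each terminated by sep."""
    parts = data.split(sep)
    if parts and parts[-1] == b"":
        parts.pop()  # drop the empty piece after the final terminator
    return parts

# Two file names, one of which contains a newline.
names = [b"plain.txt", b"odd\nname.txt"]
newline_stream = b"".join(n + b"\n" for n in names)
nul_stream = b"".join(n + b"\0" for n in names)

# Newline-terminated output is ambiguous: two names come back as three.
print(split_records(newline_stream, b"\n"))
# NUL-terminated output round-trips exactly.
print(split_records(nul_stream, b"\0"))
```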


Yes, in other words, the parent is right that zero termination is currently not automatic.


He did not say automatic, he said "knew about" nulls. When you talk about automagically detecting nulls I have this image of an ASCII-art Clippy with a cowsay bubble that says "I see you are using null terminated data, I have enabled --null for you."


I did exactly say automatic. It is the word right before the "knew about" you quoted!

And yes, I would expect find to detect when it is talking to xargs and use null termination, without the user having to go and fish out what the options are for each tool. And if you used ps with another tool that prefers JSON, then ps could automatically produce that, again without having to find and maintain flags.


"automatically knew", in a post which talks about format negotiation. It was fairly obvious to me he meant that it would use the format negotiation to automatically enable the --null switch.


Null-terminated strings are hard to read in a shell window, and isatty(3) does not work for pagers.

Content negotiation only works with bi-directional data transfer (i.e. not with pipes).


That's only true if you only negotiate via data in the pipe. dtools (in the article) uses non-mandatory file locks to do the content negotiation on the pipe.


My code does format negotiation on the pipe to determine whether to send the data in textual form or binary form.

It uses file locks (F_SETLK) on the pipe with a magic offset value to do the negotiation.


But you still have race conditions. The sender would have to ensure that the locks are set up before the receiver calls read() for the first time. Since pipes are often set up by the shell you have no control over the startup times. Sure, you could have heuristics such as the receiver waiting a few seconds just in case locks show up, but that just makes things slow and unpredictable.

I stand by my assertion that this can only be solved well (i.e. 100% predictable behaviour no matter what order things start in or how long they take to initialise) by a new system call/ioctl.


No, I avoid the race condition by: 1) Reader sets the lock before reading any data 2) Writer writes a byte to the pipe 3) Writer waits until pipe is empty (FIONREAD ioctl) 4) Writer checks for existence of lock.

This should be race free.
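
For anyone wanting to see steps 2 and 3 in code, here is a minimal Python sketch of the "wait until the pipe is empty" part using the FIONREAD ioctl. It is not the dtools code: bytes_pending and wait_until_drained are names made up here, the polling loop and timeout are my own additions, and the lock check in step 4 is left out because it depends on the lock layout. It assumes a Unix-like system where FIONREAD works on pipes (as it does on Linux).

```python
import array
import fcntl
import termios
import time

def bytes_pending(fd: int) -> int:
    """Ask the kernel how many unread bytes are buffered in the pipe."""
    buf = array.array("i", [0])
    fcntl.ioctl(fd, termios.FIONREAD, buf, True)  # kernel fills in buf[0]
    return buf[0]

def wait_until_drained(fd: int, poll_interval: float = 0.001,
                       timeout: float = 1.0) -> bool:
    """Poll until the pipe is empty. Returns True once the reader has
    consumed everything, False if the (assumed) timeout expires first."""
    deadline = time.monotonic() + timeout
    while bytes_pending(fd) > 0:
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True
```

The idea is that once the probe byte has been consumed, the reader must already have passed the point where it would have set its lock, so checking for the lock afterwards is meaningful.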


That requires the first byte sent be compatible with whatever format is ultimately used. As an example for ps, the first byte in JSON should be a { while for plain text it should be a space. (We get a little lucky since a space would also be acceptable for JSON, but I doubt there is a universal first byte.) And an initial space isn't acceptable for programs like find or grep in either text or null separation mode.

I don't want to belittle what you've done, but the point remains. This can't be done robustly without an additional system call. What you have is tantalizingly close. Even a call as simple as telling the sender that the receiver has called read() would complete your solution.


Not really. All you need to be able to do is to produce the first byte of whatever would have been produced in the "fallback case", i.e. when the reader does not handle format negotiation. Then, when the writer sees that the reader supports format negotiation, it will need to signal that the alternative format was chosen. I do this by sending a zero byte (which should never appear in the fallback text format). Then the first two bytes are skipped as part of the negotiation framework when a non-fallback format was chosen.
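
A rough Python model of that framing, to show how the two-byte prefix works on both ends. This is a sketch and not the actual implementation: writer_stream and reader_decode are hypothetical names, the whole stream is handled as one bytes object for simplicity, and it assumes the fallback output is non-empty and never has a zero byte in second position.

```python
def writer_stream(fallback: bytes, rich: bytes, reader_negotiates: bool) -> bytes:
    """Bytes the writer ends up sending. The first byte is always the first
    byte of the fallback output (the probe). If the reader turned out to
    support negotiation, a zero byte follows and then the rich payload."""
    probe = fallback[:1]
    if reader_negotiates:
        return probe + b"\0" + rich
    return fallback  # probe plus the rest of the plain-text output

def reader_decode(stream: bytes):
    """A negotiating reader: a zero byte in second position means the first
    two bytes are framing and the rest is the alternative format."""
    if len(stream) >= 2 and stream[1:2] == b"\0":
        return "rich", stream[2:]
    return "fallback", stream
```

A reader that knows nothing about negotiation never sets the lock, so it only ever sees the untouched fallback stream.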



