What would be ideal to solve first is some sort of initial format negotiation on...

dfc · on Aug 11, 2012

man grep:

   -Z, --null
      Output a zero byte (the ASCII NUL character) instead  of  the  character  that  normally
      follows  a  file  name.   For example, grep -lZ outputs a zero byte after each file name
      instead of the usual newline.  This option makes the output  unambiguous,  even  in  the
      presence  of file names containing unusual characters like newlines.  This option can be
      used with commands like find -print0,  perl  -0,  sort  -z,  and  xargs  -0  to  process
      arbitrary file names, even those that contain newline characters.

   -z, --null-data
      Treat the input as a set of lines, each  terminated  by  a  zero  byte  (the  ASCII  NUL
      character)  instead of a newline.  Like the -Z or --null option, this option can be used
      with commands like sort -z to process arbitrary file names.

man xargs:

   --null
   -0     Input  items are terminated by a null character instead of by whitespace, and the quotes
      and backslash are not special (every character is taken literally).  Disables the end of
      file  string,  which  is treated like any other argument.  Useful when input items might
      contain white space, quote marks, or backslashes.  The GNU find -print0 option  produces
      input suitable for this mode.

man find:

   -print0
      True;  print  the  full  file  name on the standard output, followed by a null character
      (instead of the newline character that -print uses).  This allows file names  that  con‐
      tain newlines or other types of white space to be correctly interpreted by programs that
      process the find output.  This option corresponds to the -0 option of xargs.
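
As a rough illustration of why the NUL convention in these man pages is unambiguous, here is a minimal Python sketch of a consumer for output such as find -print0 produces. The function name split_nul_records and the sample data are made up for this example and are not part of any of the tools quoted above.

```python
def split_nul_records(data: bytes) -> list:
    """Split a byte stream of NUL-terminated records into a list of records."""
    if not data:
        return []
    records = data.split(b"\0")
    # A well-formed stream ends with a terminator, so the final piece is empty.
    if records[-1] == b"":
        records.pop()
    return records


# Hypothetical sample: three file names, one of which contains a newline.
sample = b"plain.txt\0with space.txt\0with\nnewline.txt\0"
names = split_nul_records(sample)
```

Because a NUL byte cannot occur inside a file name, the embedded newline stays inside its record instead of splitting it in two.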

alexlarsson · on Aug 11, 2012

Yes, in other words, the parent is right that zero termination is currently not automatic.

dfc · on Aug 11, 2012

He did not say automatic, he said "knew about" nulls. When you talk about automagically detecting nulls I have this image of an ascii-art Clippy with a cowsay bubble that says "I see you are using null terminated data, I have enabled --null for you."

rogerbinns · on Aug 11, 2012

I did exactly say automatic. It is the word immediately before the "knew about" you quoted!

And yes, I would expect find to detect that it is talking to xargs and use null termination, without the user having to go and fish out what the options are for each tool. And if you used ps with another tool that prefers JSON, then ps could automatically produce that, again without anyone having to find and maintain flags.

alexlarsson · on Aug 11, 2012

"automatically knew", in a post which talks about format negotiation. It was fairly obvious to me he meant that it would use the format negotiation to automatically enable the --null switch.

ibotty · on Aug 11, 2012

null-terminated strings are hard to read in a shell window. and isatty(3) does not work for pagers.
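
To make the isatty(3) point concrete, here is a small Python sketch of the usual heuristic. The policy in pick_separator (newline for a terminal, NUL otherwise) is a hypothetical one chosen for illustration, not something any of the tools above are known to do.

```python
import os


def pick_separator(fd: int) -> bytes:
    """Hypothetical policy: newline when fd is a terminal, NUL otherwise."""
    return b"\n" if os.isatty(fd) else b"\0"


# A pipe stands in for "tool | pager": the tool's stdout is the write end.
r, w = os.pipe()
sep_for_pipe = pick_separator(w)
os.close(r)
os.close(w)
```

The write end of a pipe is never a terminal, so a tool using this policy would send NULs to a pager even though a human ends up reading the output.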

content negotiation only works with bi-directional data transfer (i.e. not with pipes).

alexlarsson · on Aug 11, 2012

That's only true if you only negotiate via data in the pipe. dtools (in the article) uses non-mandatory file locks to do the content negotiation on the pipe.

alexlarsson · on Aug 11, 2012

My code does format negotiation on the pipe to determine whether to send the data in textual form or binary form.

It uses file locks (F_SETLK) on the pipe with a magic offset value to do the negotiation.

rogerbinns · on Aug 11, 2012

But you still have race conditions. The sender would have to ensure that the locks are set up before the receiver calls read() for the first time. Since pipes are often set up by the shell, you have no control over the startup times. Sure, you could have heuristics such as the receiver waiting a few seconds just in case locks show up, but that just makes things slow and unpredictable.

I stand by my assertion that this can only be solved well (i.e. 100% predictable behaviour no matter what order things start in or how long they take to initialise) by a new system call/ioctl.

alexlarsson · on Aug 11, 2012

No, I avoid the race condition by: 1) Reader sets the lock before reading any data 2) Writer writes a byte to the pipe 3) Writer waits until pipe is empty (FIONREAD ioctl) 4) Writer checks for existence of lock.

This should be race free.
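
Steps 2 and 3 above can be sketched in Python as follows. This is only an illustration, not the dtools code: it assumes Linux, where the FIONREAD ioctl is accepted on either end of a pipe, and the helper names, the polling interval and the timeout are all hypothetical additions. The lock check in step 4 is left out because packing struct flock for F_GETLK is platform-specific.

```python
import fcntl
import os
import struct
import termios
import time


def bytes_pending(fd: int) -> int:
    """Ask the kernel how many unread bytes sit in the pipe (FIONREAD)."""
    raw = fcntl.ioctl(fd, termios.FIONREAD, struct.pack("i", 0))
    return struct.unpack("i", raw)[0]


def wait_until_drained(fd: int, poll_interval: float = 0.001,
                       timeout: float = 1.0) -> bool:
    """Poll until the pipe is empty; the timeout is an assumption of this sketch."""
    deadline = time.monotonic() + timeout
    while bytes_pending(fd) > 0:
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
    return True


r, w = os.pipe()
os.write(w, b"x")                  # step 2: writer sends a single byte
pending_before = bytes_pending(w)  # the byte is still buffered in the pipe
os.read(r, 1)                      # the reader consumes it
drained = wait_until_drained(w)    # step 3: writer sees an empty pipe
os.close(r)
os.close(w)
```

Once the pipe is empty the reader must have called read(), and by step 1 it set its lock before doing so, which is why the writer's lock check can come afterwards.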

rogerbinns · on Aug 11, 2012

That requires the first byte sent be compatible with whatever format is ultimately used. As an example for ps, the first byte in JSON should be a { while for plain text it should be space. (We get a little lucky since a space would also be acceptable for JSON, but I doubt there is a universal first byte.) And an initial space isn't acceptable for programs like find or grep in either text or null separation mode.

I don't want to belittle what you've done, but the point remains. This can't be done robustly without an additional system call. What you have is tantalizingly close. Even a call as simple as telling the sender that the receiver has called read() would complete your solution.

alexlarsson · on Aug 12, 2012

Not really. All you need to be able to do is to produce the first byte of whatever would have been produced in the "fallback case", i.e. when the reader does not handle format negotiation. Then, when the writer sees that the reader supports format negotiation, it will need to signal that the alternative format was chosen. I do this by sending a zero byte (which should never appear in the fallback text format). Then the first two bytes are skipped as part of the negotiation framework when a non-fallback format was chosen.
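
The reader side of that framing might look roughly like this in Python. It is a sketch under the assumption stated above, namely that the fallback text format never contains a zero byte; the function name and the sample payloads are hypothetical and not taken from dtools.

```python
def strip_negotiation_prefix(stream: bytes):
    """Return (negotiated, payload) for the bytes read from the pipe.

    A zero byte in second position means the writer chose the alternative
    format, so the fallback first byte and the zero byte are both dropped.
    """
    if len(stream) >= 2 and stream[1:2] == b"\0":
        return True, stream[2:]
    return False, stream


# Fallback case: plain text arrives untouched, first byte included.
fallback = strip_negotiation_prefix(b"  PID TTY\n")

# Negotiated case: fallback first byte (a space), the zero byte, then the payload.
negotiated = strip_negotiation_prefix(b" \0" + b'{"pid": 1}')
```

A reader that knows nothing about the negotiation never sets the lock, so it only ever sees the fallback stream and needs no changes.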