I lean towards wanting all my tools to support at least one common structured format (and JSON generally would seem the best supported) these days. It'd be great if there was a "standard" environment variable to toggle JSON input/output on for apps. If you did that, then e.g. a "structured" shell could just default to turning that on, and try detecting JSON on output and offering according additional functionality.
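As a sketch of what honouring such a variable could look like (the variable name OUTPUT_FORMAT is made up; no such standard exists today):

    # Hypothetical sketch: a tool that checks an (invented) OUTPUT_FORMAT
    # environment variable and switches between plain text and JSON output.
    import json
    import os
    import sys

    def emit(records):
        if os.environ.get("OUTPUT_FORMAT") == "json":  # hypothetical variable name
            for rec in records:
                sys.stdout.write(json.dumps(rec) + "\n")  # one JSON object per line
        else:
            for rec in records:
                sys.stdout.write(f"{rec['name']}\t{rec['size']}\n")  # classic tab-separated text

    if __name__ == "__main__":
        emit([{"name": "a.txt", "size": 120}, {"name": "b.txt", "size": 4096}])

A structured shell could export that variable by default and sniff the output for JSON; tools that ignore it keep behaving exactly as before.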
That'd be a tolerable graduated approach where you get benefits even if not every tool supports it.
With "jq", I'm already increasingly leaning on JSON for this type of role.
It'd need to be very non-invasive, though, or all kinds of things are likely to break.
I wish there was a way for pipes to carry metadata about the format of the data being passed over the pipe, like a MIME type (plain text, CSV, JSON, XML, etc). Ideally with some way for the two ends of the pipe to negotiate with each other about what types each end supports. I think that sort of out-of-band communication would most likely need some kernel support.
Maybe an IOCTL to put a pipe into "packet mode" in which rather than just reading/writing raw bytes you send/receive Type-Length-Value packets... If you put your pipe in packet mode, and the other end then reads or writes without enabling packet mode, the kernel can send a control message to you saying "other end doesn't support packet mode" and then you can send/receive data in your default format. Whereas, if both ends enter packet mode before reading/writing, then the kernel sends a control message to one end saying "packet mode negotiated", and which triggers the two ends to exchange further control messages to negotiate a data format, before actually sending the data. (This implies pipes must be made bidirectional, at least for control packets.)
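To make the framing concrete, here's a purely userspace sketch of the TLV idea over an ordinary pipe, in Python. The tag numbers and the negotiation payload are invented, and the real proposal would of course need kernel support to detect whether the other end opted in:

    # Userspace sketch of Type-Length-Value framing over a pipe (no kernel
    # "packet mode" exists; tags and negotiation payload are made up).
    import json
    import os
    import struct

    TAG_DATA = 1
    TAG_NEGOTIATE = 2  # "here are the formats I can produce"

    def write_packet(fd, tag, payload: bytes):
        # 1-byte tag, 4-byte big-endian length, then the value.
        os.write(fd, struct.pack(">BI", tag, len(payload)) + payload)

    def read_packet(fd):
        # Partial reads are ignored for brevity; fine for this small demo.
        tag, length = struct.unpack(">BI", os.read(fd, 5))
        return tag, os.read(fd, length)

    if __name__ == "__main__":
        r, w = os.pipe()
        write_packet(w, TAG_NEGOTIATE, b"application/json,text/plain")
        write_packet(w, TAG_DATA, json.dumps({"msg": "hello"}).encode())
        print(read_packet(r))  # (2, b'application/json,text/plain')
        print(read_packet(r))  # (1, b'{"msg": "hello"}')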
As counterpoint if only for the sake of my wetware memory, I'd rather have only a few pipes being as agnostic and generally capable as possible than complex plumbing.
I don't mean to argue there is no value in specialised adapters, but I believe the default should be as general as can be.
Let me worry about parsing/formats at the application level and give me simple underlying pipes. If I need something specific I should be prepared to dig into docs and find out what I need anyway, so defaulting to text versus installing or explicitly configuring your system to use something else seems like a sane feature/complexity split in the general case.
EDIT: good quote from another thread to illustrate my point:
> Removing the responsibility for reliable communication from the packet transport mechanism allows us to tailor reliability to the application and to place error recovery where it will do the most good. This policy becomes more important as Ethernets are interconnected in a hierarchy of networks through which packets must travel farther and suffer greater risks.
It's an innovative and interesting idea. But just to play devil's advocate for a minute: for this to get widespread uptake, the way it is implemented would have to be agreed on as the right way by a lot of people (i.e. users). And there are lots of different ways of implementing what you describe, with variations, which makes it harder (though not impossible) to get widely accepted than the existing approach of plain pipes passing text between commands, where not much variation is possible.
I'd prefer a "crash if the receiver doesn't support the sender's MIME type" model for debuggability.
Which could still be ergonomic if all the common piping programs supported some specific collection of formats and there was a relatively robust format-converting program.
It looks like libxo does the output in structured format, but what about input (for chaining commands together with pipes)? Did the FreeBSD porting work[1] implement the input side with libxo, independently of libxo, or not at all?
(Not to diminish libxo --it looks pretty cool, and I didn't know about it before-- just curious.)
That looks very interesting, though I actually think that standardising on the command-line/env variable API is more important than an implementation. It's the moment you can "automagically" "upgrade" a pipe to contain structured data for apps that support it that you get the most value.
But given that it's there, I'll certainly consider it when writing tools, and consider supporting the command-line option/env var even if/when I don't use the implementation...
Nice idea! Maybe this issue is similar to that of command completion, but could also result in a similar mess. Every command has its "-h" or "--help", showing the syntax, available options and so on. But since even that is not really standardized, separate command-line completion rules are written for every command. And for every shell.
Define a standard format for describing command syntax. I'd suggest basing it on JSON, but you could use XML or Protobuf or Thrift or ASN.1 or Sexprs or whatever. Embed in executables a section containing the command syntax description object. Now when I type in a command, the shell finds the executable on the path, opens it, locates the command syntax description section (if present), and if found uses that to provide tab completion, online help, smarter syntax checking, whatever. If you've ever used OS/400, it has a feature where it constructs a fill-in form dialog based on the command syntax descriptions, which you can call up at the press of a function key. (Obviously the shell should cache this data; it shouldn't reread /bin/ls every time I type "ls".)
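As a rough sketch of what such an embedded description and the derived completion might look like (the schema here is invented, not a proposal for the actual format):

    # Hypothetical command description and the kind of completion a shell
    # could derive from it. The schema is made up for illustration.
    DESCRIPTION = {
        "command": "ls",
        "options": [
            {"flag": "--all", "short": "-a", "help": "do not ignore entries starting with ."},
            {"flag": "--human-readable", "short": "-h", "help": "print sizes like 1K, 234M"},
            {"flag": "--color", "arg": {"type": "enum", "values": ["always", "auto", "never"]}},
        ],
        "positional": [{"name": "file", "type": "path", "repeat": True}],
    }

    def complete(prefix):
        """Return the long options matching what the user has typed so far."""
        return [o["flag"] for o in DESCRIPTION["options"] if o["flag"].startswith(prefix)]

    print(complete("--h"))  # ['--human-readable']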
There isn't a 1:1 mapping of binaries to documentation or completion scripts. That idea won't work out.
"Defining a standard format" hasn't been a successful practice in the Unix world. Many standards consist essentially of the bare minimum that everybody can agree with.
Unix command-line interfaces are already structured (as a list of strings), and more structure would be hard to support at the binary level. There are too many possibilities of doing that, and most CLI programs wouldn't even need that.
If the binary is using one of a few common libraries, e.g. GNU getopt, to read its command-line parameters, it should be possible to extract somewhat useful completions automatically.
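As a rough illustration in Python (using the stdlib getopt module as a stand-in for GNU getopt): the option spec the program already hands to the library carries enough information to derive basic completions, though this doesn't show how you'd pull it out of a compiled binary:

    # The same spec that drives parsing can drive completion.
    import getopt
    import sys

    SHORT_OPTS = "hvo:"
    LONG_OPTS = ["help", "verbose", "output="]

    def completions():
        # A long option ending in "=" takes an argument; either way the
        # option name itself is a valid completion candidate.
        return ["--" + name.rstrip("=") for name in LONG_OPTS]

    if __name__ == "__main__":
        opts, args = getopt.gnu_getopt(sys.argv[1:], SHORT_OPTS, LONG_OPTS)
        print(completions())  # ['--help', '--verbose', '--output']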
Then you are essentially relying on the de-facto standard of not rolling your own argument parsing, as opposed to some product of a standards committee.
Getopt is just barely flexible enough to be applicable to many projects while still encoding some conventions (dash-prefixed options and optional arguments).
Still, it's too opinionated to be applicable to all programs. And it's nowhere near structured enough to support automated completions.
Every programmer goes through this phase of trying to abstract what can't be abstracted. Essentially, for non-trivial programs you need to encode what you want yourself. You can't write a library that is suitable for the infinitely many conceivable programs.
By the way, even handcoded completions are, often enough, more confusing than helpful to me (and not because they are handcoded). I've disabled them entirely.
> By the way, even handcoded completions are, often enough, more confusing than helpful to me (and not because they are handcoded). I've disabled them entirely.
Huh? Can you give some examples? I'm on zsh, and except for broken auto-generated shell completions like that provided by ripgrep, I've found zsh completions to be incredibly competent. Much more competent than myself usually.
I only know bash, but I doubt zsh can be so much better (yes, I know, they have colorful suggestions and better interactivity for selection).
The context sensitivity is just very confusing. Sometimes completion hangs, maybe because an SSH connection is made in the background, or the completion is just not very efficient.
Then sometimes I have a typo and get the weirdest suggestions, and it's much harder to track down the error statically than to just run the program and get a nice error message.
Then sometimes I am more intimate with the complexities of the command-line invocation than the completion script can appreciate, and get no completion when all I need is just a stupid local file completion. This leads to the (untrue) assumption that a file isn't there, which easily causes further confusion.
No, thanks. I know how to use the tools that I use. Dear completion, just help me with the files on my disk (which is the fast-changing variable in the game) and let me type commands as I see fit.
Interesting. I use too many different tools too rarely to remember all their options, so being able to type -, hit tab and see a list of possibilities with their descriptions is simply more frictionless than having to pull up the command's man page and search through it. The cases where the suggestions don't work don't bother me, because it just means I don't save time compared to no completions at all.
This is actually a very interesting problem. The format would be an IDL, since you're describing a "struct" of options, array-valued fields, and "typed" fields that are constrained to certain grammars (numbers, &c.).
This sort of exists. The command syntax format is the standard man page "Usage" and "Options" sections, and an implementation is docopt (http://docopt.org/)
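For anyone who hasn't used it: docopt turns the usage text itself into the parser. A minimal example (the script and its options are invented):

    """Example tool.

    Usage:
      greet.py [--shout] <name>
      greet.py (-h | --help)

    Options:
      -h --help  Show this help.
      --shout    Print the greeting in upper case.
    """
    from docopt import docopt  # pip install docopt

    if __name__ == "__main__":
        args = docopt(__doc__)
        greeting = f"hello, {args['<name>']}"
        print(greeting.upper() if args["--shout"] else greeting)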
I still miss AmigaOS, where command option parsing was largely standardised ca. 1987 via ReadArgs() (barring the occasional poor/direct port from other OSes) [1]. It wasn't perfect (it only provided a very basic usage summary), but it was typed, and it was there.
There are lots of things I miss. Datatypes is definitely one of them. AREXX ports everywhere is another (I hate the language, but you don't need the language to make use of AREXX ports; Dbus is what you get if you take AREXX and give it to a committee; the beauty of AREXX was the sheer simplicity that made it so trivial to add support for it to "everything", to the extent that many tools built their central message loop around checking for AREXX commands and built their application around dispatching AREXX-like commands between different parts of the app).
Assigns is another one. (For the non-Amiga-aware who stumble on this: on the Amiga, I'd refer to my files with "Home:", but unlike on Unix/Linux, Home: is not an environment variable that needs to be interpreted like $HOME, or something that needs to be expanded like "~"; it is recognised by the OS as a valid filesystem path, assigned at runtime. Similarly, instead of having a $PATH that needs to be interpreted, all my binaries would be accessible via "C:" for "command". "C:" is an assign made up of multiple actual directories, one of which would typically be Sys:C, where Sys: is the System volume, and Sys: itself is an assign pointing to whatever drive you booted. Assigns are like dynamic symlinks with multiple targets, or like a $PATH interpreted by the OS.)
jq is awesome. The "JSON Lines" data format (where each record is JSON without literal newlines, and a newline separates records) nicely upgrades Unix pipes to structured output, so you can mix old Unix tools like head, tail, etc. with jq.
Maybe what's needed is a set of simple JSON wrappers around the core utils to increase adoption.
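For instance, a toy stand-in for a JSON-emitting "ls" might look like this in Python (the script is hypothetical, just to show the shape of such a wrapper):

    # jls.py: one JSON object per line, so head/tail still work and jq
    # handles the structured part.
    # Usage: python3 jls.py | jq -r 'select(.is_dir | not) | .name' | head -5
    import json
    import os
    import sys

    def main(path="."):
        for entry in sorted(os.scandir(path), key=lambda e: e.name):
            st = entry.stat()
            print(json.dumps({"name": entry.name, "size": st.st_size, "is_dir": entry.is_dir()}))

    if __name__ == "__main__":
        main(sys.argv[1] if len(sys.argv) > 1 else ".")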
> Note that when such software is used, numbers that are integers and are in the range [-(2**53)+1, (2**53)-1] are interoperable in the sense that implementations will agree exactly on their numeric values.
2^53 - 1 is only 9007199254740991, so it's not too hard to exceed that, particularly with things like Twitter status IDs.
The recommended workaround is to pass big numbers as strings, since it's the only way to reliably pass them around. (Yes, this is kind of horrible.)
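Concretely (Python here just to demonstrate the float behaviour that a JavaScript-style parser runs into):

    # Demonstrating the 2**53 boundary: once an integer above it is coerced
    # to a 64-bit float (as a JS-style JSON parser would do), precision is lost.
    import json

    big = 2**53 + 1                    # 9007199254740993
    print(float(big) == float(2**53))  # True: the float can't tell them apart

    # Passing it as a string round-trips exactly, at the cost of type information.
    print(json.loads(json.dumps({"id": str(big)})))  # {'id': '9007199254740993'}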
Again, not saying this is a good thing, but it's an understandable consequence of the standardization partially involving "let's document how browsers' existing JS implementations deal with running eval() on a string". It's easier to write the standard so that existing implementations fit it than to require changes from everyone.
I imagine it depends on your aims. If your aim is just to document existing behavior, that's one thing. But if your aim is to create a new, simple data interchange format, documenting existing implementations is only going to limit its usefulness. It could have forced the existing implementations to be fixed, rather than letting them set the bar.
As pointed out upstream, we're dealing with a lot of data these days, and such limitations will only cause JSON to become marginalized or complicated with implementation-dependent workarounds, like integers in strings.
The point is that JSON started out as pretty much "data you can eval() in JavaScript". It gained the traction it did because it is literally a subset of JavaScript's object notation, so it was trivial to support.