One thing to remember is that YAML is about 20 years old. It was created when XML was at peak popularity. JSON didn't exist (YAML is a parallel, contemporary effort). Even articulating the problems with XML's approach was an uphill battle. What you would replace it with is also hard. What use cases matter? What is the core model? A simple hierarchy? Typed nodes? A graph? What sort of syntax is needed for it to be usable? These were all questions. Seen in context, we got quite a bit correct. And yes... it has a few embarrassing warts and a few deep problems. Ah well.
A second thing to consider... YAML was created before it was common that tech companies actively contributed to open source development. There are lots of things we could have done differently if we had more than a few hours per week... even a tiny bit of financial support would have helped.
Finally, YAML isn't just a spec, it has multiple implementations. Getting consensus among the excellent contributors is a team effort, and particularly challenging when no one is getting paid for the work. Once you have a few implementations and dependent applications, you're kinda stuck in time.
It was a special pleasure for me to have had the opportunity to work with such amazing collaborators.
We did it gratis. We are so glad that so many have found it useful.
I am the author of this article. Apparently people read my website (how they get there, I don't know?)
At any rate, it's worth mentioning that in the conclusion I wrote:
> Don’t get me wrong, it’s not like YAML is absolutely terrible but it’s not exactly great either.
I still use YAML myself even when I have the freedom to use something else simply because – for better or worse – it's very widespread, and for many tasks it's "good enough". For other tasks, I prefer to avoid it.
I think that a stricter version of YAML (such as StrictYAML) would make a lot of people's lives easier though.
I think you're probably right there. I use YAML when something else I'm using calls for it, but mainly I tend to output things in it just because it's very readable.
Using it a lot more lately as I'm diving into Ansible, so I'll be interested to see if I run into problems.
The trend of "stick together YAML and a template engine, we have our DSL!" in CM systems is a bit horrible.
Ansible does make some effort to limit Jinja templating to variable substitution, but it's still not that great; all kinds of weird stuff can happen, especially with colons.
The worst one is Saltstack; the resulting syntax is atrocious and borderline unreadable. I'm not a big fan of map.jinja files [0], and on the YAML side, things can get ugly quite fast [1].
I know it's not a popular opinion, but I would rather use the Puppet DSL, even with its steep learning curve.
The whole concept of templating language on top of YAML is suspect anyway, but I wish Salt had just gone with Mako as the default templating language. That way you could write plain Python in your templates and not have this horrible misuse of Jinja.
I also have come to agree that a DSL is the best solution, though Puppet's particular DSL is not a great example. Projects that re-implement the same thing from scratch like mgmt[1] are on the right track, but probably won't gain enough traction.
Thanks! Last time I checked my domain got penalized for having abnormally low markup or some such, which apparently makes it look like a spam site. I am proud of this.
> Last time I checked my domain got penalized for having abnormally low markup or some such
Do you have a link to the document you were pointed to when you got penalized? If it was Google who penalized you, they must have pointed you to a URL with documentation on why you got penalized and how to resolve it.
I ask this because I run a few websites with even less markup than your site, but I have never been penalized. I once got penalized due to an excessive number of spam comments on one of my websites, and they pointed me to https://support.google.com/websearch/answer/190597 ("Remove this message from your site") to resolve the issue. That issue did not affect the search ranking much though (dropped by only about 2 or 3 places in the list of results). But I have never had an issue with abnormally low markup.
The markup on your website looks pretty reasonable to me, so I am surprised you could get penalized for that when I have had no issues with even less markup and my sites still appear at the top of the list of results for relevant search terms.
I think it was some tool at moz.com, but I can't recall off the top of my head. I don't think it was Google itself. I have no idea what effect that has; I'm not really into that world.
> I have had no issues with even less markup and they still appear at the top of the list of results for relevant search terms.
It seems people are finding my site, whether or not it's being penalized. I mean, someone other than me posted it here, right?
Drupal 8 uses YAML* as its configuration language because JSON doesn't support comments. That simple. Thank you for YAML, it does deliver for us: it's human readable and it's easy to parse (see below).
* I mean, it uses an ill-defined subset of YAML. The definition is "whatever the Symfony YAML parser supports".
You know what else is human readable, easy to parse if you're using PHP, and supports comments?
PHP.
I understand why some languages rely on common configuration file formats.
I don't understand why the popular dynamic script-y languages don't more commonly use the natively-expressable associative/list data structures that they're famous for making convenient.
Using includes/imports is not the greatest idea ever.
Your configuration file is part of your program's interface. It's something that must be well defined. If your configuration file is a programming language, that interface is not so well defined.
Also, you expose yourself to all kinds of weird bugs, because some (too smart for their own good) people will monkey-patch your software through it.
It also adds a lot of unnecessary stuff to the configuration file; things like ';' or '$' are not really useful.
Lastly, common configuration file formats are good because they are... common. You can have 2 pieces of software in 2 different languages accessing the same configuration file. A common example of that is configuration management: there are a lot of modules/formulas in salt/ansible/puppet/chef that do fine parsing of configuration files and permit fine-grained settings, and I'm not even mentioning augeas. If your configuration is a php/python/perl/ruby file, good luck with that.
I know it's really common for php applications to do configuration files in php, but frankly, it's a bit annoying.
> If your configuration file is a programing language this interface is not that well defined.
While I do agree with the rest of your comment I don't think they were advocating using the full language for configuration, just the maps/arrays/etc. (e.g. Python's `literal_eval`).
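For illustration, the literal-only subset the parent mentions is available from Python's standard library via `ast.literal_eval` (a sketch; the config keys here are made up, not from the discussion):

```python
import ast

# A config file containing only Python literals: dicts, lists,
# strings, numbers, booleans. Trailing commas are fine, unlike JSON.
config_text = """
{
    "debug": True,
    "hosts": ["alpha", "beta"],
    "retries": 3,
}
"""
config = ast.literal_eval(config_text)
print(config["hosts"])

# Unlike eval(), literal_eval refuses anything executable:
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as e:
    print("rejected:", e)
```

This gives "maps/arrays/etc. without the full language": you keep native syntax and comments, but a call or import in the file is a parse-time error rather than executed code.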
If a key objection/perceived threat is that this might give someone an insertion point they're not meant to use for code ... well, let's consider that we're talking about applications distributed as interpreted language source here. Disallowing code-as-config isn't even closing the door of this particular barn after the horse has left, it's putting two strands of police line tape across the bottom half of the gap where the door was never installed and hoping any equines thinking of passage politely consider the message in case it hadn't already occurred to them which side of the entrance they preferred to be on.
Consider this: Design and optimize for the common case.
Why do we have config files? Because developers actually want a place dedicated to simple or structured application configuration data, for which PHP assignments with arrays + primitives can function at least as effectively as JSON. Most developers would prefer that config data get loaded quickly so the application can get on to doing actual app-y things. Using the language for this means you're parsing at least as fast as you can interpret and you can also take advantage of any code caching that's part of your deployment (especially nice in the PHP-likely event that config settings would be reloaded with every request).
Abuse isn't likely to be the common case. The end users you invoked certainly aren't going to be the ones looking for opportunities to insert code over data. Developers have other places to put code and, as mentioned, probably actually want a place dedicated to data. You're still right that of course someone will do it, just like someone will inevitably create astronaut architecture hierarchy monstrosities in any language with classical inheritance or make potentially hidden/scary changes to language function using metaprogramming facilities.
But potential for abuse doesn't automatically mean a feature should be disallowed.
A lot of the time it's better to let people who can be circumspect have the benefits of a potential approach, and if somebody thinks they need to solve a problem by using a technique that's arguably abuse, well, let them either find out why it's a bad idea or enjoy having solved their problem in an unusual way. Not the end of the world. Possibly even legit.
You can use arbitrary tools to programmatically generate YAML (or JSON, or XML, any of the other "data only" formats.) This allows for tools to drive other tools by generating a spec file and feeding it in. See e.g. Kubernetes for a good example of that.
There's no language that I'm aware of that can natively generate PHP syntax, and there's no common multi-language-platform library for generating PHP syntax. I think that's most of the reason.
To contradict myself, though: Ruby encodes Gemfiles and Rakefiles as Ruby syntax. And Elixir encodes Mixfiles, Mix.Config files, Distillery release-config files, and a bunch of other common data formats as Elixir syntax.
And, of course, pretty much every Lisp just serializes the sexpr representation of the live config for its config format (which means that, frequently, a lot of Lisp runs code at VM-bootstrap time, because people write Turing-complete config files.)
> There's no language that I'm aware of that can natively generate PHP syntax
This is a solid argument against using PHP (or any such language) as a cross-language data interchange format. There are others :) And I totally agree you want a language independent format for anything you might have to feed across an ecosystem of tools.
For a PHP-system generating/altering its own config files... PHP's `var_export` generates a PHP-parseable string representation of a variable (though it sadly doesn't use the short array syntax).
Turing-complete config files probably have some hazards, like Lisp itself does. YMMV regarding whether those hazards can be avoided by circumspect developers or need to be fenced off.
This, and the security problems of executable code as configuration, are why the OpenBSD people mandate that /etc/rc.conf is not general-purpose shell script, and why the systemd people mandate that /etc/os-release is similarly not. People want to be able to parse configuration files like this with something other than fully-fledged shell language interpreters; and they want these things to not be vectors for command injections.
Settings.py is uniquely bad, though, IMO, because it tries to be a badly defined dict() instead of exposing proper configuration interfaces. Ruby config files are common and usually fairly great; see for example the Vagrantfiles.
And you won't have to generate your config files (parsing, maaaaaybe), because those needs are covered by the fact that the files are programs. They are _already_ generating a configuration.
> And you won't have to generate your config files (parsing, maaaaaybe), because those needs are covered by the fact that the files are programs. They are _already_ generating a configuration.
Yes, theoretically, if settings.py was a "generator" format that you ran as a pre-step (like you do to get parser-generators like Bison to spit out source files for you to work with), and this generator actually spat out something like a settings.json, and all the rest of the infrastructure actually dealt with the settings.json rather than the generator, then, yes, it wouldn't matter. Tools in other languages could just generate the settings.json directly.
As it stands, none of those things are true, so tools in other languages actually need to do something that outputs settings.py files.
Galaxy brain: if your config is programmable, it can read whatever terrible configuration format you want. That means my settings.py (yes, I'm forced to use Django) is configured via environment, which is populated by k8s from - gasp - JSON files.
That means that if I wanted to configure Vagrant with JSON, there is no force in the universe that could stop me.
If the config file is actually a normal program, then it can do normal program things, and any benefit from using JSON instead is nullified by the fact that you can still use JSON. In turn, if your tool's primary configuration is via a more limited format, you're stuck with it. Not even "generators in other languages" allow comparable runtime flexibility.
Yup, totally agree with you, settings.py has always been a pain in the ass. Not really an acute one but the kind that is uncomfortable but not enough to make you do something about it.
> There's no language that I'm aware of that can natively generate PHP syntax.
Actually, I've had to use PHP to output a PHP configuration array for a project that required config in PHP.
`var_export($foo)` will output valid PHP code for creating the array $foo. In my case I was doing horrible things to create the array in my pseudo-makefile, then using `var_export()` to output the result. Note that you can run php from the Bash CLI with the `-r` flag, which helps.
Tcl works well for configuration files. You can strip away the extraneous commands in a sub-interpreter to prevent Turing completeness and add infix assignment to remove the monotony of the set command and what you get is a nice config format. If you need more power in the future you just relax some of the restrictions and use it as a script without breaking existing files.
People get really upset when they have to type "array(" instead of "[" or "{" (pre-PHP 5.something) and quotes instead of no quotes (punting the character-escape problem to something else), I guess.
Using code-as-data works really well in Lisp-like languages. Reading a Clojure project's project.clj file or a Common Lisp project's .asd file is pretty pleasant. A programming language's choice in how it handles library config info for building and specifying dependencies (XML, makefiles, JSON, YAML, INI, nothing, etc...) is a good indicator of the culture of the language around config files in general. Composer for PHP only came out in 2012.
Interestingly, the Lua programming language actually evolved from configuration files: https://www.lua.org/history.html (and is still officially deemed useful for writing them)
I use Lua for configuration files for both personal and work related projects [1]. You get comments and the ability to construct strings piecemeal (DRY and all that). It's easy to sandbox the environment, and while you can't protect against everything (basically, a configuration script can go into an infinite loop), if someone unauthorized does have access to the script, you have bigger things to worry about.
That was also one of the rationales behind TCL's design.
John Ousterhout explained in one of his early TCL papers that, as a "Tool Command Language" like the shell but unlike Lisp, arguments were treated as quoted literals by default (presuming that to be the common case), so you don't have to put quotes around most strings, and you have to use punctuation like ${}[] to evaluate expressions.
TCL's syntax is optimized for calling functions with literal parameters to create and configure objects, like a declarative configuration file. And it's often used that way with Tk to create and configure a bunch of user interface widgets.
Oliver Steele has written some interesting stuff about "Instance-First Development" and how it applies to the XML/JavaScript-based OpenLaszlo programming language, and to other prototype-based languages.
>The equivalence between the two programs above supports a development strategy I call instance-first development. In instance-first development, one implements functionality for a single instance, and then refactors the instance into a class that supports multiple instances.
>[...] In defining the semantics of LZX class definitions, I found the following principle useful:
>Instance substitution principle: An instance of a class can be replaced by the definition of the instance, without changing the program semantics.
In OpenLaszlo, you can create trees of nested instances with XML tags, and when you define a class, its name becomes an XML tag you can use to create instances of that class.
That lets you create your own domain specific declarative XML languages for creating and configuring objects (using constraint expressions and XML data binding, which makes it very powerful).
The syntax for creating a bunch of objects is parallel to the syntax of declaring a class that creates the same objects.
So you can start by just creating a bunch of stuff in "instance space", then later on as you see the need, easily and incrementally convert only the parts of it you want to reuse and abstract into classes.
>I don't understand why the popular dynamic script-y languages don't more commonly use the natively-expressable associative/list data structures that they're famous for making convenient.
You picked the wrong language... PHP comes with its own JSON parser. And INI and XML and even CSV.
But, the reason is that, generally, you want config files to describe data or state only. Yes, you could just make your config native code, but then the temptation to add functions and methods and logic to that becomes irresistible and soon your config is an application that needs its own config.
Config formats need to be simple, and preferably not Turing complete.
Because it's just in general incredibly short sighted to think that your config file is never going to be read by code written in another language.
There's also an argument about whether making configuration files able to execute arbitrary code is a good idea. You get straight into the JavaScript 'eval' problems which we've spent a decade escaping.
I think some of it is PLOP (Principle of Least Power).
$CFG = random() > 0.5 ? "yes" : "no";
...is likely "too powerful". It'd be nice if there were ways in certain programming languages to do something like "drop privileges" to avoid loops, function calls, external access, etc.
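One way to sketch that "drop privileges" idea in Python is to whitelist AST node types, so plain literals pass while the `random()` example above is rejected (the function name `restricted_eval` and the allowed-node list are my own illustrative choices, not an established API):

```python
import ast

# Node types a "least power" config evaluator accepts: literals,
# containers, and unary minus for negative numbers. Names, calls,
# comparisons, loops, and attribute access are all refused.
ALLOWED = (ast.Expression, ast.Dict, ast.List, ast.Tuple, ast.Set,
           ast.Constant, ast.UnaryOp, ast.USub, ast.Load)

def restricted_eval(src):
    tree = ast.parse(src, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError("disallowed syntax: %s" % type(node).__name__)
    return eval(compile(tree, "<config>", "eval"))

print(restricted_eval("{'retries': 3, 'hosts': ['a', 'b']}"))

# The "too powerful" example above is refused at parse time:
try:
    restricted_eval("random() > 0.5")
except ValueError as e:
    print(e)
```

The point is that the privilege boundary lives in the loader, not the language: relaxing it later (say, allowing a whitelisted function or two) is a one-line change to `ALLOWED`.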
The makers of Drush, the cli for Drupal, subscribed to your line of thinking in the early versions and inventory items were defined in PHP files. Migrating from that will be interesting.
Because that forces the end user, who might not know anything about the programming language one's application is written in, to wrestle with low-level implementation details. In the words of Keith Wesolowski, the programmer assumes that the end user is a "Linux superengineer", which is almost always a wrong assumption to make.
I totally agree that in the ideal world, JSON should support comments. I yearn for them, and none of the in-band work-arounds or post-processing tools are acceptable substitutes.
But to play the devil's advocate, how would JSON be able to support round-tripping comments like XML can, since <!-- comments --> are part of the DOM model that you can read and write, while JSON // and /* comments */ are invisible to JavaScript programs? There's nowhere to store the comments in the JSON object model, which you would need in order to write them back out later!
One important feature for a format is being able to read and write files with full fidelity and not lose any information like comments. XML can do that, but JSON can't. To fix that you'd have to go back and redesign (and vastly complicate) the fundamental JavaScript objects, arrays, and values, to be as complex and byzantine as the DOM API.
The less-than-ideal situation we're in isn't JSON's fault or JavaScript's fault, because JSON is just a post-hoc formalization of something that was designed for a different purpose. But JSON is rightly more popular than XML, because it's extremely simple, and nicely impedance matched with many popular languages.
YAML suffers from the same problem as JSON that it can't round-trip comments like XML can, but it fails to be as simple as JSON, is almost as complex as XML, and doesn't even map directly to many popular languages (as the article points out, you can't use a list as a dict key in Python, PHP, JavaScript, or Go, etc).
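The "list as a dict key" mismatch mentioned above is easy to check in Python (a quick illustration; the tuple workaround is one common parser strategy, not something the article prescribes):

```python
# Python dict keys must be hashable, so a YAML "complex key" that is
# a sequence has no direct native equivalent.
try:
    {[1, 2]: "value"}
    raise AssertionError("should not be reachable")
except TypeError as e:
    print(e)  # unhashable type: 'list'

# A parser can work around it by converting the key to a tuple:
d = {(1, 2): "value"}
print(d[(1, 2)])  # value
```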
You can sidestep some of JSON's problems by representing JSON as outlines and tables in spreadsheets, without any need for syntax and sigils like brackets, braces, commas, no commas, quoting, escaping, tabs, spaces, etc, but in a way that supports rich formatted comments and content (you can even paste pictures and live charts into most spreadsheets if you like), and even dynamic transformations with spreadsheet expressions and JavaScript.
In fact YAML is probably more complex than XML; the specification of YAML, when I print it into PDF, is about three times as long as that of XML 1.0. (And XML 1.0 also describes DTD, which is kind of a simple type validation for XML and thus includes much more than just serialization syntax.)
> But to play the devil's advocate, how would JSON be able to support round-tripping comments like XML can, since <!-- comments --> are part of the DOM model that you can read and write, while JSON // and /* comments */ are invisible to JavaScript programs.
It doesn't support it for whitespace in general (if you deserialize into JS object model or equivalent), so why would it be any different for comments specifically? It's just not a design goal of the format.
Although, of course, it's quite possible to have a JSON parser that preserves representation. It'll just have a non-obvious mapping to the host language because of all the comment and whitespace nodes etc.
I'm kinda sad that JSON has been struggling for like 15 years to get comments. Is there like some kind of gestapo that's saying no or something? All it takes is for the maintainers of probably 15 popular libraries to start handling comments.
At the end of the day I'm sure the reason we don't have JSON comments is somewhere listed in this page: xkcd.com/927/
I'm aware of at least three JSON libraries that at least can accept comments (Gson in lenient mode, Json.NET, and json-cpp are the ones I've used personally that do)-- it's hard to convince everyone that JSON needs comments, though, and comments are of limited utility if it's not guaranteed that they'll parse everywhere.
But you really only need comments in JSON if you're doing stuff like storing configuration in JSON, and JSON's too fiddly in general to be a great config file format (too easy to do something like forget a comma; no support for types beyond object, array, (floating point) number, and string). Something more like YAML without the wonky type inference would be better, IMO.
I believe Douglas Crockford used to make the argument that JSON is not meant for human consumption and thus shouldn't be changed to better serve humans. I personally wish hjson (https://hjson.org) would get more traction. I prefer it over both JSON and YAML.
> I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability. I know that the lack of comments makes some people sad, but it shouldn't.
Well, then why not allow a trailing comma in lists and objects? Computers don't care, and they would even be happier, because they could then just pour out array and object members, each with a trailing comma, without concerning themselves with whether a given member is the last one. (Dijkstra's train toilet problem comes to mind.) Also compare with XML, where each element is self-contained.
And why model JSON syntax so closely after JavaScript's literal object syntax (which is actually more convenient, by the way)? Being taken from a mainstream programming language, it naturally evolved to be written by humans in small amounts, not by computers in large dumps. :)
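The trailing-comma restriction is easy to demonstrate with any strict JSON parser, e.g. Python's:

```python
import json

# Strict JSON rejects a trailing comma, so any emitter must
# special-case the final element of every array and object.
try:
    json.loads('[1, 2, 3,]')
    raise AssertionError("should not parse")
except json.JSONDecodeError:
    print("trailing comma rejected")

# Without the trailing comma it parses fine:
print(json.loads('[1, 2, 3]'))  # [1, 2, 3]
```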
VS Code uses JSON with comments for config files. [1]
Technically, this is not JSON. You won't be able to use a standard JSON parser without stripping comments first. But you can use a simple, JSON-like language with comments for config.
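A minimal sketch of that strip-then-parse approach in Python (it only handles full-line `//` comments; a real JSON-with-comments parser must also cope with `//` inside string values, which this deliberately does not):

```python
import json
import re

def loads_with_comments(text):
    """Parse JSON after discarding full-line // comments.

    Sketch only: '//' occurring inside a string value would be
    mangled, which a proper tokenizer-based parser avoids.
    """
    stripped = re.sub(r'^\s*//.*$', '', text, flags=re.MULTILINE)
    return json.loads(stripped)

config = loads_with_comments('''
{
    // editor settings
    "tabSize": 4,
    "insertSpaces": true
}
''')
print(config["tabSize"])  # 4
```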
> simple
YAML is much, much more complicated than JSON.
Quoting a single word from the parent’s sentence is misleading. The sentence "YAML can be employed as a simple JSON-like language with comments." is true because JSON is YAML, so you can parse a JSON file with #-comments using a YAML parser.
Most YAML users don't need to look at the source for a YAML parser. I appreciate elegant simplicity, but I don't think parser complexity is the most important metric by which to judge a data interchange format.
If you use a YAML parser to parse JSON-with-comments, it will accept many inputs that don't correspond to JSON-with-comments, and furthermore is likely to report syntax errors that don't make sense to a user who only knows JSON.
So, this unnecessary parser complexity is a usability issue. You should use a parser for the config language you actually intend to support.
Jsonnet is awesome. We use it to generate our yaml files for kubernetes. YAML isn’t easy to parse, nor is it very flexible as a templating language. It gets cumbersome very quickly.
Jsonnet is a relief. Kubernetes should have been a dumb json config from the get go. JSON is ridiculously simple to parse and emit. It has huge interoperability as well with lots of programming languages.
I don't like it because it uses the = symbol which seems imperative rather than declarative. (Same with HCL, it might be a nitpick but these are languages I'm going to be using all the time.)
HOCON is interesting, but at first glance it seems it might be too ambiguous for my tastes, because, like YAML, it supports both js-style ("//") and shell-style ("#") comments.
JSON plus comments is beautiful because it adds minimally to an unambiguous language which lends itself to automatic formatting (stringificiation).
I'd argue that = only feels imperative if you're used to imperative languages. Prolog and Haskell, both of which focus on being declarative, also both use the equals sign.
The trouble there is that your comments come in-band. What if you're trying to serialise something and you don't have the power to insist that it's not a dictionary with "comment" as a key?
Sigh. All I wanted to do was say thanks for the YAML standard -- comments are important, but they're not the only problem with JSON. And truly I can't be expected to remember all of this discussion from six-plus years ago. One thing I remember, though, is the trailing comma problem -- we upstreamed a grammar change to Doctrine annotation so "foo, bar," is OK, because PHP arrays accept that and it's bonkers trying to code a mostly-PHP system without trailing comma support. Also, JSON is no fun to write; you need to have [] {} all correct, where YAML is much easier. The fewer sigils the better, and most of Drupal's YAMLs only use the dash, the colon and the quote. This is the grave mistake Doctrine committed as well: instead of simple arithmetic (>=1.0) they used mysterious sigils in version specification (~1.0). Drupal is in the business of constantly accepting new contributors, and (~R∊R∘.×R)/R←1↓ιR is not newbie friendly, no matter how you slice and dice it. There are certainly advantages to sigil-heavy languages like APL and Perl, but the scare factor is too high.
That's just ugly and you're mixing your comments with the data structure, which is potentially confusing. Also, JSON requires a lot more typing. I don't want to have to manually add in all the brackets, quotes and commas when editing a config file.
> JSON didn't exist (YAML is a parallel, contemporary effort).
Interesting. How did it happen then that, quoting the YAML 1.2 spec, "every JSON file is also a valid YAML file"? The previous spec documents don't mention JSON.
Was that an intentional design decision for 1.2 or was it some kind of convergent design due to Javascript?
I have admired Douglas Crockford's excellent JSON from the moment I saw it; it is a model of simplicity. I also like TOML and wish it all the best. By contrast, YAML is complex and could use a haircut.
When I say "JSON didn't exist", what I mean is that it wasn't popular or known to us when we were working on YAML. So, please excuse my sloppy wording. For me, the work on what would become YAML started with a few of us in 1999 (from SML-DEV list). In January of 2001 we picked the name and had early releases. It took a few years of iteration before we had a specification the collaborators (Perl, Python, and Ruby) could all bless.
Anyway, with regard to Crockford's excellent work, JSON. It is a coincidence that YAML's in-line format happened to align, although it's probably because of a common "C" ancestor, not JavaScript. The main influence on the YAML syntax was RFC0822 (e-mail), only that from my perspective it needed to be a typed graph. In fact, we documented where we stole ideas from, to the best we could recall at the time: http://yaml.org/spec/1.0/#id2488920.
That was my attempt at giving YAML a haircut. I'd be curious to know what you thought.
Thank you for creating YAML, by the way. Even though part of that rant was quoted from me, I'm not negative on it like the author - I think the core was brilliantly designed. If you put two hierarchical documents side by side - one in TOML and another in YAML the YAML one is much, much clearer and cleaner.
Thank you for StrictYAML; I might just use it. It does look like a nice haircut. You might wish to give Ingy a ring; he has been itching to move forward on a reduced/secure YAML subset.
That said, StrictYAML seems to be a tad more of a haircut than I'd imagined. I'd keep nodes/anchors, since I think a graph storage model is underrated; I think that data processing techniques just haven't caught up with graph structures.
Further, I'm not sure everything can be easily typed based upon a schema. Hence, I'm not sure about completely dropping implicit types; perhaps you may want to provide a way for applications to resolve them if they wish. For example, an application may want to attempt to treat anything starting with "[" or "{" as a JSON sub-tree. Perhaps keeping "!tag" but handing it off to the application to resolve might also be a good idea in this regard. Even so, typing should be done at the application level and default to something very boring.
> I'd keep nodes/anchors, since I think a graph model is underrated
Well, you can create graph models without it (and I do) - you can just use string identifiers to identify nodes and let the application decide what that means.
I always thought the intent behind nodes/anchors was not so much graph models but rather to take repetitive YAML and make it DRY. That appears to be how it is used, e.g. in gitlab's ci YAML.
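The DRY usage described above looks roughly like this (a hypothetical fragment in the style of GitLab CI; the job names and keys are illustrative):

```yaml
# &defaults names this mapping; *defaults references it;
# "<<" merges the referenced mapping into the current one.
.defaults: &defaults
  image: python:3.11
  retries: 2

test:
  <<: *defaults
  script: pytest

lint:
  <<: *defaults
  script: flake8
```

Both jobs inherit `image` and `retries` from the anchor, and either can still override a merged key locally.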
>I'm not sure about completely dropping implicit types, perhaps you may want to provide a way for applications to resolve them if they wish. For example, an application may want to attempt to treat anything starting with [ or { as JSON.
I think that would cause surprise type conversions. There will be plenty of times when you want something to start with a [ or { and you won't want it parsed as JSON.
I embed snippets of JSON in YAML multiline strings sometimes and I usually just parse it directly as a string. Then I run that string through a JSON parser elsewhere in the code.
> I think that would cause surprise type conversions.
YAML has traditionally been used as the basis of higher-level configuration files for particular applications. What I'm saying is that implicit typing should be permitted, but delegated to those applications.
Conversely, I'm not saying that StrictYAML should do anything by default with unquoted values, except report them to the application as unquoted. That way the application could choose to process such values differently from quoted ones.
An interesting idea, but it's not clear that this will be less confusing, or that application authors will be better at avoiding config-language gotchas than config language designers such as yourself (and existing app-specific config languages suggest otherwise).
I think a reason this won't necessarily fix the problem of unmet expectations is that identical constructs in different but analogous YAML files would be likely to end up with very different semantics, and users would effectively have to remember which idiosyncratic YAML dialect choices various apps make. Say
version: 1.3
means the string "1.3" in app a), the float 1.3 in app b), and a version number in app c). Furthermore, let's assume that app c) required a version number, whereas a) and b) required strings.
Another, more subtle, problem is that such a scheme would make it more likely that applications would end up parsing raw string representations themselves (with ensuing subtle differences even for things which are nominally meant to be identical, say dates or numbers, and possibly security problems as well).
> I always thought the intent behind nodes/anchors was not so much graph models but rather to take repetitive YAML and make it DRY. That appears to be how it is used, e.g. in gitlab's ci YAML.
That's how I use it too. When I read about competing formats, that's the first feature I check for. It's really key for readability and usability in some use cases.
I don't have much to suggest. For YAML, the use of whitespace, colons and dashes primarily emerged from usability testing with domain experts who are not programmers. In particular, testing was done in the context of an application that needed a configuration and data auditing interface, an accounting application. Even anchors/aliases worked in this context and supported the application's use by making the audit records less repetitive without introducing artificial handles.
Other use cases, such as dumping arbitrary in-memory data structures, perhaps pursued out of a sense that we needed full completeness, actually didn't have any end-user usability testing. Round-tripping data seems in retrospect to be a diversion from the primary value that YAML provided.
If you are writing a new YAML implementation, then yeah, you want a simpler spec to follow.
If on the other hand you are using a YAML library... I've had pretty good success using YAML compatibly across Python, Ruby, C# and Go projects. Do you have a particular issue in mind that the existing Ruby implementation doesn't address?
YAML is an invented serialization format; JSON is a discovered one. As Crockford points out, JSON has existed as long as JS has existed - he just called it out and put a name on it.
Anyway, XML is a strong anti-pattern (too much attack surface: even if you get it right on your end, the other party likely screwed something up). YAML seems to be going down that path too.
TOML seems to be "the JSON of *.ini" (ie: discovering old conventions, rather than inventing new ones), and I'm glad to have been exposed to it.
If you define JSON as the underlying practice that Crockford later named and documented, then sure, what I wrote reads completely wrongheaded. However, when we were working on YAML, JSON had not yet been called out and given a name.
I believe the most important convention that YAML and JSON shared was a recognition of the typed map/list/scalar model used by modern languages. Further, as far as conventions go, I think there's quite a bit to be said about languages that use light-weight structural markers such as: indentation, colon and dash.
It's not really a moral judgement - thanks for your contributions and your innovations - but I prefer not to use YAML if possible, for the same reasons the author outlined.
"JSON" became popular in the 90s.
They were HTTP requests which returned JavaScript that you would simply eval(). No need to write or import a parser, and it's the same syntax as the language you're using, because it is the same language. In technology, many things become popular not because of how good (or bad) they are, but because of how easy they are to use.
Clark, thanks so much for YAML. I love it and use it a lot. It actually increases the day-to-day joy of the work I do as a developer.
(While constructive criticism is fine, those rare people who trash it are... nonsensical to me. I'd like to see them do one-tenth as well under the same conditions!)
> I discovered JSON. I do not claim to have invented JSON because it already existed in nature. What I did was I found it, I named it, I described how it was useful. I don’t claim to be the first person to have discovered it. I know that there are other people who discovered it, at least, a year before I did. The earliest occurrence I found was there was someone at Netscape who was using JavaScript array literals for doing data communication as early as 1996, which was at least 5 years before I stumbled onto the idea.
I can independently confirm that people were using JSON before he named it JSON. I was dumping data in JSON in 2000 for dynamically displayed reports.
But then again I was already used to using Perl data structures as dumped by Data::Dumper for config, because I was taught a lot about Perl by a Lisp programmer who had used Lisp data structures for the same purpose since the 1980s. So using JSON didn't feel original or clever. It seemed like I was simply using a well-known technique in yet another dynamic language.
Then again, our reaction to XML was that it was the stupid thing other people were doing that you had to do to interact with the rest of the world. I got used to holding my tongue until I went to Google a decade later and found that my attitude was common wisdom there...
According to Platonism, JSON has no spatiotemporal or causal properties (like a datetime format) and thus has existed and will exist eternally. All hail JSON.
"Anyone who uses YAML long enough will eventually get burned when attempting to abbreviate Norway."
Example:
NI: Nicaragua
NL: Netherlands
NO: Norway # boom!
`NO` is parsed as a boolean; under the YAML 1.1 spec there are 22 ways to write "true" or "false."[1] For that example, you have to wrap "NO" in quotes to get the expected result.
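Quoting is the usual workaround:

```yaml
NI: Nicaragua
NL: Netherlands
NO: "Norway"   # quoted, so it stays the string "Norway" instead of false
```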
This, along with many of the design decisions in YAML strike me as a simple vs. easy[2] tradeoff, where the authors opted for "easy," at the expense of simplicity. I (and I assume others) mostly use YAML for configuration. I need my config files to be dead simple, explicit, and predictable. Easy can take a back seat.
The implicit typing rules (ie, unquoted values) should have been application dependent. We debated this when we got started and I thought there was no "right" answer. Alas, Ingy was correct and I was wrong.
I appreciate your humility and professionalism in a discussion thread that holds a lot of criticism; suffice it to say, I should have practiced a bit more humility and a bit less "Monday morning quarterbacking" in my original post. And I should have read your comment on YAML's history. To right the record: you got _so_ much right with YAML, and it's unfair for me to cherry-pick this example 20 years later. Sincere apologies...
As the saying goes, "there are only two kinds of languages: the ones people complain about and the ones nobody uses." YAML, like any language, isn't perfect, but it has withstood the test of time and is used by software around the world - many have found it incredibly useful. Sincere thanks for your contribution and work.
As someone who doesn't really use YAML much, your comment provides a good introduction to the kinds of things one needs to know before choosing formats in the future.
This is a very good example of the problems of YAML, and it's one of those things that has really perplexed me about the design of YAML. (I suppose it's a sign of the times when YAML was designed.)
It's[1] just so blatantly unnecessary to support any file encoding other than UTF-8, supporting "extensible data types" which sometimes end up being attack vectors into a language runtime's serialization mechanism, autodetecting the types of values... the list goes on and on. Aside from the ergonomic issues of reading/writing YAML files, it's also absurdly complex to support all of YAML's features... which are used in <1% of YAML files.
A well-designed replacement for certain uses might be Dhall, but I'm not holding my breath for that to gain any widespread acceptance.
[1] Present tense. Things looked massively different at the time, so it's pretty unfair to second-guess the designers of YAML.
I've been bitten by the "string made of digits that starts with 0" thing a couple of times. In this case it gets interpreted as a number and drops leading zeroes. I quickly learned to quote all my strings.
I'd still love for a better means to resolve ambiguities like this, but I've found always quoting to be a fairly reliable approach.
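A sketch of that gotcha, assuming a YAML 1.1 resolver (e.g. PyYAML's default, where a leading 0 triggers the octal integer pattern; the key name is hypothetical):

```yaml
zip_code: 01234    # matches the 1.1 octal pattern: loads as the int 668
quoted: "01234"    # quoting preserves the leading zero as a string
```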
I'm not an expert by any means, but I'm pretty sure that Ansible uses vanilla YAML (no 'bastardization').
Your first example is an Ansible convenience feature, it's not extending or changing the YAML syntax in any way. You can simply specify `cmd` values as lists or strings, since working with one or the other may be easier depending on the use case.
The templating is unfortunate in some areas, especially where the jinja2 syntax conflicts with what YAML expects (for example starting an object with '{'). That's due to a combination of templating engine choice and YAML, though, and not some custom implementation of YAML. Unless I'm misunderstanding?
I do think going with YAML was a trade-off for Ansible, but it's hard to see Ansible getting to where it is today if it had gone with a custom DSL (or JSON, thank god). I'd take Ansible's YAML over Chef's Ruby or CloudFormation's JSON any day.
Oh god, Ansible is exactly why I don't like YAML or Jinja2. I never know what needs to be quoted, what's inlined, what needs to be wrapped in "{{ }}", and what expressions are supported. But once you get the syntax right, it works great.
SaltStack also has the Jinja2 template embedding, which can make it very difficult to understand which parts of the lifecycle run through templating. I'm still not certain I understand how it works.
The most recent offenders for bastardizing YAML I have seen are the different CI services:
* Circle CI using moustache-like templating and interpolation with things like {{ .Branch }} available in certain steps [1]
* GitLab CI adding an "include" type directive to declare YAML dependencies [2]
I've also experienced this professionally. At my last company, somebody decided to add a feature to enable interpolation in some parts of the YAML deployment data. It ended up being used by a handful of people who were confused why interpolation worked in some places and not others. The weird trend of "extending YAML" seems to be going against any sort of benefits you might have by trying to use it.
You can usually use plain old JSON anywhere where YAML would be used (e.g. host vars, group vars, vars file includes, I think even playbooks). And internally, most everything in Ansible is JSON anyways.
YAML is for convenience for hand-editing configuration/task files; if you're doing anything that doesn't require hand editing/readability, use JSON.
With YAML I can never remember what's an object versus a list, string, or number, nor am I ever able to add new stuff to a YAML file and get it to parse correctly without first looking up the spec. And it's impossible to see where large objects start and end.
In contrast, JSON is super intuitive and basically self documenting. The only real quirks are that you need to use double quotes, and objects can't have a trailing comma.
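Those two quirks are easy to demonstrate with Python's stdlib parser (the key names are just illustrative):

```python
import json

# Standard JSON is strict: keys and strings need double quotes,
# and trailing commas are rejected by a conforming parser.
parsed = json.loads('{"key": "value"}')

try:
    json.loads('{"key": "value",}')  # trailing comma -> parse error
    trailing_comma_accepted = True
except json.JSONDecodeError:
    trailing_comma_accepted = False

print(parsed, trailing_comma_accepted)
```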
The only good thing I can see about YAML is that it's super easy to convert and re-export to JSON.
> In contrast, JSON is super intuitive and basically self documenting. The only real quirks are that you need to use double quotes, and objects can't have a trailing comma.
I'd expand the list of quirks... JSON lacks comments (both line-level and block level). Fine for data transport but super super bad for configuration files.
> lacks comments [..] super super bad for configuration files
Not that that matters when applications take it upon themselves to re-save the config file in some kind of normalisation effort. Bye-bye comments, hope they're checked in somewhere.
This hits home. Every time I've ever had to make the decision I've chosen yaml, for exactly this reason. Funny how the seemingly small things can be absolute show stoppers when it comes to making decisions in production.
The barrier here would be whether there's support in enough implementations to feel safe using it in the wild, which I'm guessing will take a while at the very least.
What’s incredible here is that we’re not at the beginning of programming, when we built temporary languages that ended up being forgotten. The web may be a final form of IT, Angular may be the "right", the final, the perfect way to build applications even in 50 years, just like HTML has been the final way to build websites for the last 25 years; JSON may become legacy, and my grandson might even struggle with parsers that still use JSON instead of this new tech called JSON5...
The popularity of transpilers might help overcome that barrier. IDE's and task runners can watch for file modifications and run a simple program to convert to the old format.
But I'm curious what you mean by "in the wild"? If you're using (producing) it, something needs to consume it, and you would probably have control over both in whatever project you were using it for.
Wow, I can "just" use that, thanks. The problem is that JSON is an interchange format, meaning that I need to implement this serialization and deserialization quirk on every producer/consumer of my API (which, you know, avoiding is kind of the point of using a standard interchange format). Furthermore, because everything is a string, I can't unambiguously indicate something is meant to be a string in that format rather than a date.
Year:
YYYY (eg 1997)
Year and month:
YYYY-MM (eg 1997-07)
Complete date:
YYYY-MM-DD (eg 1997-07-16)
Complete date plus hours and minutes:
YYYY-MM-DDThh:mmTZD (eg 1997-07-16T19:20+01:00)
Complete date plus hours, minutes and seconds:
YYYY-MM-DDThh:mm:ssTZD (eg 1997-07-16T19:20:30+01:00)
Complete date plus hours, minutes, seconds and a decimal fraction of a second
YYYY-MM-DDThh:mm:ss.sTZD (eg 1997-07-16T19:20:30.45+01:00)
where:
YYYY = four-digit year
MM = two-digit month (01=January, etc.)
DD = two-digit day of month (01 through 31)
hh = two digits of hour (00 through 23) (am/pm NOT allowed)
mm = two digits of minute (00 through 59)
ss = two digits of second (00 through 59)
s = one or more digits representing a decimal fraction of a second
TZD = time zone designator (Z or +hh:mm or -hh:mm)
The barrier is more that browsers can't deserialize JSON5 natively. Writing a JSON5 parser is not difficult; it's just not part of most standard libraries in most languages, but I would argue that most standard libraries don't parse YAML either.
{
    "ConfigKeyComment": "This is for blah blah blah",
    "ConfigKey": "Foo"
}
Obviously this wouldn't work in all cases (you're putting more work on your parser to interpret unused keys basically), but if we're talking config files specifically, I see this as an acceptable approach since there's little chance you'll be parsing such files more than once each (plus, writing a simple tool to strip the comments out would be very trivial).
I've tried something similar but found it way too painful if the comments need to be long...
# Never enable this config, because if you do the space-time
# continuum will collapse into itself and the cloud servers
# will disappear in a puff of steam. However, if you really
# must enable it, remember that it's boolean and go read
# TICKET-8675309 for the extensive list of side effects.
TurboFactorRenoberation = false
Almost all applications evolve their config files into unique DSLs over time. They may choose a generic serialization for the DSL's AST but it will end up being an underspecified application specific DSL regardless.
Maybe I'm lazy, but avoiding any increase in the cost of commenting is one of the few absolutes I abide by. Often I find myself tired after a long stretch of code, trying to convince myself that it's understandable on its own.
This is one of those systematic rules I have to enforce to shut down my lazy lizard brain.
edit: But I can see how highly structured comments could actually come in handy as well for viewing configs in a gui
> Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.
-Douglas Crockford, creator of JSON
There is no issue using JSON with comments for a config file.
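A minimal sketch of that pipeline in Python, with hypothetical config keys. It is deliberately naive: it only strips whole-line `//` comments and would mangle a `//` inside a string value, where real tools like JSMin or strip-json-comments track string state:

```python
import json
import re

def strip_line_comments(text: str) -> str:
    """Drop // line comments before handing text to json.loads.

    Naive on purpose: whole-line comments only; real comment
    strippers also handle inline comments and quoted strings.
    """
    return re.sub(r"^\s*//.*$", "", text, flags=re.MULTILINE)

config_text = """
{
    // database connection settings (hypothetical keys)
    "host": "localhost",
    "port": 5432
}
"""

config = json.loads(strip_line_comments(config_text))
print(config)
```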
You are 100% correct. For single-line comments, "//" keys in the data structure, and everything else, there are already well-tested solutions to the problem. https://github.com/sindresorhus/strip-json-comments
My point is only that this isn't a big issue. I don't understand why so many see it as one. Instead, projects use non-standard YAML or other problematic solutions only because "JSON doesn't have comments".
Yes, but ITT we're talking about someone apparently having the option to pick which format they're using for config, and they'd use JSON if it wasn't for one dealbreaker.
That doesn't work if you use a strict parser where superfluous fields are an error. It's quite rare, but there are good use cases for that kind of strictness.
Or when no fields are superfluous. For instance, when the code iterates over all the fields and does something with each, instead of just looking for known keys.
> In contrast, JSON is super intuitive and basically self documenting.
Personally I've found the exact opposite when dealing with 'normal' people. Most people can get basic YAML, but unless they're programmers (or at least know how to program), most people fail miserably at writing JSON by hand.
I agree with this. Our biz guys have to edit JSON config files regularly, and they're limited to basically just copying/pasting lines from existing files and editing the values. When they need to be able to do more, we end up building a UI for it and either storing the config elsewhere or writing the code to manage persisting their changes to the config.
This has caused me so much misery in the past, especially since none of the tools will tell you which line the offending comma is on. Great, somewhere in my thousand-plus-line JSON file is a tiny syntax error but you won't tell me where.
Ended up having to regex for them. Didn't do wonders for my trust in JS tooling.
Second this. I have used a Python dict to write the default config file and JSON Schema to validate a user-supplied config file, which worked quite well for that purpose.
In what sense could this be true? Python objects support a whole host of behavior; JSON is a data format. Python dicts might be a closer analogy except Python keys can be anything that is hashable while JSON requires strings, and of course Python dict values can be any Python value; not just the JSON analogs.
I think you are purposely misinterpreting me. Python dicts are practically, syntactically identical to JSON. Yes, Python dict values can be any Python value, the same way JSON in JS can be any JS value. Point being, someone coming from Python would see JSON as identical to a Python dict. We can run around in semantic circles all day.
You misrepresented yourself by saying "object" when you meant "dict" and saying "practically identical" when you meant "vaguely syntactically similar". I wasn't trying to nitpick your semantics; I just had no idea that "Python objects are practically identical to JSON" meant "Python dicts are to Python what JSON objects are to JS, oh and also Python dicts have some syntactic similarlities to JSON" or whatever.
You sure are difficult. There is a non-trivial set of text that is both valid JSON and a valid python dict. Many people would consider the two very similar.
Years ago I had to support a tool that used YAML as a configuration language, and a transport between different applications. Holy. Hell.
First of all, don't ever try to edit a YAML file by hand. You will introduce whitespace or other characters that will break the file, and you will not know until you run it and it breaks something.
The reason you will not know? Not all YAML parsers are the same. Some will interpret it correctly, and some will break. You'll have to get reference implementations of every "supported" YAML parser and run every config you have through them all, and diff them all, before you can trust them.
YAML may be easier to read than JSON, but its added complexity (the parser is significantly more complicated) and obtuse "features" are just not worth the effort. Not to mention, have you ever tried to maintain a very large indented YAML file by hand? Pain in the ass. Just shove everything into JSON files. The fact that it's so limiting is freeing, and everything can parse it. But don't edit it by hand.
And IMNSHO, you shouldn't use either YAML or JSON as a configuration language. They are for data structures, not configuration. If you want a configuration language, go get something designed as a configuration language.
I’ve been using Python enums to store static, non-sensitive config lately. Lets me store my data in a dict/JSON-like format while being able to write comments. Plus, no need to do any IO to access variables! However, I'm not really sure this was the intended use case for Python enums.
A lot of JS tools now will just take a js file that exports a configuration object (`.prettierrc.js`, `.eslintrc.js`, `.babelrc.js`). I find it very sensible.
- Allows code reuse.
- Allows configuration to be as dynamic as you want.
- Can use environment variables.
I suppose there are some cases where you can't trust the user in this way (running configuration code), but I think in a lot of cases you can, and it's generally more convenient.
For a sensible language, I would model something after Apache's. Simple, direct, easy to read, easy to write, easy to extend. It's like a server admin who barely knew HTML 1.0 wrote a config format. Perfect for the things it should actually be doing.
Another option is to take a simple format and extend it with another format or language. For example, you could add SQL to a simple file format, and suddenly tons of people can extend the config with some complex logic. But I also think templating and macro languages should generally die in a fire.
INI files aren't bad. They aren't a language, but they are good for simple use cases and a flat structure. Yes, you can have hierarchical section names, but it's a pain. If you want to use INI, you should probably use TOML. But there's very little incentive to add a TOML parser to a simple app when they could just suck in a JSON file. (Personally, I use JSON files, but only because I'm lazy, not because it's a good idea)
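For comparison, a sketch of how TOML spells out the hierarchy that's painful in plain INI (the keys are hypothetical):

```toml
[server]
host = "localhost"
port = 8080

# Nesting is explicit via dotted table names rather than
# ad hoc section-name conventions like [server/tls].
[server.tls]
enabled = true
```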
The biggest problem with things like Ansible is they'll give you enough rope to hang yourself. First you get defeated by whitespace. Then you get defeated by the stupid YAML rules. Then you get defeated by complexity like inheritance, namespace conflicts, and the shittiest debugging output ever. Then you get Jinja madness embedded inside Ansible madness inside YAML madness, and nobody knows how it works and can even touch it for fear of breaking everything. And of course, there is nothing that can parse it other than Ansible.
I think if Ansible had been TOML+Jinja it would have worked. It would have been ugly and clunky, but it would have worked. (The engine itself and their stupid rules about structuring your project should also die in a fire, but that's a different subject)
Nobody talks about SDLang (Simple Declarative Language): https://sdlang.org/
An example:
```
// This is a node with a single string value
title "Hello, World"

// Multiple values are supported, too
bookmarks 12 15 188 1234

// Nodes can have attributes
author "Peter Parker" email="peter@example.org" active=true

// Nodes can be arbitrarily nested
contents {
    section "First section" {
        paragraph "This is the first paragraph"
        paragraph "This is the second paragraph"
    }
}

// Anonymous nodes are supported
"This text is the value of an anonymous node!"

// This makes things like matrix definitions very convenient
matrix {
    1 0 0
    0 1 0
    0 0 1
}
```
This is like XML attributes, which I've always found annoying to deal with in programs. It doesn't really map to any native data structure in most (all?) programming languages, so you need a special class/struct which supports it.
Simply using something that maps directly to a hash map/object/associative array would be much better, IMHO.
Other than that, it looks like an interesting project.
Actually, it's even a superset of XML; from the docs:
SDL documents are made up of Tags. A Tag contains
* a name (if not present, the name "content" is used)
* a namespace (optional)
* 0 or more values (optional)
* 0 or more attributes (optional)
* 0 or more children (optional)
So it's like an XML node, but the `0 or more values` means it has a list/array for a "body".
I'm gonna continue using YAML, like, even if each parser came with support for a halt-and-catch-fire directive that you couldn't turn off or whatever. It's just about the only markup language where you can embed multiline strings without the indentation being fucked either in the markup or in the resulting string, without requiring lots of escaping.
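For instance (the key is hypothetical), a block scalar keeps both the markup and the resulting string clean:

```yaml
# The | block scalar strips the common leading indentation: the first
# line sets the reference indent, and deeper indents survive intact.
script: |
  echo "first line"
    echo "this line keeps its extra indent in the resulting string"
```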
I am sad that EDN hasn’t achieved popularity as a format. It seems like a better specified, less verbose format. As a bonus it plays well with Paredit-like editor modes. Alas, the curse of being better, but later.
We've spent like 10 years trying to fill in gaps left when we all decided to hate XML. JSON is great as a lightweight DIF between trusted partners. If you care about maintenance and safety, XML with XSD is rock solid.
I don't know - XML is awfully verbose, and the schemas are even more verbose. I've lost track of how many "XML" configuration files looked like this:
A few weeks ago, I had about 100 config files (Tomcat context.xml) which all needed fixes for common misconfigurations - if they had the misconfigurations in the first place. The kind of problem that is just a little bit too hard for search and replace. It was really easy with XSLT. The result had all comments preserved. I chose to reformat the files, but keeping whitespace was an option too. Now tell me if you can do that with JSON, YAML, or TOML. Most parsers simply forget about the comments to begin with.
In the same way, if we receive a data transfer in XML and there is a schema, simple validation catches a lot of problems quickly. You'd be surprised how many times a company gives you a schema and then sends you xml which doesn't validate. In JSON, you have to write a program to get even basic validation.
Don't get me wrong: XML has problems, some inherited from HTML/SGML (entities!), and even more after serious abuse by consultants, architecture astronauts and enterprise vendors (SOAP! namespace overuse! 10 XML parsers in one app!). But it was also miles better than what came before, and I feel the XML-hatred pendulum has swung too far.
Today, JSON is in vogue, and I've seen enough IT to not swim against the tide. It is a reasonable solution for problems caused by XML abuse. Besides, there is value in going with the majority, even if it only fixes 80% of your problem. But I can only weep for the miserable date, numeric and comment support, and their endless stream of incompatible workarounds.
For your parameter example: You can't both strictly validate and have full freedom at the same time. Something has to give a bit. Some less horrible alternatives I've seen:
    <parameter name="X" value="Y"/>
    <subsystem name1="value1" name2="value2" ... /> (adding a newline for each attribute)
    <name>value</name>
What problems with respect to entities does XML have that it inherited from SGML and HTML? Do you mean entity-expansion attacks such as billion laughs? HTML has only character rather than general entity references, and SGML has had the ENTLVL capacity to bound entity reference nesting since 1986.
Edit: XML is just a proper subset of SGML by definition, hence it didn't introduce a single thing that wasn't there before. It only introduced XML-style empty elements and DTD-less markup, and SGML was extended in lockstep with XML to support these as well.
I'd consider user-definable entities a problem, as you can't read a file without knowing the DTD. Billion laughs is just a very ugly bonus.
XML is more than the part inherited from SGML; it's also the XML culture surrounding it. Namespaces are an example of something that created an XML dialect. And of course SOAP, which actually needs the WS-I standard to explain which parts of the WS-* standards to use or ignore, and how to interpret them. And even then, two WS-I stacks will rarely interop without trouble. Let's not blame SGML for that monstrosity.
What's wrong with namespaces? These are simply globally unique identifiers that allow us to define and use our own globally unique names and make them passably human-readable. That is:
<a:log /><b:log /><c:log />
can mean a math function, a text file that records what's happening, and a cut-off trunk of a tree and there will be no confusion whatsoever.
I would definitely use RelaxNG for specifying schemas instead of XSD - it's simpler both to read and write in every case I've tried, and the resulting schema is smaller as well, often by a lot.
To be quite honest I dislike using XML for human editable configuration files. Variations on Microsoft's .ini files (such as TOML) seem to work best for that, IMHO.
We use .ini files for all of our settings in our products, and they work great most of the time. The only weirdness creeps in when you try to store things with embedded CRLFs and need to escape/unescape them (not a big deal), and storing lists of things is a little difficult.
JSON is great in terms of flexibility, but .INI files are really easy to read because everything is on the left side of the screen/window at all times.
XML is in that odd middle-ground where it's usually human-readable, but also a huge pain in the ass to write. It's great at what it was intended for, as a data interchange format.
For a general-purpose human-writable structured data format, I guess the ugly nonstandard hack that is "JSON with comments" is probably good. It's certainly faster to parse than YAML.
XML wasn't intended as data interchange format, but for replacing SGML as serialization and markup meta-language on the Web (eg. for XHTML, SVG, MathML). It can't be said often enough that markup languages are for authoring and delivering semistructured text data, not for general-purpose data serialization. As in, editing plain text files and have your text treated as content unless marked up with markup and annotated with metadata attributes.
Though this is much more pronounced in SGML which also contains the features for authoring (as opposed to delivery) omitted from XML such as tag omission/inference, custom Wiki syntaxes, and other short forms.
Well, what is the purpose of markup languages? Isn't the sole purpose of markup to be able to process the marked-up content with a computer? Why would you add markup to your favorite verse if it wasn't to somehow feed it to a machine for some purpose (analyze, typeset, etc.)?
So when we have text with markup the text part is meant to be there for humans and the markup part is solely for computers. Now let's remove all text; now there's no content for humans at all, only for computers. How is this different from general-purpose data serialization?
(Some of the samples you give, like SVG, may not have any text content at all; it's basically a drawing language.)
Given that the XML ecosystem has quite a few tools (e.g. several type-description languages and a declarative data transformation language, just to name a few), it's a very good general-purpose data serialization format.
I've managed to teach non-programmers to successfully edit YAML files without too much trouble, but most non-programmers have a really hard time consistently producing valid JSON by hand.
As much as there is a lot to not like about YAML, it is the easiest one for humans to consistently write in my experience.
> It's great at what it was intended for, as a data interchange format.
XML's incredible verbosity is a problem for computers too. I've spent time performance-tuning message parsing code that had no good reason to be slow except that our use of XML bloated the data and decoding time by an order of magnitude or more compared to a binary protocol with a schema.
In my experience if you are using XML as a data interchange format and it's slow it's probably because you are using a DOM parser instead of a SAX parser. DOM parsers build a tree that is best used describing a marked up document and much less useful for describing a data structure you might want to serialize or deserialize.
I've gotten incredible speedups just by switching to SAX parsing in those cases.
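In Python, for example, the switch is mostly a matter of handler style; a minimal sketch using the standard library's `xml.sax` (the handler class here is made up for illustration):

```python
import xml.sax
from io import BytesIO

# A streaming SAX handler that tallies elements without building a DOM tree.
class ElementCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1

handler = ElementCounter()
xml.sax.parse(BytesIO(b"<root><a/><a/><b>x</b></root>"), handler)
print(handler.count)  # 4
```

The handler sees each element as it streams by, so memory use stays flat no matter how large the document is.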
My biggest complaint with JSON is the lack of support for comments. For that reason, it's hard to take it seriously for human-maintained configurations.
Comments and trailing commas. If those two features were added, I would use JSON for configuring everything. Naked keys would be a distant third. My conclusion is to use TOML or the protocol buffer text format.
I've been slowly ripping out YAML support and converting configurations to TOML.
Not just trailing commas: the need for commas at all when there's a newline right next to them has been a source of many stupid issues for me when less knowledgeable/experienced people edit JSON-based conf files.
Then there are floats without a leading zero, the missing colon after a key, and yes, naked keys. The need to wrap the entire file in { } or [ ] is just icing.
Honestly, I feel the barest simple conf format of [first-word] [rest-of-line] is enough for many programs that end up using, but never taking advantage of, more powerful formats.
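A parser for that bare [first-word] [rest-of-line] style can be tiny. A hypothetical sketch, with `#` comments and blank lines skipped as an assumed convention:

```python
# Minimal parser for a "[first-word] [rest-of-line]" conf format (hypothetical
# sketch): blank lines and '#' comments are skipped; later keys overwrite.
def parse_conf(text):
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        key, _, value = line.partition(' ')
        conf[key] = value.strip()
    return conf

print(parse_conf("# demo\nlisten 0.0.0.0:8080\nworkers 4"))
# {'listen': '0.0.0.0:8080', 'workers': '4'}
```

All values come back as strings, which is arguably a feature: the program decides how to interpret them, not the format.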
though this looks terrible and there's probably some edge case I've forgotten. (Also it misses the point of JSON in that it's no longer valid JS. I don't know whether that's important anymore since you should be calling JSON.parse() not eval() anyway.)
The one major reason I could see to use "JSON" as a conf file is in trusted node.js apps because you can then easily embed functions and logic in them if you need more advanced/customizable configurations. And you can do it with full syntax highlighting in your editor. And comments, and trailing commas, and naked keys.
Of course this is no longer JSON, it's straight up Javascript config files. But it has come in handy a few times when I want to override standard behavior on a per config basis, and most of the file is still just plain key: val
I get a lot of flak for this, but there are definitely times I miss XML for certain things and find it way easier to work with than JSON or YAML. I definitely understand some of the backlash against XML that happened a decade or so ago and definitely don't want to return to the days of half of a Java application being XML code.
I think JSON is more efficient to write, but XML often ends up being more efficient to read due to comments and the fact that XML tags often give you better context. I think most programmers (myself included) tend to heavily optimize towards writability when we should think about readability a little more.
An example of this is ElasticSearch, where your queries are in JSON and often end up tons of levels deep - it is super easy to get lost in a sea of closing brackets, whereas XML would let you add comments in and the fact that closing tags have names in them would give you better context about what you were doing.
> This problem was solved in the 60's with S-Expressions.
Not so much. Sexps don't provide a place to hang "extra" information. It's been a pain point. While some Lisps allowed decorating runtime things (e.g. objects with attributes, and symbols with property lists), their printed/readable representations were implementation-dependent.
There's also a widespread misconception that Scheme is easy to parse. Numbers and all. It's actually very hard to get right. Real scheme parsers are quite large and hairy.
> XML was a complete and utter waste of time.
While XML was ghastly, there was an unmet need. There still is.
Yeah... I'm a diehard Common Lisp user, and when I saw YAML+go-template used for Kubernetes Helm templates, with some extra hacks to take care of indentation shifts... I felt almost physical pain.
If your configuration file is so long it's unreadable in YAML, then maybe you need to break it up into more than one file? I can't imagine any syntax would be easy to read once you reach more than 100 or so lines.
Do any configuration file languages support type hinting? Adding (int) in front of a YAML key would be easy enough to read, and would keep some of the confusion at bay.
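YAML actually ships a version of this already: explicit `!!` tags override the implicit typing. A small PyYAML illustration:

```python
import yaml

# Explicit tags act as inline type hints, overriding implicit resolution.
doc = """
port: !!str 8080     # keep as a string, not an int
count: !!int "42"    # force an int, even though it's quoted
"""
data = yaml.safe_load(doc)
print(data)  # {'port': '8080', 'count': 42}
```

The syntax is clunkier than a hypothetical `(int)` prefix, but it is part of the spec and works in conforming parsers today.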
As a general rule of thumb: never use yet another non-markup language designed by people who claimed from the very outset to be designing yet another markup language, and who then, after somebody awkwardly pointed out that what they'd designed wasn't actually a markup language, invented a backronym to contradict that embarrassing historical fact.
It just makes me wonder what the hell they thought they were doing all that time...
It's like designing a tool called YACC, and ending up with Yet Another Interpreter Interpreter!
It's like a standard for storing all your pornography in a folder called "Definitely Not Pornography".
>Originally YAML was said to mean Yet Another Markup Language, referencing its purpose as a markup language with the yet another construct, but it was then repurposed as YAML Ain't Markup Language, a recursive acronym, to distinguish its purpose as data-oriented, rather than document markup.
> It just makes me wonder what the hell they thought they were doing all that time...
The YAML project was a convergence of several different efforts at information representation, including people from Perl, Python and Ruby, each with our own ideas. I happened to be involved in the outer ring of the XML community, in particular a group called SML-DEV, where we were looking for a better information model, more suitable for data serialization, that would use an XML-compatible syntax.
At that time, especially since serializing data with XML was all the rage, "ML" or "Markup Language" was commonly associated with data serialization. In fact XML is very inconvenient for actual markup, even though it derives from SGML (of which HTML is an example).
The "YA" part did deliberately come from YACC, the reason why is that XML (and we hoped YAML) would be the basis of domain specific serialization languages. Hence, you can think of it as a meta-language for building sub-languages, like the application I was working on, a serialization for accounting data.
Hence, that's the origin of the YAML acronym, which happened to have a domain name open. In fact, you can look at archive.org starting in 2001 and you'll see the first public pass of YAML more closely followed XML-like bracket syntax. I disliked it, but it was what came out of the SML-DEV collaborations. Even so, the important thing was the information model, which was a simple typed graph and not an element tree.
The syntax evolved through many revisions after the model and goal were set, with lots of feedback to address concerns and usability tests with domain experts. The syntax got more and more lightweight, inspired by RFC 822 (email), while adding dashes for list items. Testing the syntax with domain experts (accountants and the like) was an exceptionally important part of the process. Users liked this serialization syntax since it made their data "pop".
So. A year or so passes while we focus on getting things to work and helping people. Then the product-differentiation question comes up: since XML had the dominant position in data-serialization mind share, how does YAML compare? Well, the YAML model was a typed graph, while XML's was an element tree. One required no special libraries to manipulate; the other required a DOM to translate. But perhaps more importantly, XML borrowed its model and tags from SGML, and SGML was a true "markup language". So XML was a "markup language" in which it was impractical to do actual markup. Then it dawned on us: of course, a data serialization language isn't a markup problem. I'm not sure any of this was obvious at the time.
Anyway, in a very fun chat, Oren pointed this out and then Ingy said: well then, YAML Ain't Markup Language! So, the new name actually represented what we had set out to do in the first place. Further, I would suggest that the industry understanding of how serialization languages are poorly supported by markup approaches (XML) is at least somewhat due to our name change and fun filled articulation at conferences.
For config formats I'm finding HCL[1] to be nice for my use cases. It has comments, no requirement for double-quoted identifiers, and is actually simple. The main issue was that the only implementation is in Go, so I had to write a port to C++.
Super happy with HCL for over a year in production now.
I've only got one FOSS project using HCL, but I think its bundled HCL config file is an attractive part of its UX: https://github.com/LukeB42/psyrcd
I continue to hold a firm belief that the reason JSON is so popular is that it covers most use cases without any of the dumb crap that hides in YAML and XML behavior.
Lack of comments, (nice) multiline strings and trailing-comma support make JSON a real pain for config in practice.
I've started using YAML parsers for all of our (once) JSON config files, just to get those features (while preserving the curly-braces, commas, and other JSON-isms). Yes it allows a misguided dev to abuse YAML (mis-)features, but a combination of coding standards and linters can fix that.
Being a superset of JSON is YAML's best feature.
I would never consider it for untrusted input though.
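A sketch of that setup: since YAML's flow syntax covers JSON-style braces and brackets, a YAML loader will accept JSON-ish config plus `#` comments and trailing commas (PyYAML shown):

```python
import yaml

# JSON-style config, but with '#' comments and trailing commas, parsed by a
# YAML loader. Note: this is YAML, so YAML's implicit typing still applies.
config_text = """
{
  "debug": true,            # comments work
  "workers": 4,
  "hosts": ["a", "b",],     # trailing comma is fine in flow style
}
"""
config = yaml.safe_load(config_text)
print(config)  # {'debug': True, 'workers': 4, 'hosts': ['a', 'b']}
```

A linter can then reject any YAML-only features (block syntax, anchors, tags) to keep the files looking like JSON.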
Over the past few months, I've built up a somewhat masochistic relationship with YAML, as I've been writing my own JS library for it [1]. Yes, the spec is more complicated than it ought to be and yes, writing yet another implementation might just mean more overall variance within the spec, but it's still the only config language with decent usage that supports human-readable multi-line strings and comments. And it would've been really nice if someone else had supported editing comments, so I wouldn't have needed to do that myself.
Still, I'm optimistic, especially now that Prettier is getting YAML support. [2]
I agree with some of the author's points but the "surprising behaviour" section is odd. For example, why would you expect `3.5.3` to be parsed as a number? How could that be parsed as a number?
> JSON should actually have the same issue. When I enter { 013: "11" } in the web console I get '{11: "11"}'.
JSON doesn't have this issue. `{ 013: "11" }` isn't valid according to the JSON spec for a couple reasons, the important one being that multi-digit numbers cannot start with zero[1]. Try this in the console: `JSON.parse('{"11": 013}')`.
{ 013: "11" } isn't valid JSON. All object keys in JSON must be strings. And if you try to do that in value position, it's still not a problem because JSON never treats unquoted text as a string.
Also JSON doesn't treat a 0 prefix on a number as special. There are no octal (or hex) literals in JSON. In fact, a JSON number literal cannot even start with 0 unless that's the only digit (before the period), e.g. 013 is not a valid numeric literal in JSON.
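Both constraints are easy to check from Python's `json` module:

```python
import json

# JSON forbids leading zeros, so 013 is not a valid number literal,
# and unquoted text is never silently treated as a string.
def is_valid_json(text):
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json('[013]'))    # False
print(is_valid_json('[0, 13]'))  # True
```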
Thanks. Yeah, I mistakenly believed that JSON was mostly a JavaScript hash, rather than that 'JSON is valid JavaScript', which means it's a subset of all possible JavaScript hashes.
Every time this comes up, I don't understand how https://json5.org isn't superior to YAML in every way.
If there were some deficiency with JSON5, just use JSON with comments. It's that simple.
JSON is one of the best things to ever come out of the CS disciplines.
For those that whine about comments in JSON, Douglas Crockford, the creator of JSON, himself said to do it.
>Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.
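A crude stand-in for that advice, assuming only whole-line `//` comments (a real minifier like JSMin also handles `//` appearing inside string values, which this naive version does not):

```python
import json

# Strip whole-line // comments, then hand the result to a normal JSON parser.
def strip_line_comments(text):
    return "\n".join(
        line for line in text.splitlines()
        if not line.lstrip().startswith("//")
    )

annotated = """
{
  // which port the server listens on
  "port": 8080
}
"""
print(json.loads(strip_line_comments(annotated)))  # {'port': 8080}
```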
I personally dislike with a passion every language or grammar that depends on whitespace indentation, especially if the designers were so opinionated that you can only use spaces (or even a specific number of spaces per indentation level) and not tabs.
It's not as big of a deal in Python because, as others have mentioned, you usually don't end up writing large functions to begin with, and tab indentation is supported. It still means that if you try to send a code snippet to someone over email or Slack, IM, etc., it may not work because whitespace may be trimmed. With YAML, it's way worse, for reasons the author outlined.
Something may look good on paper (or in screenshots), but practical considerations need to be factored in when designing a grammar, and there are myriad pretty-formatting utilities that could be used to that end if one cares for that sort of thing (see also: clang-format).
I don't love YAML, but for configurations I always choose it for the simple fact that it supports comments. Comments in JSON are almost always hacky (for example, embedding a comment key inside the value).
I write a lot of command line tools for work and need configuration files and always go for YAML, but am never happy about it. I use it because it serializes to Ruby objects so I can more easily do validation and check if someone forgot an important key but I wish there were something that would make that easier for the people using the config files. I looked into TOML after this and liked what I saw.
I've been playing with Home Assistant[0] recently and, as a result, getting exposed to YAML. I don't find it pleasant to work with at all. I'm sure a big part of that is how Home Assistant uses YAML for automation stuff[1] that would probably be better served by a real programming language.
I think a big part of the unpleasantness comes from how non-obvious some things are. For example, why does the first automation example require a "-" in front of "platform" (under "trigger"), but the second doesn't? The comments explain when it's required, but not why? Shouldn't the parser be able to figure it out from the significant whitespace?
I found it so unpleasant that I'm using Node-RED[2] to do anything even vaguely automation related and have relegated Home Assistant to being the UI and communications abstraction layer.
In contrast, XML, while overly verbose, has a more reasonable structure. I haven't played with JSON much, but it also seems pretty reasonable.
Everything that you can do in YAML + more, you can do in Home Assistant using Python. Have a look at my PyCon talk [1] from 2 years ago or check the available functions in the docs [2].
About the dashes. In Home Assistant when an option takes a list, you can omit the list if you are just passing in a single entry.
I agree. This is my biggest issue with YAML - the syntax is just so weird and unintuitive. Compare arrays in YAML to JSON. YAML makes no sense.
Then there is the significant whitespace, difficult to remember quoting rules, etc.
I think the only reason it had gained any traction is because it is relatively easy to write multi-line strings in, so it is good for shell scripts, e.g. in CI configuration.
Someone should really come up with a sane alternative that works well for that use case though.
My friends who are smarter than me have these criticisms.
XML -> doesn't record what programs think of as 'data'. So a program needs to convert its data representation to XML and back again. Usually there is no formal spec. This is the exact same issue you have with databases, but at least a database has a formal representation and data types.
Part of SOAP is Microsoft trying to bolt schemas onto XML. SOAP seems to generate a lot of unhappy programmer noises, but my ex-roommate said, 'well, when you get it working, it works'.
JSON and YAML do, but they don't have schemas to parse against. So programs need to do their own validation.
> Loading a user-provided (untrusted) YAML string needs careful consideration.
Why would you ever use YAML for user-provided input? At that point, it's better to just use JSON.
> Many other languages (including Ruby and PHP1) are also unsafe by default. Searching for yaml.load on GitHub gives a whopping 2.8 million results. yaml.safe_load only gives 26,000 results.
Maybe that's because everyone's using JSON where it would be unsafe to use YAML.
> YAML files can be hard to edit, and this difficulty grows fast as the file gets larger.
And... this isn't the case for XML or JSON?
Ok, so reindenting a section might be a pain, but if your YAML contains large amounts of data, maybe that data doesn't belong in that format if you're manually editing the YAML.
> especially since 2-space indentation is the norm and tab indentation is forbidden
Good. ;)
> And accidentally getting the indentation wrong often isn’t an error; it will often just deserialize to something you didn’t intend. Happy debugging!
Which is unlikely to happen if you're using a YAML library or only editing small-ish config files by hand.
---
As noted, YAML has a lot of quirks. As a configuration language, I love it and am used to the little edge cases. Could it be better? Definitely. But I would still consider YAML to be great in the domain where it excels: human-readable configuration. Using it to store and transmit large amounts of data, especially in ways where a human is manually editing the YAML, is a terrible idea.
Since we're all chiming in with configuration formats, here's two more nice ones:
1. If you're writing Lisp, just read S-expressions in.
2. Use Python or Skylark and have a step that executes it into a configuration format. Obviously this is not something you would want to use for a data interchange format, but no one thinks that they can blindly run a random hunk of untrusted Python. Right? ...Right?
One of the important things implementations failed to do was properly support Schemas (http://yaml.org/spec/1.2/spec.html#Schema), and this is still the case today. Had they done so, the safe-load issue would never have arisen, and the loaders/dumpers would be properly configurable to suit the needs of the application, e.g. not supporting "Yes" and "No" as `true` and `false`.
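With PyYAML, for instance, you can approximate that kind of application-level schema control today by trimming the implicit resolvers on a `SafeLoader` subclass. A sketch (the loader name is made up):

```python
import yaml

# Hypothetical NoBoolLoader: a SafeLoader whose implicit resolvers no longer
# map Yes/No/on/off (or true/false) to booleans; they stay plain strings.
class NoBoolLoader(yaml.SafeLoader):
    pass

# Copy the resolver table so we don't mutate SafeLoader's own class attribute.
NoBoolLoader.yaml_implicit_resolvers = {
    first_char: [(tag, regexp) for tag, regexp in resolvers
                 if tag != 'tag:yaml.org,2002:bool']
    for first_char, resolvers in yaml.SafeLoader.yaml_implicit_resolvers.items()
}

print(yaml.safe_load("answer: Yes"))                    # {'answer': True}
print(yaml.load("answer: Yes", Loader=NoBoolLoader))    # {'answer': 'Yes'}
```

It works, but it's exactly the kind of per-application plumbing that first-class schema support would make unnecessary.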
I once did about 98% of the work to support Schemas properly in Psych (https://github.com/ruby/psych) but the maintainer said he didn't want to "maintain it".
So, there you go. What else can one do? You can't blame the spec for decisions of implementors.
(That's not to say the YAML spec couldn't use some improvements, but it's far from "not so great".)
Yes, I love YAML in general. The complaints about big files can be alleviated with a good editor that collapses blocks and draws indentation guides.
But the author is mostly right, when adding support for YAML to my code I spend a lot of time disabling all of its nifty misfeatures. Wish it was simply an indented JSON with comments and fewer quotes.
We recently published a paper suggesting an alternative to HDF5 [1], using directories for objects, YAML for metadata and NumPy for data. Many of the points in this article were raised by the reviewers, or were worries we had about choosing YAML as the metadata format. In the end, we decided to use a subset of YAML with only basic tags, enforced quoted strings, no directives, and no block scalar styles (fancy multiline strings). So far it has worked out great. I hope it will make the format easier to understand for users and make it possible to write faster parsers in the future.
YAML and TOML both seem too complicated. Automatic date parsing? So many different ways to specify the same nested hash table? I like json because there's usually one obvious way to do what you want. It's a local minimum, like C and Lisp. It's really too bad about the comments.
> Don’t get me wrong, it’s not like YAML is absolutely terrible – it’s certainly not as problematic as using JSON – but it’s not exactly great either.
Well, at least it's the least worst then; does that make it the best?
Frankly, I hope the future will be made with indented languages. Curly-brace languages often allow too much liberty, and it's annoying. The fact that Go enforces one curly-brace style is really the tipping point for curly-brace languages.
Granted there need to be some good compromise for ambiguous details when parsing an indented syntax, but readability matters much more than anything to me.
It solves some of the issues that you can have with Python, while at the same time also avoiding the whole nonsense with the braces. I find it's a good trade-off between the strengths of both approaches.
If the problem with YAML is size/formatting, that's something a good IDE can simplify even more. For loading up configs that would otherwise be properties files, I have found it to be quite clean. If you think of it as a domain language for configuration, YAML is great. It can be understood pretty much intuitively, and it does some things quite well. On the other hand, when it comes to exchanging data, JSON > XML > X12 EDI... EDI? Yes, it's still a thing... a terrible thing...
I recently posted about a technique I've been developing (and am very happy with) for representing and editing JSON in spreadsheets, without any sigils, tabs, quoting, escaping or trailing comma problems, but with comments, rich formatting, formulas, and leveraging the full power of the spreadsheet.
>Recently I've been working on kind of the converse of this problem with JSON and spreadsheets, and I'll briefly describe it here (and I'll be glad to share the code), in the hopes of getting some feedback and criticism:
>How can you conveniently and compactly represent, view and edit JSON in spreadsheets, using the grid instead of so much punctuation?
>The goal is to be able to easily edit JSON data in any spreadsheet, copy and paste grids of JSON around as TSV files (the format that Google Sheets puts on your clipboard), and efficiently import and export those spreadsheets as JSON.
>[...]
Since I wrote that post, I've cleaned up and refactored the code into a portable little library that will run in the browser, or inside of Google Sheets:
I haven't come up with a trendy marketing name for it yet (except for the source file name sheet.js), because I think it's important to first discover what it is by using and refining it for a while, writing some documentation to explain it, and getting feedback from other people (work in progress), before trying to name it -- otherwise you might end up calling it "Yet Another <something it's not>".
Not agreeing or disagreeing, but I read both this and his other post about JSON as configuration file and I have not seen him propose and argue for an alternative.
In your analogy, the alternative is to not see the movie. The analogous alternative would be to... what? Not use config files? Doesn't seem like much of an alternative to me.
This is almost always the case with YAML criticism, which is a perennial topic in programming communities. Coming up with "a better YAML" is easy; getting anyone to use it is hard - even harder than getting people to use a new programming language because configuration files have to be touched by end users, and they all know YAML already.
Furthermore, all of the "better YAMLs" that exist solve a different subset of issues based on the whims of the author. I like indentation-based syntax for config files (though not for programming languages, go figure), so half the alternatives look worse rather than better to me, and reasonable people can also disagree on things like when strings should require quotes and what should be a valid hash key and so on. There are so many bikes to shed that I don't see us ever settling on an alternative without major buy-in from one of the big players in tech.
Until then, I'm happy to let YAML win. It's just not broken enough for me to get worked up about.
I wrote this article, and at this point I don't really have a strong preference for any one format. I think a lot depends on what it's used for.
YAML can actually be pretty useful. For example, recently I wrote a tool to generate OpenAPI/Swagger files from Go source code, and outputting that as YAML works pretty well, as YAML is quite easy to read for that.
YAML can also be a good choice to serialize some things to disk, like program state. JSON can also be a good choice for that, as can TOML.
But for other things ... it's not so great. I should probably write an article detailing this at some point, but ...
$ ls -1 /data/code/arp242.net/_drafts | wc -l
64
So much stuff I need to finish :-( I've added your suggestion there though!
JSON is slightly easier since you have actual start/stop marks in the form of `{`, `}`, instead of relying on 2-space indentation. (are there 8 or 10 spaces there? Hard to see).
The alternative of writing executable code as your config language also has its own issues. It's why things like Skylark were invented for Buck & Bazel:
You wouldn't, but the language is lightweight enough (according to Wikipedia, the interpreter is ~180kb compiled) that including it as a dependency probably won't matter. It's less overhead than an interpreter for JSON or YAML at least.
>Is a parser library available for mainstream languages?
It's used commonly in game development so yes for C and C++, and the site[0] mentions "Java, C#, Smalltalk, Fortran, Ada, Erlang, and even in other scripting languages, such as Perl and Ruby" as well.
I know Lua and I use it sometimes (for example in Redis). But I don't want to include a full Lua interpreter in my program just to parse a configuration file. It's overkill in my opinion :-)
We need to standardize CSON (https://github.com/bevry/cson) and build a compliant C parser. The format is simpler, with fewer surprises than YAML, yet able to handle more kinds of syntax in a straightforward form.
I came in here to rant that YAML isn't for the same purpose as JSON and XML but then thought I'd better RTFA and realised that oh yeah one or two good points here.
The first criticism, where he's embedding executable code in YAML, I kind of have to agree with: that seems crazy. I don't know why YAML would support this.
The remaining criticisms seem to relate to (a) the spec overreaching in terms of complexity and (b) differences in implementation, which I guess is some kind of an extension of (a).
I maintain however that YAML, JSON and XML are different.
If you want to make me feel cross and insulted give me a JSON file to edit. I think JSON is probably the best commonly used format for M2M and storage serialisation.
I wouldn't want you to use YAML for that, though. There are too many different ways to do it, and any kind of ambiguity never makes for good M2M.
For configuration-files it's great though, as long as you stay away from some of the more exotic features I suppose. It effectively provides a "user interface" of sorts by which your users can specify non-trivial configuration details.
The complaints about overlong and overcomplex yaml files could be extended to other commonly used formats.
With regards to XML, I'd say that YAML provides all the features you'd want to use from that format in a format that's easier to hand-edit as text. XML is "okay" for M2M but probably better for document storage where you have some kind of custom editor.
It's horses for courses. I wouldn't want to use YAML where I'd want to use JSON, or use XML where I'd use either of them, and neither of these have the semantic richness of XML either and so wouldn't be appropriate in whatever spaces XML should be used (which is far more limited I suspect than its current span of applications).
Ultimately when each is used to its strengths they're not interchangeable formats.
I haven't found a markup language that I really prefer for configuration files, but JSON has been reasonably nice to me in NPM configurations and VS Code settings, and I haven't run into problems yet. I understand the motivation for TOML, but it has the same ambiguity problems that YAML does. There's something to be said for not being too flexible, or it'll fall into the same trap as AppleScript did. Humans might prefer the convenience of ultra-flexibility at first, but sooner or later our intuition will just not match up with the actual rules, and we'll have to spend longer than we wanted looking up arcane syntax documentation. That's been my experience with YAML anyway, like every single time.
> One might also argue that fixing it is as easy as replacing load() with safe_load(), but many people are unaware of the problem, and even if you’re aware of it, it’s one of those things that can be easy to forget. It’s pretty bad API design.
It is. At API design time, it would have been trivial to ship `load()` (doing what `safe_load()` does now) and `unsafe_load()` (doing what `load()` does now), and probably avoid this pitfall altogether. Now? Much more difficult to solve.
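A quick illustration of what the safe variant refuses (PyYAML shown; the exact exception message varies by version):

```python
import yaml

# safe_load rejects language-native tags that plain load() would construct,
# which is exactly the arbitrary-code-execution hazard under discussion.
payload = "!!python/object/apply:os.getcwd []"
try:
    yaml.safe_load(payload)
    rejected = False
except yaml.YAMLError:
    rejected = True

print(rejected)  # True
```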
For what it's meant to do (concisely express data in a human-maintainable way), it's still in a league of its own. For configuring your apps, it's the way to go.
I think this is a problem of trying to serve all use cases at once. I mean, if you're making a data exchange format, do you really need it to be able to execute arbitrary code? Isn't that inviting trouble? It's like building a bank vault and then cutting a large hole in it and putting a plywood door on it - just in case somebody would want to convert it to a restaurant later. Maybe it would be better not to serve that particular use case at all?
I'm not saying YAML is perfect or terrible (just like the author), however most of the examples the author gives are going to be the same for any language attempting to implicitly type values.
I think maybe YAML just went a bit too far. Everyone hates quoting key names when it's unnecessary 99% of the time, but it would have been enough to relax that, rather than going all the way to relaxing all quotes and then attempting to infer value types.
Our shop are heavy users of YAML, and we've sort of backed our way into a restricted subset of YAML. Some of them are config files, but others function closer to DSLs.
I have not yet taken a look at StrictYAML, but after years of use, the spec definitely needs a "YAML: The Good Parts" treatment.
One thing the author did not mention is how slow the out-of-the-box Python YAML parser can be. This can be sped up by calling out to libyaml, but then you lose the safe_load method.
I suspect the speed issues are more a result of implementation details than of being in Python. Last year, I created a config language for my own use that supports some syntax very similar to YAML. My pure Python library can load simple dict/list/string data 10x as fast as PyYAML, and nearly within 1.5x the speed of libyaml (https://bespon.org/#benchmarks). That's while building an AST with source information to allow round-tripping and supporting my own version of anchors and tags, so there's significant room for improvement. I expect that a pure Python YAML library might be able to match or beat the current performance of libyaml in at least some cases, particularly for a restricted subset of YAML.
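For what it's worth, PyYAML can use the libyaml-backed loaders without giving up safety; the usual pattern falls back to the pure-Python loader when the C extension isn't compiled in:

```python
import yaml

# Prefer the libyaml-backed safe loader when available; fall back otherwise.
try:
    from yaml import CSafeLoader as FastSafeLoader
except ImportError:
    from yaml import SafeLoader as FastSafeLoader

data = yaml.load("a: [1, 2, 3]", Loader=FastSafeLoader)
print(data)  # {'a': [1, 2, 3]}
```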
I did a few rudimentary benchmarking tests on in-house real-world data sets, and strictyaml was approx 10x slower than PyYAML's yaml.load (without calling out to libyaml) and yaml.safe_load.
I used the basic strictyaml.load function, without any schemas.
README page said speed is not a current priority, and that appears to be true.
Most of these are Microsoft's fault: MS-DOS allows only 3 characters for the file extension, so file extensions had to be abbreviated.
As for C++, I'd also blame Microsoft. The plus character (+) is a reserved character for MS-DOS, so the obvious extension ".c++" couldn't be used (nor could the case-sensitive ".C" extension). So people either toppled their plus signs (".c++" becomes ".cxx"), or replaced them by the first letter of "plus" (".c++" becomes ".cpp"), or treated them as a repetition sign (".c++" becomes ".cc").
I got blindsided by another obscure YAML parsing rule. Sometimes people couldn't pay online using our service, and I couldn't figure out why. I tested the hell out of the payment module on our dev and UAT environments, but I just couldn't reproduce the issue.
Eventually I tracked it down to an ID inside a YAML file. It turned out the live environment was running in 32-bit mode, which interpreted the number as a string.
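A minimal sketch of how that kind of failure can happen: a hypothetical loader on a 32-bit build that silently falls back to string when an integer overflows the native range. The function and bounds are illustrative, not from any particular YAML implementation:

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def parse_scalar_32bit(raw: str):
    """Hypothetical scalar parser on a 32-bit build: integers that
    overflow the native int range silently fall back to strings."""
    try:
        value = int(raw)
    except ValueError:
        return raw  # not an integer at all
    if INT32_MIN <= value <= INT32_MAX:
        return value
    return raw  # too big for a 32-bit int: quietly becomes a string

# An ID that fits in 64 bits but not 32:
print(parse_scalar_32bit("123456789"))   # -> 123456789, an int on any build
print(parse_scalar_32bit("9876543210"))  # -> "9876543210", now a string
```

The nasty part is that nothing fails at parse time; the type mismatch only surfaces later, in whatever code compares the ID.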
I hope YAML implementers will read this and give thought to adding proper support for schemas (http://yaml.org/spec/1.2/spec.html#Schema) -- they should be customizable by the application. This would resolve a number of the complaints and improve interoperability.
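As a sketch of what application-customizable schemas could look like: instead of the loader hard-coding implicit typing, the application hands the loader the resolvers it wants. All names here are hypothetical, not from any real YAML API:

```python
# Hypothetical schema-driven scalar loading: the schema is a list of
# (predicate, converter) pairs supplied by the application.
def load_scalar(raw, schema):
    """Type a raw scalar using only the application-supplied schema."""
    for predicate, convert in schema:
        if predicate(raw):
            return convert(raw)
    return raw  # default: everything else stays a string

# A "failsafe"-style schema: no implicit typing, everything is a string.
FAILSAFE = []

# A stricter app-defined schema: only plain decimal integers get typed.
STRICT = [(str.isdigit, int)]

print(load_scalar("no", FAILSAFE))  # -> "no" (no Norway problem here)
print(load_scalar("42", STRICT))    # -> 42
print(load_scalar("no", STRICT))    # -> "no"
```

Two applications could then share documents while disagreeing about typing, because the typing lives in the application's schema rather than in the parser.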
We used YAML from the start in the Wrench tool for writing reftests for WebRender[1]. We're currently looking into the prospect of migrating all of them to RON[2] as a better alternative; it has already proven itself useful in WebRender captures.
Nothing is perfect, and critiques are always welcome. I will add the article to Awesome YAML - a collection of awesome YAML (Ain't Markup Language) goodies for structured (meta) data in text [https://github.com/datatxt/awseome-yaml]
Honestly, I can't stand YAML. Every time I need to read a YAML file I also need to bring up a reference for what the syntax is because to me, none of it is immediately obvious.
Personally, I don't understand why all these projects have defaulted to using such an esoteric markup language.
> it seems to be the case that the majority of libraries are unsafe by default (especially the dynamic languages), so de-facto it is a problem with YAML.
It may seem that way, but it's not. Re: cryptographic functions
I've used YAML for years, and yeah, there's a lot of warts, but I love it for small(ish) manually edited configuration files.
I generally keep it simple and it works.
Those reasons are exactly why I created mset. It's not as flexible as YAML, but it is dead simple and works for many use cases. It is also very easy to implement in any language and, more importantly, super simple for end users to learn.
Seems like YAML tried too hard to be predictive of intent. I never got into YAML myself simply because it seemed like "JSON, but less ubiquitous and more hassle to find libraries that support it"
> difficult to pull off and requires productization, in other words not low-level tooling in a text file.
Is this supposed to be a feature? One of the great things about simple config files is that you can use standard GNU tools to view, edit, and diff them, you can put them in source control, you can be sure that you can edit them on a remote server no matter what's installed, etc.
Eliminating all those benefits would require an extraordinary jump in functionality as a tradeoff, a jump in functionality that most things frankly don't need.
You're right regarding the value of text files. ConfigNode is really for management of configurations in use cases where you have huge JSON/YAML files and you want to be able to manage them, facilitate dynamism (often using templates), support collaboration, etc. It is a gold-plated solution and not necessarily suitable for simpler needs.
Here's another example of ConfigNode used to manage Akamai configurations:
I worry that text files promote local minimums. Like, the standard GNU tools or whatever editor you use is "good enough" for what you expect to do with them, but there exist better tools for your specific task that you never even look for.
With a binary file, you of course have to be using something specific to work with them. And if lots of people were doing that, that would in turn drive a lot of specific, good options to choose from. The tools would be better at their specific task. They could provide actual _user interfaces._
But there's not enough standardization in user interfaces to reduce the cost of relearning each tool, and the tools would need both an interface and an API to automate them (we don't tend to get both for free), and we don't have anything great for chaining together APIs. So text files it is, which kind of provide these things, but they don't extend too well.
The trend in computing tools is to slowly invent what you could get easily with more specific formats... using text files. Automatic formatters, so you can pretend your project files are really the AST you care about. Smart IDEs with autocompletion, because you're not really typing arbitrary characters. IDEs that will collapse a lot of unnecessary information for you, like showing only the first snippet of JSDoc. Type systems that show you what's available and what you can plug together in a sane way. Version control that pretends it knows how to solve the problem of diffing/merging.
The answer is a marketing website with market-speak all over the page and 'Executable not found "/eula"' when trying to read Terms and Conditions?
Edit: I actually did download whatever this thing is. What is this thing? The README is Jetty's README. There are dozens of dirs with crap^W code in them.
I am afraid your own cynicism (warranted or not) might be really blinding you here...I am not sure whether to feel proud of myself when my writing is designated "marketing speak" :)
The documentation link LITERALLY shows how to use the product.
"Solvent is an integrated platform that combines an application container (jetty), a middle-ware and a developer environment to provide a complete solution for delivering web applications."
ConfigNode is something you'll put on a server as a config management environment; the output can be JSON/YAML/XML, etc.
I think you might find it quite unique if you just give it a chance (no marketing) :)
> Solvent is an integrated platform that combines an application container (jetty), a middle-ware and a developer environment to provide a complete solution for delivering web applications.
That link is pretty terse. I have no idea what "object graphs" are in this context nor how they solve the "endless iteration on the right config format" problem. Moreover, churn on config file formats is probably the least of my dev problems.
YAML and JSON already represent object graphs. This appears to be essentially a config file editor, not a superior configuration format. It has a GUI for creating or editing a config, and then it outputs YAML or JSON or XML.
It can be hard to fully convey how ConfigNode works without actually showing it in action. Yes, it does give you a UI environment to edit the object graph and outputs YAML/JSON/XML, etc.
However the output comes after evaluating the object graph. In other words it doesn't just reassemble a bunch of static values but rather actually executes objects (think POJOs) to produce the fields that make up each object.
It's true that you need to remove cycles before you can create a tree, but acyclic graphs are still graphs (and most relevantly they're probably the kind of graph you want your config to be). And YAML actually can contain cycles, so even if you only want to consider cyclic graphs, YAML still qualifies.