One thing to remember is that YAML is about 20 years old. It was created when XML was at peak popularity. JSON didn't exist (YAML is a parallel, contemporary effort). Even articulating the problems with XML's approach was an uphill battle. What you would replace it with is also hard. What use cases matter? What is the core model? A simple hierarchy? Typed nodes? A graph? What sort of syntax is needed for it to be usable? These were all questions. Seen in context, we got quite a bit correct. And yes... it has a few embarrassing warts and a few deep problems. Ah well.
A second thing to consider... YAML was created before it was common that tech companies actively contributed to open source development. There are lots of things we could have done differently if we had more than a few hours per week... even a tiny bit of financial support would have helped.
Finally, YAML isn't just a spec, it has multiple implementations. Getting consensus among the excellent contributors is a team effort, and particularly challenging when no one is getting paid for the work. Once you have a few implementations and dependent applications, you're kinda stuck in time.
It was a special pleasure for me to have had the opportunity to work with such amazing collaborators.
We did it gratis. We are so glad that so many have found it useful.
I am the author of this article. Apparently people read my website (how they get there, I don't know?)
At any rate, it's worth mentioning that in the conclusion I wrote:
> Don’t get me wrong, it’s not like YAML is absolutely terrible but it’s not exactly great either.
I still use YAML myself even when I have the freedom to use something else simply because – for better or worse – it's very widespread, and for many tasks it's "good enough". For other tasks, I prefer to avoid it.
I think that a stricter version of YAML (such as StrictYAML) would make a lot of people's lives easier though.
I think you're probably right there. I use YAML when something else I'm using calls for it, but mainly I tend to output things in it just because it's very readable.
Using it a lot more lately as I'm diving into Ansible, so I'll be interested to see if I run into problems.
The trend of "stick together YAML and a template engine, we have our DSL!" in CM systems is a bit horrible.
Ansible does make some effort to limit Jinja templating to variable substitution, but it's still not that great; all kinds of weird stuff can happen, especially with colons.
The worst one is Saltstack; the resulting syntax is atrocious and borderline unreadable. I'm not a big fan of map.jinja files [0], and on the YAML side, things can get ugly quite fast [1].
I know it's not a popular opinion, but I would rather use the Puppet DSL, even with its steep learning curve.
The whole concept of templating language on top of YAML is suspect anyway, but I wish Salt had just gone with Mako as the default templating language. That way you could write plain Python in your templates and not have this horrible misuse of Jinja.
I also have come to agree that a DSL is the best solution, though Puppet's particular DSL is not a great example. Projects that re-implement the same thing from scratch like mgmt[1] are on the right track, but probably won't gain enough traction.
Thanks! Last time I checked my domain got penalized for having abnormally low markup or some such, which apparently makes it look like a spam site. I am proud of this.
> Last time I checked my domain got penalized for having abnormally low markup or some such
Do you have a link to the document you were pointed to when you got penalized? If it was Google who penalized you, they must have pointed you to a URL with documentation on why you got penalized and how to resolve it.
I ask this because I run a few websites with even less markup than your site, but I have never been penalized. I once got penalized due to an excessive number of spam comments on one of my websites, and they pointed me to https://support.google.com/websearch/answer/190597 ("Remove this message from your site") to resolve the issue. That issue did not affect the search ranking much though (dropped by only about 2 or 3 places in the list of results). But I have never had an issue with abnormally low markup.
The markup on your website looks pretty reasonable to me, so I am surprised you could get penalized for that when I have had no issues with even less markup and my sites still appear at the top of the list of results for relevant search terms.
I think it was some tool at moz.com, but I can't recall off the top of my head. I don't think it was Google itself. I have no idea what effect that has; I'm not really into that world.
> I have had no issues with even less markup and they still appear at the top of the list of results for relevant search terms.
It seems people are finding my site, whether or not it's being penalized. I mean, someone other than me posted it here, right?
Drupal 8 uses YAML* as its configuration language because JSON doesn't support comments. That simple. Thank you for YAML, it does deliver for us: it's human readable and it's easy to parse (see below).
* I mean, it uses an ill-defined subset of YAML. The definition is "whatever the Symfony YAML parser supports".
You know what else is human readable, easy to parse if you're using PHP, and supports comments?
PHP.
I understand why some languages rely on common configuration file formats.
I don't understand why the popular dynamic script-y languages don't more commonly use the natively-expressable associative/list data structures that they're famous for making convenient.
Using includes/imports is not the greatest idea ever.
Your configuration file is part of your program's interface. It's something that must be well defined. If your configuration file is a programming language, that interface is not so well defined.
Also, you expose yourself to all kinds of weird bugs, because some (too smart for their own good) people will monkey-patch your software through it.
It also adds a lot of unnecessary stuff to the configuration file; things like ';' or '$' are not really useful.
Lastly, common configuration file formats are good because they are... common. You can have 2 pieces of software in 2 different languages accessing the same configuration file. A common example of that is configuration management: there are a lot of modules/formulas in salt/ansible/puppet/chef that do fine parsing of configuration files and permit fine-grained settings, and I'm not even mentioning augeas. If your configuration is a php/python/perl/ruby file, good luck with that.
I know it's really common for php applications to do configuration files in php, but frankly, it's a bit annoying.
> If your configuration file is a programing language this interface is not that well defined.
While I do agree with the rest of your comment I don't think they were advocating using the full language for configuration, just the maps/arrays/etc. (e.g. Python's `literal_eval`).
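For illustration, the literal-only subset the parent mentions is available from Python's standard library via `ast.literal_eval` (a sketch; the config keys here are made up, not from the discussion):

```python
import ast

# A config file containing only Python literals: dicts, lists,
# strings, numbers, booleans. Trailing commas are fine, unlike JSON.
config_text = """
{
    "debug": True,
    "hosts": ["alpha", "beta"],
    "retries": 3,
}
"""
config = ast.literal_eval(config_text)
print(config["hosts"])

# Unlike eval(), literal_eval refuses anything executable:
try:
    ast.literal_eval("__import__('os').getcwd()")
except ValueError as e:
    print("rejected:", e)
```

This gives "maps/arrays/etc. without the full language": you keep native syntax and comments, but a call or import in the file is a parse-time error rather than executed code.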
If a key objection/perceived threat is that this might give someone an insertion point they're not meant to use for code ... well, let's consider that we're talking about applications distributed as interpreted language source here. Disallowing code-as-config isn't even closing the door of this particular barn after the horse has left, it's putting two strands of police line tape across the bottom half of the gap where the door was never installed and hoping any equines thinking of passage politely consider the message in case it hadn't already occurred to them which side of the entrance they preferred to be on.
Consider this: Design and optimize for the common case.
Why do we have config files? Because developers actually want a place dedicated to simple or structured application configuration data, for which PHP assignments with arrays + primitives can function at least as effectively as JSON. Most developers would prefer that config data get loaded quickly so the application can get on to doing actual app-y things. Using the language for this means you're parsing at least as fast as you can interpret and you can also take advantage of any code caching that's part of your deployment (especially nice in the PHP-likely event that config settings would be reloaded with every request).
Abuse isn't likely to be the common case. The end users you invoked certainly aren't going to be the ones looking for opportunities to insert code over data. Developers have other places to put code and, as mentioned, probably actually want a place dedicated to data. You're still right that of course someone will do it, just like someone will inevitably create astronaut architecture hierarchy monstrosities in any language with classical inheritance or make potentially hidden/scary changes to language function using metaprogramming facilities.
But potential for abuse doesn't automatically mean a feature should be disallowed.
A lot of the time it's better to let people who can be circumspect have the benefits of a potential approach, and if somebody thinks they need to solve a problem by using a technique that's arguably abuse, well, let them either find out why it's a bad idea or enjoy having solved their problem in an unusual way. Not the end of the world. Possibly even legit.
You can use arbitrary tools to programmatically generate YAML (or JSON, or XML, any of the other "data only" formats.) This allows for tools to drive other tools by generating a spec file and feeding it in. See e.g. Kubernetes for a good example of that.
There's no language that I'm aware of that can natively generate PHP syntax, and there's no common multi-language-platform library for generating PHP syntax. I think that's most of the reason.
To contradict myself, though: Ruby encodes Gemfiles and Rakefiles as Ruby syntax. And Elixir encodes Mixfiles, Mix.Config files, Distillery release-config files, and a bunch of other common data formats as Elixir syntax.
And, of course, pretty much every Lisp just serializes the sexpr representation of the live config for its config format (which means that, frequently, a lot of Lisp runs code at VM-bootstrap time, because people write Turing-complete config files.)
> There's no language that I'm aware of that can natively generate PHP syntax
This is a solid argument against using PHP (or any such language) as a cross-language data interchange format. There are others :) And I totally agree you want a language independent format for anything you might have to feed across an ecosystem of tools.
For a PHP-system generating/altering its own config files... PHP's `var_export` generates a PHP-parseable string representation of a variable (though it sadly doesn't use the short array syntax).
Turing-complete config files probably have some hazards, like Lisp itself does. YMMV regarding whether those hazards can be avoided by circumspect developers or need to be fenced off.
This, and the security problems of executable code as configuration, are why the OpenBSD people mandate that /etc/rc.conf is not general-purpose shell script, and why the systemd people mandate that /etc/os-release is similarly not. People want to be able to parse configuration files like this with something other than fully-fledged shell language interpreters; and they want these things to not be vectors for command injections.
Settings.py is uniquely bad, though, IMO, because it tries to be a badly defined dict() instead of exposing proper configuration interfaces. Ruby config files are common and usually fairly great; see for example the Vagrantfiles.
And you won't have to generate your config files (parsing, maaaaaybe), because those needs are covered by the fact that the files are programs. They are _already_ generating a configuration.
> And you won't have to generate your config files (parsing, maaaaaybe), because those needs are covered by the fact that the files are programs. They are _already_ generating a configuration.
Yes, theoretically, if settings.py was a "generator" format that you ran as a pre-step (like you do to get parser-generators like Bison to spit out source files for you to work with), and this generator actually spat out something like a settings.json, and all the rest of the infrastructure actually dealt with the settings.json rather than the generator, then, yes, it wouldn't matter. Tools in other languages could just generate the settings.json directly.
As it stands, none of those things are true, so tools in other languages actually need to do something that outputs settings.py files.
Galaxy brain: if your config is programmable, it can read whatever terrible configuration format you want. That means my settings.py (yes, I'm forced to use Django) is configured via environment, which is populated by k8s from - gasp - JSON files.
That means that if I wanted to configure Vagrant with JSON, there is no force in the universe that could stop me.
If the config file is actually a normal program, then it can do normal program things, and any benefit from using JSON instead is nullified by the fact that you can still use JSON. In turn, if your tool's primary configuration is via a more limited format, you're stuck with it. Not even "generators in other languages" allow comparable runtime flexibility.
Yup, totally agree with you, settings.py has always been a pain in the ass. Not really an acute one but the kind that is uncomfortable but not enough to make you do something about it.
> There's no language that I'm aware of that can natively generate PHP syntax.
Actually, I've had to use PHP to output a PHP configuration array for a project that required config in PHP.
`var_export($foo)` will output valid PHP code for creating the array $foo. In my case I was doing horrible things to create the array in my pseudo-makefile, then using `var_export()` to output the result. Note that you can run php from the Bash CLI with the `-r` flag, which helps.
Tcl works well for configuration files. You can strip away the extraneous commands in a sub-interpreter to prevent Turing completeness and add infix assignment to remove the monotony of the set command and what you get is a nice config format. If you need more power in the future you just relax some of the restrictions and use it as a script without breaking existing files.
People get really upset when they have to type "array(" instead of "[" or "{" (pre-PHP 5.something) and quotes instead of no quotes (punting the character-escape problem to something else), I guess.
Using code-as-data works really well in Lisp-like languages. Reading a Clojure project's project.clj file or a Common Lisp project's .asd file is pretty pleasant. A programming language's choice in how it handles library config info for building and specifying dependencies (XML, makefiles, JSON, YAML, INI, nothing, etc...) is a good indicator of the culture of the language around config files in general. Composer for PHP only came out in 2012.
Interestingly, the Lua programming language actually evolved from configuration files: https://www.lua.org/history.html (and is still officially deemed useful for writing them)
I use Lua for configuration files for both personal and work related projects [1]. You get comments and the ability to construct strings piecemeal (DRY and all that). It's easy to sandbox the environment, and while you can't protect against everything (basically, a configuration script can go into an infinite loop), if someone unauthorized does have access to the script, you have bigger things to worry about.
That was also one of the rationales behind TCL's design.
John Ousterhout explained in one of his early TCL papers that, as a "Tool Command Language" like the shell but unlike Lisp, arguments were treated as quoted literals by default (presuming that to be the common case), so you don't have to put quotes around most strings, and you have to use punctuation like ${}[] to evaluate expressions.
TCL's syntax is optimized for calling functions with literal parameters to create and configure objects, like a declarative configuration file. And it's often used that way with Tk to create and configure a bunch of user interface widgets.
Oliver Steele has written some interesting stuff about "Instance-First Development" and how it applies to the XML/JavaScript-based OpenLaszlo programming language, and to other prototype-based languages.
>The equivalence between the two programs above supports a development strategy I call instance-first development. In instance-first development, one implements functionality for a single instance, and then refactors the instance into a class that supports multiple instances.
>[...] In defining the semantics of LZX class definitions, I found the following principle useful:
>Instance substitution principle: An instance of a class can be replaced by the definition of the instance, without changing the program semantics.
In OpenLaszlo, you can create trees of nested instances with XML tags, and when you define a class, its name becomes an XML tag you can use to create instances of that class.
That lets you create your own domain specific declarative XML languages for creating and configuring objects (using constraint expressions and XML data binding, which makes it very powerful).
The syntax for creating a bunch of objects is parallel to the syntax of declaring a class that creates the same objects.
So you can start by just creating a bunch of stuff in "instance space", then later on as you see the need, easily and incrementally convert only the parts of it you want to reuse and abstract into classes.
>I don't understand why the popular dynamic script-y languages don't more commonly use the natively-expressable associative/list data structures that they're famous for making convenient.
You picked the wrong language... PHP comes with its own JSON parser. And INI and XML and even CSV.
But, the reason is that, generally, you want config files to describe data or state only. Yes, you could just make your config native code, but then the temptation to add functions and methods and logic to that becomes irresistible and soon your config is an application that needs its own config.
Config formats need to be simple, and preferably not Turing complete.
Because it's just in general incredibly short sighted to think that your config file is never going to be read by code written in another language.
There's also an argument about whether making configuration files able to execute arbitrary code is a good idea. You get straight into the JavaScript 'eval' problems which we've spent a decade escaping.
I think some of it is PLOP (Principle of Least Power).
$CFG = random() > 0.5 ? "yes" : "no";
...is likely "too powerful". It'd be nice if there were ways in certain programming languages to do something like "drop privileges" to avoid loops, function calls, external access, etc.
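One way to sketch that "drop privileges" idea in Python is to whitelist AST node types, so plain literals pass while the `random()` example above is rejected (the function name `restricted_eval` and the allowed-node list are my own illustrative choices, not an established API):

```python
import ast

# Node types a "least power" config evaluator accepts: literals,
# containers, and unary minus for negative numbers. Names, calls,
# comparisons, loops, and attribute access are all refused.
ALLOWED = (ast.Expression, ast.Dict, ast.List, ast.Tuple, ast.Set,
           ast.Constant, ast.UnaryOp, ast.USub, ast.Load)

def restricted_eval(src):
    tree = ast.parse(src, mode="eval")
    for node in ast.walk(tree):
        if not isinstance(node, ALLOWED):
            raise ValueError("disallowed syntax: %s" % type(node).__name__)
    return eval(compile(tree, "<config>", "eval"))

print(restricted_eval("{'retries': 3, 'hosts': ['a', 'b']}"))

# The "too powerful" example above is refused at parse time:
try:
    restricted_eval("random() > 0.5")
except ValueError as e:
    print(e)
```

The point is that the privilege boundary lives in the loader, not the language: relaxing it later (say, allowing a whitelisted function or two) is a one-line change to `ALLOWED`.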
The makers of Drush, the cli for Drupal, subscribed to your line of thinking in the early versions and inventory items were defined in PHP files. Migrating from that will be interesting.
Because that forces the end user, who might not know anything about the programming language one's application is written in, to wrestle with low-level implementation details. In the words of Keith Wesolowski, the programmer assumes that the end user is a "Linux superengineer", which is almost always a wrong assumption to make.
I totally agree that in the ideal world, JSON should support comments. I yearn for them, and none of the in-band work-arounds or post-processing tools are acceptable substitutes.
But to play the devil's advocate, how would JSON be able to support round-tripping comments like XML can, since <!-- comments --> are part of the DOM model that you can read and write, while JSON // and /* comments */ are invisible to JavaScript programs? There's nowhere to store the comments in the JSON object model, which you would need in order to write them back out later!
One important feature for a format is being able to read and write files with full fidelity and not lose any information like comments. XML can do that, but JSON can't. To fix that you'd have to go back and redesign (and vastly complicate) the fundamental JavaScript objects, arrays, and values, to be as complex and byzantine as the DOM API.
The less-than-ideal situation we're in isn't JSON's fault or JavaScript's fault, because JSON is just a post-hoc formalization of something that was designed for a different purpose. But JSON is rightly more popular than XML, because it's extremely simple, and nicely impedance matched with many popular languages.
YAML suffers from the same problem as JSON that it can't round-trip comments like XML can, but it fails to be as simple as JSON, is almost as complex as XML, and doesn't even map directly to many popular languages (as the article points out, you can't use a list as a dict key in Python, PHP, JavaScript, or Go, etc).
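The "list as a dict key" mismatch mentioned above is easy to check in Python (a quick illustration; the tuple workaround is one common parser strategy, not something the article prescribes):

```python
# Python dict keys must be hashable, so a YAML "complex key" that is
# a sequence has no direct native equivalent.
try:
    {[1, 2]: "value"}
    raise AssertionError("should not be reachable")
except TypeError as e:
    print(e)  # unhashable type: 'list'

# A parser can work around it by converting the key to a tuple:
d = {(1, 2): "value"}
print(d[(1, 2)])  # value
```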
You can sidestep some of JSON's problems by representing JSON as outlines and tables in spreadsheets, without any need for syntax and sigils like brackets, braces, commas, no commas, quoting, escaping, tabs, spaces, etc, but in a way that supports rich formatted comments and content (you can even paste pictures and live charts into most spreadsheets if you like), and even dynamic transformations with spreadsheet expressions and JavaScript.
In fact YAML is probably more complex than XML; the specification of YAML, when I print it into PDF, is about three times as long as that of XML 1.0. (And XML 1.0 also describes DTD, which is kind of a simple type validation for XML and thus includes much more than just serialization syntax.)
> But to play the devil's advocate, how would JSON be able to support round-tripping comments like XML can, since <!-- comments --> are part of the DOM model that you can read and write, while JSON // and /* comments */ are invisible to JavaScript programs.
It doesn't support it for whitespace in general (if you deserialize into JS object model or equivalent), so why would it be any different for comments specifically? It's just not a design goal of the format.
Although, of course, it's quite possible to have a JSON parser that preserves representation. It'll just have a non-obvious mapping to the host language because of all the comment and whitespace nodes etc.
I'm kinda sad that JSON has been struggling for like 15 years to get comments. Is there like some kind of gestapo that's saying no or something? All it takes is for the maintainers of probably 15 popular libraries to start handling comments.
At the end of the day I'm sure the reason we don't have JSON comments is somewhere listed in this page: xkcd.com/927/
I'm aware of at least three JSON libraries that at least can accept comments (Gson in lenient mode, Json.NET, and json-cpp are the ones I've used personally that do)-- it's hard to convince everyone that JSON needs comments, though, and comments are of limited utility if it's not guaranteed that they'll parse everywhere.
But you really only need comments in JSON if you're doing stuff like storing configuration in JSON, and JSON's too fiddly in general to be a great config file format (too easy to do something like forget a comma; no support for types beyond object, array, (floating point) number, and string). Something more like YAML without the wonky type inference would be better, IMO.
I believe Douglas Crockford used to make the argument that JSON is not meant for human consumption and thus shouldn't be changed to better serve humans. I personally wish hjson (https://hjson.org) would get more traction. I prefer it over both JSON and YAML.
> I removed comments from JSON because I saw people were using them to hold parsing directives, a practice which would have destroyed interoperability. I know that the lack of comments makes some people sad, but it shouldn't.
Well, then why not allow a trailing comma in lists and objects? Computers don't care, and they would even be happier, because they could then just pour out array and object members, each with a trailing comma, without concerning themselves with whether a given member is the last one. (Dijkstra's train toilet problem comes to mind.) Also compare with XML, where each element is self-contained.
And why model JSON syntax so closely after JavaScript's literal object syntax (which is actually more convenient, by the way)? Being taken from a mainstream programming language, it naturally evolved to be written by humans in small amounts, not by computers in large dumps. :)
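The trailing-comma restriction is easy to demonstrate with any strict JSON parser, e.g. Python's:

```python
import json

# Strict JSON rejects a trailing comma, so any emitter must
# special-case the final element of every array and object.
try:
    json.loads('[1, 2, 3,]')
    raise AssertionError("should not parse")
except json.JSONDecodeError:
    print("trailing comma rejected")

# Without the trailing comma it parses fine:
print(json.loads('[1, 2, 3]'))  # [1, 2, 3]
```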
VS Code uses JSON with comments for config files. [1]
Technically, this is not JSON. You won't be able to use a standard JSON parser without stripping comments first. But you can use a simple, JSON-like language with comments for config.
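A minimal sketch of that strip-then-parse approach in Python (it only handles full-line `//` comments; a real JSON-with-comments parser must also cope with `//` inside string values, which this deliberately does not):

```python
import json
import re

def loads_with_comments(text):
    """Parse JSON after discarding full-line // comments.

    Sketch only: '//' occurring inside a string value would be
    mangled, which a proper tokenizer-based parser avoids.
    """
    stripped = re.sub(r'^\s*//.*$', '', text, flags=re.MULTILINE)
    return json.loads(stripped)

config = loads_with_comments('''
{
    // editor settings
    "tabSize": 4,
    "insertSpaces": true
}
''')
print(config["tabSize"])  # 4
```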
> simple
YAML is much, much more complicated than JSON.
Quoting a single word from the parent’s sentence is misleading. The sentence "YAML can be employed as a simple JSON-like language with comments." is true because JSON is YAML, so you can parse a JSON file with #-comments using a YAML parser.
Most YAML users don't need to look at the source for a YAML parser. I appreciate elegant simplicity, but I don't think parser complexity is the most important metric by which to judge a data interchange format.
If you use a YAML parser to parse JSON-with-comments, it will accept many inputs that don't correspond to JSON-with-comments, and furthermore is likely to report syntax errors that don't make sense to a user who only knows JSON.
So, this unnecessary parser complexity is a usability issue. You should use a parser for the config language you actually intend to support.
Jsonnet is awesome. We use it to generate our yaml files for kubernetes. YAML isn’t easy to parse, nor is it very flexible as a templating language. It gets cumbersome very quickly.
Jsonnet is a relief. Kubernetes should have been a dumb json config from the get go. JSON is ridiculously simple to parse and emit. It has huge interoperability as well with lots of programming languages.
I don't like it because it uses the = symbol which seems imperative rather than declarative. (Same with HCL, it might be a nitpick but these are languages I'm going to be using all the time.)
HOCON is interesting, but at first glance it seems it might be too ambiguous for my tastes, because, like YAML, it supports both js-style ("//") and shell-style ("#") comments.
JSON plus comments is beautiful because it adds minimally to an unambiguous language which lends itself to automatic formatting (stringificiation).
I'd argue that = only feels imperative if you're used to imperative languages. Prolog and Haskell, both of which focus on being declarative, also both use the equals sign.
The trouble there is that your comments come in-band. What if you're trying to serialise something and you don't have the power to insist that it's not a dictionary with "comment" as a key?
Sigh. All I wanted to do was say thanks for the YAML standard -- comments are important, but they're not the only problem with JSON. And truly I can't be expected to remember all of this discussion from six-plus years ago. One thing I remember, though, is the trailing comma problem -- we upstreamed a grammar change to Doctrine annotation so "foo, bar," is OK, because PHP arrays accept that and it's bonkers trying to code a mostly-PHP system without trailing comma support. Also, JSON is no fun to write; you need to have [] {} all correct, where YAML is much easier. The fewer sigils the better, and most of Drupal's YAMLs only use the dash, the colon and the quote. This is the grave mistake Doctrine committed as well: instead of simple arithmetic (>=1.0) they used mysterious sigils in version specification (~1.0). Drupal is in the business of constantly accepting new contributors, and (~R∊R∘.×R)/R←1↓ιR is not newbie friendly, no matter how you slice and dice it. There are certainly advantages to sigil-heavy languages like APL and Perl, but the scare factor is too high.
That's just ugly and you're mixing your comments with the data structure, which is potentially confusing. Also, JSON requires a lot more typing. I don't want to have to manually add in all the brackets, quotes and commas when editing a config file.
> JSON didn't exist (YAML is a parallel, contemporary effort).
Interesting. How did it happen then that, quoting the YAML 1.2 spec, "every JSON file is also a valid YAML file"? The previous spec documents don't mention JSON.
Was that an intentional design decision for 1.2 or was it some kind of convergent design due to Javascript?
I have admired Douglas Crockford's excellent JSON from the moment I saw it; it is a model of simplicity. I also like TOML and wish it all the best. By contrast, YAML is complex and could use a haircut.
When I say "JSON didn't exist", what I mean is that it wasn't popular or known to us when we were working on YAML. So, please excuse my sloppy wording. For me, the work on what would become YAML started with a few of us in 1999 (from SML-DEV list). In January of 2001 we picked the name and had early releases. It took a few years of iteration before we had a specification the collaborators (Perl, Python, and Ruby) could all bless.
Anyway, with regard to Crockford's excellent work, JSON. It is a coincidence that YAML's in-line format happened to align, although it's probably because of a common "C" ancestor, not JavaScript. The main influence on the YAML syntax was RFC0822 (e-mail), only that from my perspective it needed to be a typed graph. In fact, we documented where we stole ideas from, to the best we could recall at the time: http://yaml.org/spec/1.0/#id2488920.
That was my attempt at giving YAML a haircut. I'd be curious to know what you thought.
Thank you for creating YAML, by the way. Even though part of that rant was quoted from me, I'm not negative on it like the author - I think the core was brilliantly designed. If you put two hierarchical documents side by side - one in TOML and another in YAML the YAML one is much, much clearer and cleaner.
Thank you for StrictYAML; I might just use it. It does look like a nice haircut. You might wish to give Ingy a ring; he has been itching to move forward on a reduced/secure YAML subset.
That said, StrictYAML seems to be a tad more of a haircut than I'd imagined. I'd keep nodes/anchors, since I think a graph storage model is underrated; I think that data processing techniques just haven't caught up with graph structures.
Further, I'm not sure everything can be easily typed based upon a schema. Hence, I'm not sure about completely dropping implicit types; perhaps you may want to provide a way for applications to resolve them if they wish. For example, an application may want to attempt to treat anything starting with "[" or "{" as a JSON sub-tree. Perhaps keeping "!tag" but handing it off to the application to resolve might also be a good idea in this regard. Even so, typing should be done at the application level and default to something very boring.
> I'd keep nodes/anchors, since I think a graph model is underrated
Well, you can create graph models without it (and I do) - you can just use string identifiers to identify nodes and let the application decide what that means.
I always thought the intent behind nodes/anchors was not so much graph models but rather to take repetitive YAML and make it DRY. That appears to be how it is used, e.g. in gitlab's ci YAML.
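The DRY usage described above looks roughly like this (a hypothetical fragment in the style of GitLab CI; the job names and keys are illustrative):

```yaml
# &defaults names this mapping; *defaults references it;
# "<<" merges the referenced mapping into the current one.
.defaults: &defaults
  image: python:3.11
  retries: 2

test:
  <<: *defaults
  script: pytest

lint:
  <<: *defaults
  script: flake8
```

Both jobs inherit `image` and `retries` from the anchor, and either can still override a merged key locally.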
>I'm not sure about completely dropping implicit types, perhaps you may want to provide a way for applications to resolve them if they wish. For example, an application may want to attempt to treat anything starting with [ or { as JSON.
I think that would cause surprise type conversions. There will be plenty of times when you want something to start with a [ or { and you won't want it parsed as JSON.
I embed snippets of JSON in YAML multiline strings sometimes and I usually just parse it directly as a string. Then I run that string through a JSON parser elsewhere in the code.
> I think that would cause surprise type conversions.
YAML has traditionally been used as the basis of higher-level configuration files for particular applications. What I'm saying is that implicit typing should be permitted, but delegated to those applications.
Conversely, I'm not saying that StrictYAML should do anything by default with unquoted values, except report them to the application as unquoted. That way the application could choose to process such values differently from quoted ones.
An interesting idea, but it's not clear that this will be less confusing, or that application authors will be better at avoiding config-language gotchas than config language designers such as yourself (and existing app-specific config languages suggest otherwise).
I think a reason this won't necessarily fix the problem of unmet expectations is that identical constructs in different but analogous YAML files would be likely to end up with very different semantics, and users would effectively have to remember which idiosyncratic YAML dialect choices various apps make. Say
version: 1.3
means the string "1.3" in app a), the float 1.3 in app b), and a version number in app c). Furthermore, let's assume that app c) required a version number, whereas a) and b) required strings.
Another, more subtle, problem is that such a scheme would make it more likely that applications would end up parsing raw string representations themselves (with ensuing subtle differences even for things which are nominally meant to be identical, say dates or numbers, and possibly security problems as well).
> I always thought the intent behind nodes/anchors was not so much graph models but rather to take repetitive YAML and make it DRY. That appears to be how it is used, e.g. in gitlab's ci YAML.
That's how I use it too. When I read about competing formats, that's the first feature I check for. It's really key for readability and usability in some use cases.
I don't have much to suggest. For YAML, the use of whitespace, colons and dashes primarily emerged from usability testing with domain experts who are not programmers. In particular, testing was done in the context of an application that needed a configuration and data auditing interface, an accounting application. Even anchors/aliases worked in this context and supported the application's use by making the audit records less repetitive without introducing artificial handles.
Other use cases, such as dumping arbitrary in-memory data structures, perhaps pursued out of a sense that we needed full completeness, actually didn't have any end-user usability testing. Round-tripping data seems in retrospect to be a diversion from the primary value that YAML provided.
If you are writing a new YAML implementation, then yeah, you want a simpler spec to follow.
If on the other hand you are using a YAML library... I've had pretty good success using YAML compatibly across Python, Ruby, C# and Go projects. Do you have a particular issue in mind that the existing Ruby implementation doesn't address?
YAML is an invented serialization format; JSON is a discovered one. As Crockford points out, JSON has existed as long as JS has existed - he just called it out and put a name on it.
Anyway, XML is a strong anti-pattern (too much attack surface: even if you get it right on your end, the other party likely screwed something up). YAML seems to be going down that path too.
TOML seems to be "the JSON of *.ini" (ie: discovering old conventions, rather than inventing new ones), and I'm glad to have been exposed to it.
If you define JSON as the underlying practice that Crockford later named and documented, then sure, what I wrote reads completely wrongheaded. However, when we were working on YAML, JSON had not yet been called out and given a name.
I believe the most important convention that YAML and JSON shared was a recognition of the typed map/list/scalar model used by modern languages. Further, as far as conventions go, I think there's quite a bit to be said about languages that use light-weight structural markers such as: indentation, colon and dash.
It's not really a moral judgement - thanks for your contributions and your innovations - but I prefer not to use YAML if possible, for the same reasons the author outlined.
"JSON" became popular in the 90s.
They were HTTP requests which returned JavaScript that you would simply eval(). No need to write or import a parser, and it's the same syntax as the language you're using, because it is the same language. In technology, many things become popular not because of how good (or bad) they are, but because of how easy they are to use.
Clark, thanks so much for YAML. I love it and use it a lot. It actually increases the day-to-day joy of the work I do as a developer.
(While constructive criticism is fine, those rare people who trash it are... nonsensical to me. I'd like to see them do one-tenth as well under the same conditions!)
> I discovered JSON. I do not claim to have invented JSON because it already existed in nature. What I did was I found it, I named it, I described how it was useful. I don’t claim to be the first person to have discovered it. I know that there are other people who discovered it, at least, a year before I did. The earliest occurrence I found was there was someone at Netscape who was using JavaScript array literals for doing data communication as early as 1996, which was at least 5 years before I stumbled onto the idea.
I can independently confirm that people were using JSON before he named it JSON. I was dumping data in JSON in 2000 for dynamically displayed reports.
But then again I was already used to using Perl data structures as dumped by Data::Dumper for config, because I was taught a lot about Perl by a Lisp programmer who had used Lisp data structures for the same purpose since the 1980s. So using JSON didn't feel original or clever. It seemed like I was simply using a well-known technique in yet another dynamic language.
Then again, our reaction to XML was that it was the stupid thing other people were doing that you had to do to interact with the rest of the world. I got used to holding my tongue until I went to Google a decade later and found that my attitude was common wisdom there...
According to Platonism, JSON has no spatiotemporal or causal properties (like a datetime format) and thus has existed and will exist eternally. All hail JSON.
"Anyone who uses YAML long enough will eventually get burned when attempting to abbreviate Norway."
Example:
NI: Nicaragua
NL: Netherlands
NO: Norway # boom!
`NO` is parsed as a boolean; under the YAML 1.1 spec there are 22 ways to write "true" or "false."[1] For that example, you have to wrap "NO" in quotes to get the expected result.
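Quoting is the usual workaround:

```yaml
NI: Nicaragua
NL: Netherlands
NO: "Norway"   # quoted, so it stays the string "Norway" instead of false
```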
This, along with many of the design decisions in YAML strike me as a simple vs. easy[2] tradeoff, where the authors opted for "easy," at the expense of simplicity. I (and I assume others) mostly use YAML for configuration. I need my config files to be dead simple, explicit, and predictable. Easy can take a back seat.
The implicit typing rules (ie, unquoted values) should have been application dependent. We debated this when we got started and I thought there was no "right" answer. Alas, Ingy was correct and I was wrong.
I appreciate your humility and professionalism in a discussion thread that holds a lot of criticism; suffice it to say, I should have practiced a bit more humility and a bit less "Monday morning quarterbacking" in my original post. And I should have read your comment on YAML's history. To right the record: you got _so_ much right with YAML, and it's unfair for me to cherry-pick this example 20 years later. Sincere apologies...
As the saying goes, "there are only two kinds of languages: the ones people complain about and the ones nobody uses." YAML, like any language, isn't perfect, but it has withstood the test of time and is used by software around the world - many have found it incredibly useful. Sincere thanks for your contribution and work.
As someone who doesn't really use YAML much, your comment provides a good introduction to the kinds of things one needs to know before choosing formats in the future.
This is a very good example of the problems of YAML, and it's one of those things that has really perplexed me about the design of YAML. (I suppose it's a sign of the times when YAML was designed.)
It's[1] just so blatantly unnecessary to support any file encoding other than UTF-8, supporting "extensible data types" which sometimes end up being attack vectors into a language runtime's serialization mechanism, autodetecting the types of values... the list goes on and on. Aside from the ergonomic issues of reading/writing YAML files, it's also absurdly complex to support all of YAML's features... which are used in <1% of YAML files.
A well-designed replacement for certain uses might be Dhall, but I'm not holding my breath for that to gain any widespread acceptance.
[1] Present tense. Things looked massively different at the time, so it's pretty unfair to second-guess the designers of YAML.
I've been bitten by the "string made of digits that starts with 0" thing a couple of times. In this case it gets interpreted as a number and drops leading zeroes. I quickly learned to quote all my strings.
I'd still love for a better means to resolve ambiguities like this, but I've found always quoting to be a fairly reliable approach.
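A sketch of that gotcha, assuming a YAML 1.1 resolver (e.g. PyYAML's default, where a leading 0 triggers the octal integer pattern; the key name is hypothetical):

```yaml
zip_code: 01234    # matches the 1.1 octal pattern: loads as the int 668
quoted: "01234"    # quoting preserves the leading zero as a string
```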
I'm not an expert by any means, but I'm pretty sure that Ansible uses vanilla YAML (no 'bastardization').
Your first example is an Ansible convenience feature, it's not extending or changing the YAML syntax in any way. You can simply specify `cmd` values as lists or strings, since working with one or the other may be easier depending on the use case.
The templating is unfortunate in some areas, especially where the jinja2 syntax conflicts with what YAML expects (for example starting an object with '{'). That's due to a combination of templating engine choice and YAML, though, and not some custom implementation of YAML. Unless I'm misunderstanding?
I do think going with YAML was a trade-off for Ansible, but it's hard to see Ansible getting to where it is today if it had gone with a custom DSL (or JSON, thank god). I'd take Ansible's YAML over Chef's Ruby or CloudFormation's JSON any day.
Oh god, Ansible is exactly why I don't like YAML or Jinja2. I never know what needs to be quoted, what's inlined, what needs to be wrapped in "{{ }}", and what expressions are supported. But once you get the syntax right, it works great.
SaltStack also has the Jinja2 template embedding, which can make it very difficult to understand which parts of the lifecycle run through templating. I'm still not certain I understand how it works.
The most recent offenders for bastardizing YAML I have seen are the different CI services:
* Circle CI using moustache-like templating and interpolation with things like {{ .Branch }} available in certain steps [1]
* GitLab CI adding an "include" type directive to declare YAML dependencies [2]
I've also experienced this professionally. At my last company, somebody decided to add a feature to enable interpolation in some parts of the YAML deployment data. It ended up being used by a handful of people who were confused why interpolation worked in some places and not others. The weird trend of "extending YAML" seems to be going against any sort of benefits you might have by trying to use it.
You can usually use plain old JSON anywhere where YAML would be used (e.g. host vars, group vars, vars file includes, I think even playbooks). And internally, most everything in Ansible is JSON anyways.
YAML is for convenience for hand-editing configuration/task files; if you're doing anything that doesn't require hand editing/readability, use JSON.
With YAML I can never remember what's an object versus a list, string, or number, nor am I ever able to add new stuff to a YAML file and get it to parse correctly without first looking up the spec. And it's impossible to see where large objects start and end.
In contrast, JSON is super intuitive and basically self documenting. The only real quirks are that you need to use double quotes, and objects can't have a trailing comma.
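Those two quirks are easy to demonstrate with Python's stdlib parser (the key names are just illustrative):

```python
import json

# Standard JSON is strict: keys and strings need double quotes,
# and trailing commas are rejected by a conforming parser.
parsed = json.loads('{"key": "value"}')

try:
    json.loads('{"key": "value",}')  # trailing comma -> parse error
    trailing_comma_accepted = True
except json.JSONDecodeError:
    trailing_comma_accepted = False

print(parsed, trailing_comma_accepted)
```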
The only good thing I can see about YAML is that it's super easy to convert and re-export to JSON.
> In contrast, JSON is super intuitive and basically self documenting. The only real quirks are that you need to use double quotes, and objects can't have a trailing comma.
I'd expand the list of quirks... JSON lacks comments (both line-level and block level). Fine for data transport but super super bad for configuration files.
> lacks comments [..] super super bad for configuration files
Not that that matters when applications take it upon themselves to re-save the config file in some kind of normalisation effort. Bye-bye comments, hope they're checked in somewhere.
This hits home. Every time I've ever had to make the decision I've chosen yaml, for exactly this reason. Funny how the seemingly small things can be absolute show stoppers when it comes to making decisions in production.
The barrier here would be whether there's support in enough implementations to feel safe using it in the wild, which I'm guessing will take a while at the very least.
What’s incredible here is that we’re not at the beginning of programming, when we built temporary languages that ended up being forgotten. The web may be a final form of IT, Angular may be the "right", the final, the perfect way to build applications even in 50 years, just like HTML has been the final way to build websites for the last 25 years; JSON may become legacy, and my grandson might even struggle with parsers that still use JSON instead of this new tech called JSON5...
The popularity of transpilers might help overcome that barrier. IDE's and task runners can watch for file modifications and run a simple program to convert to the old format.
But I'm curious what you mean by "in the wild"? If you're using (producing) it, something needs to consume it, and you would probably have control over both in whatever project you were using it for.
Wow, I can "just" use that, thanks. The problem is that JSON is an interchange format, meaning that I need to implement this serialization and deserialization quirk on every producer/consumer of my API (which, you know, avoiding is kind of the point of using a standard interchange format). Furthermore, because everything is a string, I can't unambiguously indicate something is meant to be a string in that format rather than a date.
Year:
YYYY (eg 1997)
Year and month:
YYYY-MM (eg 1997-07)
Complete date:
YYYY-MM-DD (eg 1997-07-16)
Complete date plus hours and minutes:
YYYY-MM-DDThh:mmTZD (eg 1997-07-16T19:20+01:00)
Complete date plus hours, minutes and seconds:
YYYY-MM-DDThh:mm:ssTZD (eg 1997-07-16T19:20:30+01:00)
Complete date plus hours, minutes, seconds and a decimal fraction of a second
YYYY-MM-DDThh:mm:ss.sTZD (eg 1997-07-16T19:20:30.45+01:00)
where:
YYYY = four-digit year
MM = two-digit month (01=January, etc.)
DD = two-digit day of month (01 through 31)
hh = two digits of hour (00 through 23) (am/pm NOT allowed)
mm = two digits of minute (00 through 59)
ss = two digits of second (00 through 59)
s = one or more digits representing a decimal fraction of a second
TZD = time zone designator (Z or +hh:mm or -hh:mm)
The barrier is more that browsers can't deserialize JSON5 natively. Writing a JSON5 parser is not difficult; it's just not part of most standard libraries in most languages, but I would argue that most standard libraries don't parse YAML either.
{
    "ConfigKeyComment": "This is for blah blah blah",
    "ConfigKey": "Foo"
}
Obviously this wouldn't work in all cases (you're putting more work on your parser to interpret unused keys basically), but if we're talking config files specifically, I see this as an acceptable approach since there's little chance you'll be parsing such files more than once each (plus, writing a simple tool to strip the comments out would be very trivial).
I've tried something similar but found it way too painful if the comments need to be long...
# Never enable this config, because if you do the space-time
# continuum will collapse into itself and the cloud servers
# will disappear in a puff of steam. However, if you really
# must enable it, remember that it's boolean and go read
# TICKET-8675309 for the extensive list of side effects.
TurboFactorRenoberation = false
Almost all applications evolve their config files into unique DSLs over time. They may choose a generic serialization for the DSL's AST but it will end up being an underspecified application specific DSL regardless.
Maybe I'm lazy, but avoiding any increase in the cost of commenting is one of the few absolutes I abide by. Often I find myself tired after a long stretch of code, trying to convince myself that it's understandable on its own.
This is one of those systematic rules I have to enforce to shut down my lazy lizard brain.
edit: But I can see how highly structured comments could actually come in handy as well for viewing configs in a gui
> Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.
-Douglas Crockford, creator of JSON
There is no issue using JSON with comments for a config file.
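A minimal sketch of that pipeline in Python, with hypothetical config keys. It is deliberately naive: it only strips whole-line `//` comments and would mangle a `//` inside a string value, where real tools like JSMin or strip-json-comments track string state:

```python
import json
import re

def strip_line_comments(text: str) -> str:
    """Drop // line comments before handing text to json.loads.

    Naive on purpose: whole-line comments only; real comment
    strippers also handle inline comments and quoted strings.
    """
    return re.sub(r"^\s*//.*$", "", text, flags=re.MULTILINE)

config_text = """
{
    // database connection settings (hypothetical keys)
    "host": "localhost",
    "port": 5432
}
"""

config = json.loads(strip_line_comments(config_text))
print(config)
```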
You are 100% correct. For single-line comments, "//" keys in the data structure, and everything else, there are already well-tested solutions to the problem. https://github.com/sindresorhus/strip-json-comments
My point is only that this isn't a big issue. I don't understand why so many see it as one. Instead, projects use non-standard YAML or other problematic solutions only because "JSON doesn't have comments".
Yes, but ITT we're talking about someone apparently having the option to pick which format they're using for config, and they'd use JSON if it wasn't for one dealbreaker.
That doesn't work if you use a strict parser where superfluous fields are an error. It's quite rare, but there are good use cases for that kind of strictness.
Or when no fields are superfluous. For instance, when the code iterates over all the fields and does something with each, instead of just looking for known keys.
> In contrast, JSON is super intuitive and basically self documenting.
Personally I've found the exact opposite when dealing with 'normal' people. Most people can get basic YAML, but unless they're programmers (or at least know how to program), most people fail miserably at writing JSON by hand.
I agree with this. Our biz guys have to edit JSON config files regularly, and they're limited to basically just copying/pasting lines from existing files and editing the values. When they need to be able to do more, we end up building a UI for it and either storing the config elsewhere or writing the code to manage persisting their changes to the config.
This has caused me so much misery in the past, especially since none of the tools will tell you which line the offending comma is on. Great, somewhere in my thousand-plus-line JSON file is a tiny syntax error but you won't tell me where.
Ended up having to regex for them. Didn't do wonders for my trust in JS tooling.
Second this. I have used a Python dict to write the default config file and JSON Schema to validate a user-supplied config file, which worked quite well for that purpose.
In what sense could this be true? Python objects support a whole host of behavior; JSON is a data format. Python dicts might be a closer analogy except Python keys can be anything that is hashable while JSON requires strings, and of course Python dict values can be any Python value; not just the JSON analogs.
I think you are purposely misinterpreting me. Python dicts are practically, syntactically identical to JSON. Yes, Python dict values can be any Python value, the same way JSON in JS can be any JS value. Point being, someone coming from Python would see JSON as identical to a Python dict. We can run around in semantic circles all day.
You misrepresented yourself by saying "object" when you meant "dict" and saying "practically identical" when you meant "vaguely syntactically similar". I wasn't trying to nitpick your semantics; I just had no idea that "Python objects are practically identical to JSON" meant "Python dicts are to Python what JSON objects are to JS, oh and also Python dicts have some syntactic similarlities to JSON" or whatever.
You sure are difficult. There is a non-trivial set of text that is both valid JSON and a valid python dict. Many people would consider the two very similar.
Years ago I had to support a tool that used YAML as a configuration language, and a transport between different applications. Holy. Hell.
First of all, don't ever try to edit a YAML file by hand. You will introduce whitespace or other characters that will break the file, and you will not know until you run it and it breaks something.
The reason you will not know? Not all YAML parsers are the same. Some will interpret it correctly, and some will break. You'll have to get reference implementations of every "supported" YAML parser and run every config you have through them all, and diff them all, before you can trust them.
YAML may be easier to read than JSON, but its added complexity (the parser is significantly more complicated) and obtuse "features" are just not worth the effort. Not to mention, have you ever tried to maintain a very large indented YAML file by hand? Pain in the ass. Just shove everything into JSON files. The fact that it's so limiting is freeing, and everything can parse it. But don't edit it by hand.
And IMNSHO, you shouldn't use either YAML or JSON as a configuration language. They are for data structures, not configuration. If you want a configuration language, go get something designed as a configuration language.
I’ve been using Python enums to store static, non-sensitive config lately. Lets me store my data in a dict/JSON-like format while being able to write comments. Plus, no need to do any IO to access variables! However, I'm not really sure this was the intended use case for Python enums.
A lot of JS tools now will just take a js file that exports a configuration object (`.prettierrc.js`, `.eslintrc.js`, `.babelrc.js`). I find it very sensible.
- Allows code reuse.
- Allows configuration to be as dynamic as you want.
- Can use environment variables.
I suppose there are some cases where you can't trust the user in this way (running configuration code), but I think in a lot of cases you can, and it's generally more convenient.
For a sensible language, I would model something after Apache's. Simple, direct, easy to read, easy to write, easy to extend. It's like a server admin who barely knew HTML 1.0 wrote a config format. Perfect for the things it should actually be doing.
Another option is to take a simple format and extend it with another format or language. For example, you could add SQL to a simple file format, and suddenly tons of people can extend the config with some complex logic. But I also think templating and macro languages should generally die in a fire.
INI files aren't bad. They aren't a language, but they are good for simple use cases and a flat structure. Yes, you can have hierarchical section names, but it's a pain. If you want to use INI, you should probably use TOML. But there's very little incentive to add a TOML parser to a simple app when they could just suck in a JSON file. (Personally, I use JSON files, but only because I'm lazy, not because it's a good idea)
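For comparison, a sketch of how TOML spells out the hierarchy that's painful in plain INI (the keys are hypothetical):

```toml
[server]
host = "localhost"
port = 8080

# Nesting is explicit via dotted table names rather than
# ad hoc section-name conventions like [server/tls].
[server.tls]
enabled = true
```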
The biggest problem with things like Ansible is they'll give you enough rope to hang yourself. First you get defeated by whitespace. Then you get defeated by the stupid YAML rules. Then you get defeated by complexity like inheritance, namespace conflicts, and the shittiest debugging output ever. Then you get Jinja madness embedded inside Ansible madness inside YAML madness, and nobody knows how it works and can even touch it for fear of breaking everything. And of course, there is nothing that can parse it other than Ansible.
I think if Ansible had been TOML+Jinja it would have worked. It would have been ugly and clunky, but it would have worked. (The engine itself and their stupid rules about structuring your project should also die in a fire, but that's a different subject)
Nobody talks about SDLang (Simple Declarative Language): https://sdlang.org/
An example:
```
// This is a node with a single string value
title "Hello, World"

// Multiple values are supported, too
bookmarks 12 15 188 1234

// Nodes can have attributes
author "Peter Parker" email="peter@example.org" active=true

// Nodes can be arbitrarily nested
contents {
    section "First section" {
        paragraph "This is the first paragraph"
        paragraph "This is the second paragraph"
    }
}

// Anonymous nodes are supported
"This text is the value of an anonymous node!"

// This makes things like matrix definitions very convenient
matrix {
    1 0 0
    0 1 0
    0 0 1
}
```
This is like XML attributes, which I've always found annoying to deal with in programs. It doesn't really map to any native data structure in most (all?) programming languages, so you need a special class/struct which supports it.
Simply using something that maps directly to a hash map/object/associative array would be much better, IMHO.
Other than that, it looks like an interesting project.
Actually, it's even a superset of XML; from the docs:
SDL documents are made up of Tags. A Tag contains
* a name (if not present, the name "content" is used)
* a namespace (optional)
* 0 or more values (optional)
* 0 or more attributes (optional)
* 0 or more children (optional)
So it's like an XML node, but the `0 or more values` means it has a list/array for a "body".
I'm gonna continue using YAML, like, even if each parser came with support for a halt-and-catch-fire directive that you couldn't turn off or whatever. It's just about the only markup language where you can embed multiline strings without the indentation being fucked either in the markup or in the resulting string, without requiring lots of escaping.
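For instance (the key is hypothetical), a block scalar keeps both the markup and the resulting string clean:

```yaml
# The | block scalar strips the common leading indentation: the first
# line sets the reference indent, and deeper indents survive intact.
script: |
  echo "first line"
    echo "this line keeps its extra indent in the resulting string"
```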
I am sad that EDN hasn’t achieved popularity as a format. It seems like a better specified, less verbose format. As a bonus it plays well with Paredit-like editor modes. Alas, the curse of being better, but later.
We've spent like 10 years trying to fill in gaps left when we all decided to hate XML. JSON is great as a lightweight DIF between trusted partners. If you care about maintenance and safety, XML with XSD is rock solid.
I don't know - XML is awfully verbose, and the schemas are even more verbose. I've lost track of how many "XML" configuration files looked like this:
A few weeks ago, I had about 100 config files (Tomcat context.xml) which all needed fixes for common misconfigurations - if they had the misconfigurations in the first place. The kind of problem that is just a little bit too hard for search and replace. It was really easy with XSLT. The result had all comments preserved. I chose to reformat the files, but keeping whitespace was an option too. Now tell me if you can do that with JSON, YAML, or TOML. Most parsers simply forget about the comments to begin with.
In the same way, if we receive a data transfer in XML and there is a schema, simple validation catches a lot of problems quickly. You'd be surprised how many times a company gives you a schema and then sends you xml which doesn't validate. In JSON, you have to write a program to get even basic validation.
Don't get me wrong: XML has problems, some inherited from HTML/SGML (entities!), and even more after serious abuse by consultants, architecture astronauts and enterprise vendors (SOAP! namespace overuse! 10 XML parsers in one app!). But it was also miles better than what came before, and I feel the XML-hatred pendulum has swung too far.
Today, JSON is in vogue, and I've seen enough IT to not swim against the tide. It is a reasonable solution for problems caused by XML abuse. Besides, there is value in going with the majority, even if it only fixes 80% of your problem. But I can only weep for the miserable date, numeric and comment support, and their endless stream of incompatible workarounds.
For your parameter example: You can't both strictly validate and have full freedom at the same time. Something has to give a bit. Some less horrible alternatives I've seen:
    <parameter name="X" value="Y"/>
    <subsystem name1="value1" name2="value2" ... /> (adding a newline for each attribute)
    <name>value</name>
What problems with respect to entities does XML have that it inherited from SGML and HTML? Do you mean entity-expansion attacks such as billion laughs? HTML has only character rather than general entity references, and SGML has had the ENTLVL capacity to bound entity reference nesting since 1986.
Edit: XML is just a proper subset of SGML by definition, hence it didn't introduce a single thing that wasn't there before. It only introduced XML-style empty elements and DTD-less markup, and SGML was extended in lockstep with XML to support these as well.
I'd consider user-definable entities a problem, as you can't read a file without knowing the DTD. Billion laughs is just a very ugly bonus.
XML is more than the part inherited from SGML; it's also the XML culture surrounding it. Namespaces are an example of something that created an XML dialect. And of course SOAP, which actually needs the WS-I standard to explain which parts of the WS-* standards to use or ignore, and how to interpret them. And even then, two WS-I stacks will rarely interop without trouble. Let's not blame SGML for that monstrosity.
What's wrong with namespaces? These are simply globally unique identifiers that allow us to define and use our own globally unique names and make them passably human-readable. That is:
<a:log /><b:log /><c:log />
can mean a math function, a text file that records what's happening, and a cut-off trunk of a tree and there will be no confusion whatsoever.
I would definitely use RelaxNG for specifying schemas instead of XSD - it's simpler both to read and write in every case I've tried, and the resulting schema is smaller as well, often by a lot.
To be quite honest I dislike using XML for human editable configuration files. Variations on Microsoft's .ini files (such as TOML) seem to work best for that, IMHO.
We use .ini files for all of our settings in our products, and they work great most of the time. The only weirdness creeps in when you try to store things with embedded CRLFs and need to escape/unescape them (not a big deal), and storing lists of things is a little difficult.
JSON is great in terms of flexibility, but .INI files are really easy to read because everything is on the left side of the screen/window at all times.
XML is in that odd middle-ground where it's usually human-readable, but also a huge pain in the ass to write. It's great at what it was intended for, as a data interchange format.
For a general-purpose human-writable structured data format, I guess the ugly nonstandard hack that is "JSON with comments" is probably good. It's certainly faster to parse than YAML.
XML wasn't intended as data interchange format, but for replacing SGML as serialization and markup meta-language on the Web (eg. for XHTML, SVG, MathML). It can't be said often enough that markup languages are for authoring and delivering semistructured text data, not for general-purpose data serialization. As in, editing plain text files and have your text treated as content unless marked up with markup and annotated with metadata attributes.
Though this is much more pronounced in SGML which also contains the features for authoring (as opposed to delivery) omitted from XML such as tag omission/inference, custom Wiki syntaxes, and other short forms.
Well, what is the purpose of markup languages? Isn't the sole purpose of markup to be able to process the marked-up content with a computer? Why would you add markup to your favorite verse if it wasn't to somehow feed it to a machine for some purpose (analyze, typeset, etc.)?
So when we have text with markup the text part is meant to be there for humans and the markup part is solely for computers. Now let's remove all text; now there's no content for humans at all, only for computers. How is this different from general-purpose data serialization?
(Some of the samples you give, like SVG, may not have any text content at all; it's basically a drawing language.)
Given that the XML ecosystem has quite a few tools (e.g. several type-description languages and a declarative data transformation language, just to name a few), it's a very good general-purpose data serialization format.
I've managed to teach non-programmers to successfully edit YAML files without too much trouble, but most non-programmers have a really hard time consistently producing valid JSON by hand.
As much as there is a lot to not like about YAML, it is the easiest one for humans to consistently write in my experience.
> It's great at what it was intended for, as a data interchange format.
XML's incredible verbosity is a problem for computers too. I've spent time performance-tuning message parsing code that had no good reason to be slow except that our use of XML bloated the data and decoding time by an order of magnitude or more compared to a binary protocol with a schema.
In my experience if you are using XML as a data interchange format and it's slow it's probably because you are using a DOM parser instead of a SAX parser. DOM parsers build a tree that is best used describing a marked up document and much less useful for describing a data structure you might want to serialize or deserialize.
I've gotten incredible speedups just by switching to SAX parsing in those cases.
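In Python, for example, the switch is mostly a matter of handler style; a minimal sketch using the standard library's `xml.sax` (the handler class here is made up for illustration):

```python
import xml.sax
from io import BytesIO

# A streaming SAX handler that tallies elements without building a DOM tree.
class ElementCounter(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.count = 0

    def startElement(self, name, attrs):
        self.count += 1

handler = ElementCounter()
xml.sax.parse(BytesIO(b"<root><a/><a/><b>x</b></root>"), handler)
print(handler.count)  # 4
```

The handler sees each element as it streams by, so memory use stays flat no matter how large the document is.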
My biggest complaint with JSON is the lack of support for comments. For that reason, it's hard to take it seriously for human-maintained configurations.
Comments and trailing commas. If those two features were added, I would use JSON for configuring everything. Naked keys would be a distant third. My conclusion is to use TOML or the protocol buffer text format.
I've been slowly ripping out YAML support and converting configurations to TOML.
Not just trailing commas: the need for commas at all when there's a newline right next to them has been a source of many stupid issues for me when less knowledgeable/experienced people edit JSON-based conf files.
Then there are floats without a leading zero, the missing colon after a key, and yes, naked keys. The need to wrap the entire file in { } or [ ] is just icing.
Honestly, I feel the barest simple conf format of [first-word] [rest-of-line] is enough for many programs that end up using, but never taking advantage of, more powerful formats.
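A parser for that bare [first-word] [rest-of-line] style can be tiny. A hypothetical sketch, with `#` comments and blank lines skipped as an assumed convention:

```python
# Minimal parser for a "[first-word] [rest-of-line]" conf format (hypothetical
# sketch): blank lines and '#' comments are skipped; later keys overwrite.
def parse_conf(text):
    conf = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith('#'):
            continue
        key, _, value = line.partition(' ')
        conf[key] = value.strip()
    return conf

print(parse_conf("# demo\nlisten 0.0.0.0:8080\nworkers 4"))
# {'listen': '0.0.0.0:8080', 'workers': '4'}
```

All values come back as strings, which is arguably a feature: the program decides how to interpret them, not the format.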
though this looks terrible and there's probably some edge case I've forgotten. (Also it misses the point of JSON in that it's no longer valid JS. I don't know whether that's important anymore since you should be calling JSON.parse() not eval() anyway.)
The one major reason I could see to use "JSON" as a conf file is in trusted node.js apps because you can then easily embed functions and logic in them if you need more advanced/customizable configurations. And you can do it with full syntax highlighting in your editor. And comments, and trailing commas, and naked keys.
Of course this is no longer JSON, it's straight up Javascript config files. But it has come in handy a few times when I want to override standard behavior on a per config basis, and most of the file is still just plain key: val
I get a lot of flak for this, but there are definitely times I miss XML for certain things and find it way easier to work with than JSON or YAML. I definitely understand some of the backlash against XML that happened a decade or so ago and definitely don't want to return to the days of half of a Java application being XML code.
I think JSON is more efficient to write, but XML often ends up being more efficient to read due to comments and the fact that XML tags often give you better context. I think most programmers (myself included) tend to heavily optimize towards writability when we should think about readability a little more.
An example of this is ElasticSearch, where your queries are in JSON and often end up tons of levels deep - it is super easy to get lost in a sea of closing brackets, whereas XML would let you add comments in and the fact that closing tags have names in them would give you better context about what you were doing.
> This problem was solved in the 60's with S-Expressions.
Not so much. Sexps don't provide a place to hang "extra" information. It's been a pain point. While some Lisps allowed decorating runtime things (e.g. objects with attributes, and symbols with property lists), their printed/readable representations were implementation-dependent.
There's also a widespread misconception that Scheme is easy to parse. Numbers and all. It's actually very hard to get right. Real scheme parsers are quite large and hairy.
> XML was a complete and utter waste of time.
While XML was ghastly, there was an unmet need. There still is.
Yeah... I'm a diehard Common Lisp user, and when I saw YAML+go-template used for Kubernetes Helm templates, with some extra hacks to take care of indentation shifts... I felt almost physical pain.
If your configuration file is so long it's unreadable in YAML, then maybe you need to break it up into more than one file? I can't imagine any syntax would be easy to read once you reach more than 100 or so lines.
Do any configuration file languages support type hinting? Adding (int) in front of a YAML key would be easy enough to read, and would keep some of the confusion at bay.
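YAML actually ships a version of this already: explicit `!!` tags override the implicit typing. A small PyYAML illustration:

```python
import yaml

# Explicit tags act as inline type hints, overriding implicit resolution.
doc = """
port: !!str 8080     # keep as a string, not an int
count: !!int "42"    # force an int, even though it's quoted
"""
data = yaml.safe_load(doc)
print(data)  # {'port': '8080', 'count': 42}
```

The syntax is clunkier than a hypothetical `(int)` prefix, but it is part of the spec and works in conforming parsers today.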
As a general rule of thumb: never use yet another non-markup language designed by people who claimed from the very outset to be designing yet another markup language, and who then, after somebody awkwardly pointed out that what they'd designed wasn't actually a markup language, invented a backronym to contradict that embarrassing historical fact.
It just makes me wonder what the hell they thought they were doing all that time...
It's like designing a tool called YACC, and ending up with Yet Another Interpreter Interpreter!
It's like a standard for storing all your pornography in a folder called "Definitely Not Pornography".
>Originally YAML was said to mean Yet Another Markup Language, referencing its purpose as a markup language with the yet another construct, but it was then repurposed as YAML Ain't Markup Language, a recursive acronym, to distinguish its purpose as data-oriented, rather than document markup.
> It just makes me wonder what the hell they thought they were doing all that time...
The YAML project was a convergence of several different efforts at information representation, including people from Perl, Python and Ruby, each with our own ideas. I happened to be involved in the outer ring of the XML community, in particular a group called SML-DEV, where we were looking for a better information model, more suitable for data serialization, that would use an XML-compatible syntax.
At that time, especially since serializing data with XML was all the rage, "ML" or "Markup Language" was commonly associated with data serialization. In fact XML is very inconvenient for actual markup, even though it derives from SGML (of which HTML is an example).
The "YA" part did deliberately come from YACC, the reason why is that XML (and we hoped YAML) would be the basis of domain specific serialization languages. Hence, you can think of it as a meta-language for building sub-languages, like the application I was working on, a serialization for accounting data.
Hence, that's the origin of the YAML acronym, which happened to have a domain name open. In fact, you can look at archive.org starting in 2001 and you'll see the first public pass of YAML more closely followed XML-like bracket syntax. I disliked it, but it was what came out of the SML-DEV collaborations. Even so, the important thing was the information model, which was a simple typed graph and not an element tree.
The syntax evolved through many revisions after the model and goal were set, with lots of feedback to address concerns and usability tests with domain experts. The syntax got more and more lightweight, inspired by RFC 822 (email), while adding dashes for list items. Testing the syntax with domain experts (accountants and the like) was an exceptionally important part of the process. Users liked this serialization syntax since it made their data "pop".
So. A year or so passes while we focus on getting things to work and helping people. Then the product-differentiation question comes up: since XML had the dominant position in data-serialization mind share, how does YAML compare? Well, the YAML model was a typed graph, while XML's was an element tree. One required no special libraries to manipulate; the other required a DOM to translate. But perhaps more importantly, XML borrowed its model and tags from SGML, and SGML was a true "markup language". So XML was a "markup language" in which it was impractical to do actual markup. Then it dawned on us: of course, a data serialization language isn't a markup problem. I'm not sure any of this was obvious at the time.
Anyway, in a very fun chat, Oren pointed this out and then Ingy said: well then, YAML Ain't Markup Language! So, the new name actually represented what we had set out to do in the first place. Further, I would suggest that the industry understanding of how serialization languages are poorly supported by markup approaches (XML) is at least somewhat due to our name change and fun filled articulation at conferences.
For config formats I'm finding HCL[1] to be nice for my use cases. It has comments, no requirement for double-quoted identifiers, and is actually simple. The main issue was that the only implementation is in Go, so I had to write a port to C++.
Super happy with HCL for over a year in production now.
I've only got one FOSS project using HCL, but I think its bundled HCL config file is an attractive part of its UX: https://github.com/LukeB42/psyrcd
I continue to hold a firm belief that the reason JSON is so popular is that it covers most use cases without any of the dumb crap that hides in YAML and XML behavior.
Lack of comments, (nice) multiline strings and trailing-comma support make JSON a real pain for config in practice.
I've started using YAML parsers for all of our (once) JSON config files, just to get those features (while preserving the curly-braces, commas, and other JSON-isms). Yes it allows a misguided dev to abuse YAML (mis-)features, but a combination of coding standards and linters can fix that.
Being a superset of JSON is YAML's best feature.
I would never consider it for untrusted input though.
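A sketch of that setup: since YAML's flow syntax covers JSON-style braces and brackets, a YAML loader will accept JSON-ish config plus `#` comments and trailing commas (PyYAML shown):

```python
import yaml

# JSON-style config, but with '#' comments and trailing commas, parsed by a
# YAML loader. Note: this is YAML, so YAML's implicit typing still applies.
config_text = """
{
  "debug": true,            # comments work
  "workers": 4,
  "hosts": ["a", "b",],     # trailing comma is fine in flow style
}
"""
config = yaml.safe_load(config_text)
print(config)  # {'debug': True, 'workers': 4, 'hosts': ['a', 'b']}
```

A linter can then reject any YAML-only features (block syntax, anchors, tags) to keep the files looking like JSON.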
Over the past few months, I've built up a somewhat masochistic relationship with YAML, as I've been writing my own JS library for it [1]. Yes, the spec is more complicated than it ought to be and yes, writing yet another implementation might just mean more overall variance within the spec, but it's still the only config language with decent usage that supports human-readable multi-line strings and comments. And it would've been really nice if someone else had supported editing comments, so I wouldn't have needed to do that myself.
Still, I'm optimistic, especially now that Prettier is getting YAML support. [2]
I agree with some of the author's points but the "surprising behaviour" section is odd. For example, why would you expect `3.5.3` to be parsed as a number? How could that be parsed as a number?
> JSON should actually have the same issue. When I enter { 013: "11" } in the web console I get '{11: "11"}'.
JSON doesn't have this issue. `{ 013: "11" }` isn't valid according to the JSON spec for a couple reasons, the important one being that multi-digit numbers cannot start with zero[1]. Try this in the console: `JSON.parse('{"11": 013}')`.
{ 013: "11" } isn't valid JSON. All object keys in JSON must be strings. And if you try to do that in value position, it's still not a problem because JSON never treats unquoted text as a string.
Also JSON doesn't treat a 0 prefix on a number as special. There are no octal (or hex) literals in JSON. In fact, a JSON number literal cannot even start with 0 unless that's the only digit (before the period), e.g. 013 is not a valid numeric literal in JSON.
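Both constraints are easy to check from Python's `json` module:

```python
import json

# JSON forbids leading zeros, so 013 is not a valid number literal,
# and unquoted text is never silently treated as a string.
def is_valid_json(text):
    try:
        json.loads(text)
        return True
    except json.JSONDecodeError:
        return False

print(is_valid_json('[013]'))    # False
print(is_valid_json('[0, 13]'))  # True
```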
Thanks. Yeah, I mistakenly believed that JSON was mostly a JavaScript hash, rather than that 'JSON is valid JavaScript', which means it's a subset of all possible JavaScript hashes.
Every time this comes up, I don't understand how https://json5.org isn't superior to YAML in every way.
If there were some deficiency with JSON5, just use JSON with comments. It's that simple.
JSON is one of the best things to ever come out of the CS disciplines.
For those that whine about comments in JSON, Douglas Crockford, the creator of JSON, himself said to do it.
>Suppose you are using JSON to keep configuration files, which you would like to annotate. Go ahead and insert all the comments you like. Then pipe it through JSMin before handing it to your JSON parser.
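A crude stand-in for that advice, assuming only whole-line `//` comments (a real minifier like JSMin also handles `//` appearing inside string values, which this naive version does not):

```python
import json

# Strip whole-line // comments, then hand the result to a normal JSON parser.
def strip_line_comments(text):
    return "\n".join(
        line for line in text.splitlines()
        if not line.lstrip().startswith("//")
    )

annotated = """
{
  // which port the server listens on
  "port": 8080
}
"""
print(json.loads(strip_line_comments(annotated)))  # {'port': 8080}
```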
I personally dislike with a passion every language or grammar that depends on whitespace indentation, especially if the designers were so opinionated that you can only use spaces (or even a specific number of spaces per indentation level) and not tabs.
It's not as big of a deal in Python because, as others have mentioned, you usually don't end up writing large functions to begin with, and tab indentation is supported. It still means that if you try to send a code snippet to someone over email or Slack, IM, etc., it may not work because whitespace may be trimmed. With YAML, it's way worse, for reasons the author outlined.
Something may look good on paper (or in screenshots), but practical considerations need to be factored in when designing a grammar, and there are myriad pretty-formatting utilities that could be used to that end if one cares for that sort of thing (see also: clang-format).
I don't love YAML, but for configurations I always choose it for the simple fact that it supports comments. Comments in JSON are almost always hacky (for example, embedding a comment key inside the value).
I write a lot of command line tools for work and need configuration files and always go for YAML, but am never happy about it. I use it because it serializes to Ruby objects so I can more easily do validation and check if someone forgot an important key but I wish there were something that would make that easier for the people using the config files. I looked into TOML after this and liked what I saw.
I've been playing with Home Assistant[0] recently and, as a result, getting exposed to YAML. I don't find it pleasant to work with at all. I'm sure a big part of that is how Home Assistant uses YAML for automation stuff[1] that would probably be better served by a real programming language.
I think a big part of the unpleasantness comes from how non-obvious some things are. For example, why does the first automation example require a "-" in front of "platform" (under "trigger"), but the second doesn't? The comments explain when it's required, but not why? Shouldn't the parser be able to figure it out from the significant whitespace?
I found it so unpleasant that I'm using Node-RED[2] to do anything even vaguely automation related and have relegated Home Assistant to being the UI and communications abstraction layer.
In contrast, XML, while overly verbose, has a more reasonable structure. I haven't played with JSON much, but it also seems pretty reasonable.
Everything that you can do in YAML + more, you can do in Home Assistant using Python. Have a look at my PyCon talk [1] from 2 years ago or check the available functions in the docs [2].
About the dashes. In Home Assistant when an option takes a list, you can omit the list if you are just passing in a single entry.
I agree. This is my biggest issue with YAML - the syntax is just so weird and unintuitive. Compare arrays in YAML to JSON. YAML makes no sense.
Then there is the significant whitespace, difficult to remember quoting rules, etc.
I think the only reason it had gained any traction is because it is relatively easy to write multi-line strings in, so it is good for shell scripts, e.g. in CI configuration.
Someone should really come up with a sane alternative that works well for that use case though.
My friends who are smarter than me have these criticisms.
XML -> doesn't record what programs think of as 'data'. So a program needs to convert its data representation to XML and back again. Usually there is no formal spec. This is the exact same issue you have with databases, but at least a database has a formal representation and data types.
Part of SOAP is Microsoft trying to bolt schemas onto XML. SOAP seems to generate a lot of unhappy programmer noises, but my ex-roommate said, 'well, when you get it working, it works'.
JSON and YAML do, but they don't have schemas to parse against. So programs need to do their own validation.
> Loading a user-provided (untrusted) YAML string needs careful consideration.
Why would you ever use YAML for user-provided input? At that point, it's better to just use JSON.
> Many other languages (including Ruby and PHP1) are also unsafe by default. Searching for yaml.load on GitHub gives a whopping 2.8 million results. yaml.safe_load only gives 26,000 results.
Maybe that's because everyone's using JSON where it would be unsafe to use YAML.
> YAML files can be hard to edit, and this difficulty grows fast as the file gets larger.
And... this isn't the case for XML or JSON?
Ok, so reindenting a section might be a pain, but if your YAML contains large amounts of data, maybe that data doesn't belong in that format if you're manually editing the YAML.
> especially since 2-space indentation is the norm and tab indentation is forbidden
Good. ;)
> And accidentally getting the indentation wrong often isn’t an error; it will often just deserialize to something you didn’t intend. Happy debugging!
Which is unlikely to happen if you're using a YAML library or only editing small-ish config files by hand.
---
As noted, YAML has a lot of quirks. As a configuration language, I love it and am used to the little edge cases. Could it be better? Definitely. But I would still consider YAML to be great in the domain where it excels: human-readable configuration. Using it to store and transmit large amounts of data, especially in ways where a human is manually editing the YAML, is a terrible idea.
Since we're all chiming in with configuration formats, here's two more nice ones:
1. If you're writing Lisp, just read S-expressions in.
2. Use Python or Skylark and have a step that executes it into a configuration format. Obviously this is not something you would want to use for a data interchange format, but no one thinks that they can blindly run a random hunk of untrusted Python. Right? ...Right?
One of the important things implementations failed to do was properly support Schemas (http://yaml.org/spec/1.2/spec.html#Schema), and this is still the case today. Had they done so, the safe-load issue would never have arisen, and the loaders/dumpers would be properly configurable to suit the needs of the application, e.g. not supporting "Yes" and "No" as `true` and `false`.
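With PyYAML, for instance, you can approximate that kind of application-level schema control today by trimming the implicit resolvers on a `SafeLoader` subclass. A sketch (the loader name is made up):

```python
import yaml

# Hypothetical NoBoolLoader: a SafeLoader whose implicit resolvers no longer
# map Yes/No/on/off (or true/false) to booleans; they stay plain strings.
class NoBoolLoader(yaml.SafeLoader):
    pass

# Copy the resolver table so we don't mutate SafeLoader's own class attribute.
NoBoolLoader.yaml_implicit_resolvers = {
    first_char: [(tag, regexp) for tag, regexp in resolvers
                 if tag != 'tag:yaml.org,2002:bool']
    for first_char, resolvers in yaml.SafeLoader.yaml_implicit_resolvers.items()
}

print(yaml.safe_load("answer: Yes"))                    # {'answer': True}
print(yaml.load("answer: Yes", Loader=NoBoolLoader))    # {'answer': 'Yes'}
```

It works, but it's exactly the kind of per-application plumbing that first-class schema support would make unnecessary.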
I once did about 98% of the work to support Schemas properly in Psych (https://github.com/ruby/psych) but the maintainer said he didn't want to "maintain it".
So, there you go. What else can one do? You can't blame the spec for decisions of implementors.
(That's not to say the YAML spec couldn't use some improvements, but it's far from "not so great".)
Yes, I love YAML in general. The complaints about big files can be alleviated with a good editor that collapses blocks and draws indentation guides.
But the author is mostly right, when adding support for YAML to my code I spend a lot of time disabling all of its nifty misfeatures. Wish it was simply an indented JSON with comments and fewer quotes.
We recently published a paper suggesting an alternative to HDF5 [1], using directories for objects, YAML for metadata and NumPy for data. Many of the points in this article were raised by the reviewers, or were worries we had about choosing YAML as the metadata format. In the end, we decided to use a subset of YAML with only basic tags, enforced quoted strings, no directives, and no block scalar styles (fancy multiline strings). So far it has worked out great. I hope it will make the format easier to understand for users and make it possible to write faster parsers in the future.
YAML and TOML both seem too complicated. Automatic date parsing? So many different ways to specify the same nested hash table? I like json because there's usually one obvious way to do what you want. It's a local minimum, like C and Lisp. It's really too bad about the comments.
> Don’t get me wrong, it’s not like YAML is absolutely terrible – it’s certainly not as problematic as using JSON – but it’s not exactly great either.
Well, at least it's the least worst then; does that make it the best?
Frankly, I hope the future will be made with indented languages. Curly-brace languages often allow too much liberty, and it's annoying. The fact that Go enforces one curly-brace style is really the tipping point for curly-brace languages.
Granted there need to be some good compromise for ambiguous details when parsing an indented syntax, but readability matters much more than anything to me.
It solves some of the issues that you can have with Python, while at the same time also avoiding the whole nonsense with the braces. I find it's a good trade-off between the strengths of both approaches.
If the problem with YAML is size/formatting, that's something a good IDE can simplify even more. For loading up configs that would otherwise be properties files, I have found it to be quite clean. If you think of it as a domain language for configuration, YAML is great. It can be understood pretty much intuitively, and it does some things quite well. On the other hand, when it comes to exchanging data, JSON > XML > X12 EDI... EDI? Yes, it's still a thing... a terrible thing...
I recently posted about a technique I've been developing (and am very happy with) for representing and editing JSON in spreadsheets, without any sigils, tabs, quoting, escaping or trailing comma problems, but with comments, rich formatting, formulas, and leveraging the full power of the spreadsheet.
>Recently I've been working on kind of the converse of this problem with JSON and spreadsheets, and I'll briefly describe it here (and I'll be glad to share the code), in the hopes of getting some feedback and criticism:
>How can you conveniently and compactly represent, view and edit JSON in spreadsheets, using the grid instead of so much punctuation?
>The goal is to be able to easily edit JSON data in any spreadsheet, copy and paste grids of JSON around as TSV files (the format that Google Sheets puts on your clipboard), and efficiently import and export those spreadsheets as JSON.
>[...]
Since I wrote that post, I've cleaned up and refactored the code into a portable little library that will run in the browser, or inside of Google Sheets:
I haven't come up with a trendy marketing name for it yet (except for the source file name sheet.js), because I think it's important to first discover what it is by using and refining it for a while, writing some documentation to explain it, and getting feedback from other people (work in progress), before trying to name it -- otherwise you might end up calling it "Yet Another <something it's not>".
Not agreeing or disagreeing, but I read both this and his other post about JSON as configuration file and I have not seen him propose and argue for an alternative.
In your analogy, the alternative is to not see the movie. The analogous alternative would be to... what? Not use config files? Doesn't seem like much of an alternative to me.
This is almost always the case with YAML criticism, which is a perennial topic in programming communities. Coming up with "a better YAML" is easy; getting anyone to use it is hard - even harder than getting people to use a new programming language because configuration files have to be touched by end users, and they all know YAML already.
Furthermore, all of the "better YAMLs" that exist solve a different subset of issues based on the whims of the author. I like indentation-based syntax for config files (though not for programming languages, go figure), so half the alternatives look worse rather than better to me, and reasonable people can also disagree on things like when strings should require quotes and what should be a valid hash key and so on. There are so many bikes to shed that I don't see us ever settling on an alternative without major buy-in from one of the big players in tech.
Until then, I'm happy to let YAML win. It's just not broken enough for me to get worked up about.
I wrote this article, and at this point I don't really have a strong preference for any one format. I think a lot depends on what it's used for.
YAML can actually be pretty useful. For example, recently I wrote a tool to generate OpenAPI/Swagger files from Go source code, and outputting that as YAML works pretty well, as YAML is quite easy to read for that.
YAML can also be a good choice to serialize some things to disk, like program state. JSON can also be a good choice for that, as can TOML.
But for other things ... it's not so great. I should probably write an article detailing this at some point, but ...
$ ls -1 /data/code/arp242.net/_drafts | wc -l
64
So much stuff I need to finish :-( I've added your suggestion there though!
JSON is slightly easier since you have actual start/stop marks in the form of `{`, `}`, instead of relying on 2-space indentation. (are there 8 or 10 spaces there? Hard to see).
The alternative of writing executable code as your config language also has its own issues. It's why things like Skylark were invented for Buck & Bazel:
You wouldn't, but the language is lightweight enough (according to Wikipedia, the interpreter is ~180kb compiled) that including it as a dependency probably won't matter. It's less overhead than an interpreter for JSON or YAML at least.
>Is a parser library available for mainstream languages?
It's used commonly in game development so yes for C and C++, and the site[0] mentions "Java, C#, Smalltalk, Fortran, Ada, Erlang, and even in other scripting languages, such as Perl and Ruby" as well.
I know Lua and I use it sometimes (for example in Redis). But I don't want to include a full Lua interpreter in my program just to parse a configuration file. It's overkill in my opinion :-)
We need to standardize CSON (https://github.com/bevry/cson) and build a compliant C parser. The format is simpler, with fewer surprises than YAML, yet able to handle more kinds of syntax in a straightforward form.
I came in here to rant that YAML isn't for the same purpose as JSON and XML but then thought I'd better RTFA and realised that oh yeah one or two good points here.
The first criticism, where he's embedding executable code in YAML, I kind of have to agree with: that seems crazy. I don't know why YAML would support this.
The remaining criticisms seem to relate to (a) the spec overreaching in terms of complexity and (b) differences in implementation, which I guess is some kind of an extension of (a).
I maintain however that YAML, JSON and XML are different.
If you want to make me feel cross and insulted give me a JSON file to edit. I think JSON is probably the best commonly used format for M2M and storage serialisation.
I wouldn't want you to use YAML for that, though. There are too many different ways to do it, and any kind of ambiguity never makes for good M2M.
For configuration-files it's great though, as long as you stay away from some of the more exotic features I suppose. It effectively provides a "user interface" of sorts by which your users can specify non-trivial configuration details.
The complaints about overlong and overcomplex yaml files could be extended to other commonly used formats.
With regards to XML, I'd say that YAML provides all the features you'd want to use from that format in a format that's easier to hand-edit as text. XML is "okay" for M2M but probably better for document storage where you have some kind of custom editor.
It's horses for courses. I wouldn't want to use YAML where I'd want to use JSON, or use XML where I'd use either of them, and neither of these have the semantic richness of XML either and so wouldn't be appropriate in whatever spaces XML should be used (which is far more limited I suspect than its current span of applications).
Ultimately when each is used to its strengths they're not interchangeable formats.
I haven't found a markup language that I really prefer for configuration files, but JSON has been reasonably nice to me in NPM configurations and VS Code settings, and I haven't run into problems yet. I understand the motivation for TOML, but it has the same ambiguity problems that YAML does. There's something to be said for not being too flexible, or it'll fall into the same trap as AppleScript did. Humans might prefer the convenience of ultra-flexibility at first, but sooner or later our intuition will just not match up with the actual rules, and we'll have to spend longer than we wanted looking up arcane syntax documentation. That's been my experience with YAML anyway, like every single time.
> One might also argue that fixing it is as easy as replacing load() with safe_load(), but many people are unaware of the problem, and even if you’re aware of it, it’s one of those things that can be easy to forget. It’s pretty bad API design.
It is. At API design time, it would have been trivial to ship `load()` (doing what `safe_load()` does now) and `unsafe_load()` (doing what `load()` does now), and probably avoid this pitfall altogether. Now? Much more difficult to solve.
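A quick illustration of what the safe variant refuses (PyYAML shown; the exact exception message varies by version):

```python
import yaml

# safe_load rejects language-native tags that plain load() would construct,
# which is exactly the arbitrary-code-execution hazard under discussion.
payload = "!!python/object/apply:os.getcwd []"
try:
    yaml.safe_load(payload)
    rejected = False
except yaml.YAMLError:
    rejected = True

print(rejected)  # True
```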
For what it's meant to do (concisely express data in a human-maintainable way), it's still in a league of its own. For configuring your apps, it's the way to go.
I think this is a problem of trying to serve all use cases at once. I mean, if you're making a data exchange format, do you really need it to be able to execute arbitrary code? Isn't that inviting trouble? It's like building a bank vault and then cutting a large hole in it and putting a plywood door on it - just in case somebody would want to convert it to a restaurant later. Maybe it would be better not to serve that particular use case at all?
I'm not saying YAML is perfect or terrible (just like the author), however most of the examples the author gives are going to be the same for any language attempting to implicitly type values.
I think maybe YAML just went a bit too far. Everyone hates quoting key names when it's unnecessary 99% of the time, but it would have been enough to relax that, rather than going all the way to relaxing all quotes and then attempting to infer value types.
Our shop are heavy users of YAML, and we've sort of backed our way into a restricted subset of YAML. Some of them are config files, but others function closer to DSLs.
I have not yet taken a look at StrictYAML, but after years of use, the spec definitely needs a "YAML: The Good Parts" treatment.
One thing the author did not mention is how slow the out-of-the-box Python YAML parser can be. This can be sped up by calling out to libyaml, but then you lose the safe_load method.
I suspect the speed issues are more a result of implementation details than of being in Python. Last year, I created a config language for my own use that supports some syntax very similar to YAML. My pure Python library can load simple dict/list/string data 10x as fast as PyYAML, and nearly within 1.5x the speed of libyaml (https://bespon.org/#benchmarks). That's while building an AST with source information to allow round-tripping and supporting my own version of anchors and tags, so there's significant room for improvement. I expect that a pure Python YAML library might be able to match or beat the current performance of libyaml in at least some cases, particularly for a restricted subset of YAML.
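For what it's worth, PyYAML can use the libyaml-backed loaders without giving up safety; the usual pattern falls back to the pure-Python loader when the C extension isn't compiled in:

```python
import yaml

# Prefer the libyaml-backed safe loader when available; fall back otherwise.
try:
    from yaml import CSafeLoader as FastSafeLoader
except ImportError:
    from yaml import SafeLoader as FastSafeLoader

data = yaml.load("a: [1, 2, 3]", Loader=FastSafeLoader)
print(data)  # {'a': [1, 2, 3]}
```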
I did a few rudimentary benchmarking tests on in-house real-world data sets, and strictyaml was approx 10x slower than PyYAML's yaml.load (without calling out to libyaml) and yaml.safe_load.
I used the basic strictyaml.load function, without any schemas.
README page said speed is not a current priority, and that appears to be true.
Most of these are Microsoft's fault: MS-DOS allows only 3 characters for the file extension, so file extensions had to be abbreviated.
As for C++, I'd also blame Microsoft. The plus character (+) is a reserved character for MS-DOS, so the obvious extension ".c++" couldn't be used (nor could the case-sensitive ".C" extension). So people either toppled their plus signs (".c++" becomes ".cxx"), or replaced them by the first letter of "plus" (".c++" becomes ".cpp"), or treated them as a repetition sign (".c++" becomes ".cc").
I got blindsided by another obscure YAML parsing rule. Sometimes people couldn't pay online using our service, and I couldn't figure out why. I tested the hell out of the payment module on our dev and UAT environments, but I just couldn't reproduce the issue.
Eventually I tracked it down to an ID inside a YAML file. It turned out the live environment was running in 32-bit mode, which interpreted the number as a string.
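A minimal sketch of how that kind of failure can happen: a hypothetical loader on a 32-bit build that silently falls back to string when an integer overflows the native range. The function and bounds are illustrative, not from any particular YAML implementation:

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def parse_scalar_32bit(raw: str):
    """Hypothetical scalar parser on a 32-bit build: integers that
    overflow the native int range silently fall back to strings."""
    try:
        value = int(raw)
    except ValueError:
        return raw  # not an integer at all
    if INT32_MIN <= value <= INT32_MAX:
        return value
    return raw  # too big for a 32-bit int: quietly becomes a string

# An ID that fits in 64 bits but not 32:
print(parse_scalar_32bit("123456789"))   # -> 123456789, an int on any build
print(parse_scalar_32bit("9876543210"))  # -> "9876543210", now a string
```

The nasty part is that nothing fails at parse time; the type mismatch only surfaces later, in whatever code compares the ID.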
I hope YAML implementers will read this and give thought to adding proper support for schemas (http://yaml.org/spec/1.2/spec.html#Schema) -- they should be customizable by the application. This would resolve a number of the complaints and improve interoperability.
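As a sketch of what application-customizable schemas could look like: instead of the loader hard-coding implicit typing, the application hands the loader the resolvers it wants. All names here are hypothetical, not from any real YAML API:

```python
# Hypothetical schema-driven scalar loading: the schema is a list of
# (predicate, converter) pairs supplied by the application.
def load_scalar(raw, schema):
    """Type a raw scalar using only the application-supplied schema."""
    for predicate, convert in schema:
        if predicate(raw):
            return convert(raw)
    return raw  # default: everything else stays a string

# A "failsafe"-style schema: no implicit typing, everything is a string.
FAILSAFE = []

# A stricter app-defined schema: only plain decimal integers get typed.
STRICT = [(str.isdigit, int)]

print(load_scalar("no", FAILSAFE))  # -> "no" (no Norway problem here)
print(load_scalar("42", STRICT))    # -> 42
print(load_scalar("no", STRICT))    # -> "no"
```

Two applications could then share documents while disagreeing about typing, because the typing lives in the application's schema rather than in the parser.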
We used YAML from the start in the Wrench tool for writing reftests for WebRender[1]. We're currently looking into the prospect of migrating all of them to RON[2] as a better alternative; it has already proven itself useful in WebRender captures.
Nothing is perfect, and critiques are always welcome. I will add the article to Awesome YAML - a collection of awesome YAML (Ain't Markup Language) goodies for structured (meta) data in text [https://github.com/datatxt/awseome-yaml]
Honestly, I can't stand YAML. Every time I need to read a YAML file I also need to bring up a reference for what the syntax is because to me, none of it is immediately obvious.
Personally, I don't understand why all these projects have defaulted to using such an esoteric markup language.
> it seems to be the case that the majority of libraries are unsafe by default (especially the dynamic languages), so de-facto it is a problem with YAML.
It may seem that way, but it's not. Re: cryptographic functions
I've used YAML for years, and yeah, there's a lot of warts, but I love it for small(ish) manually edited configuration files.
I generally keep it simple and it works.
Those reasons are exactly why I created mset. It's not as flexible as YAML, but it is dead simple and works for many use cases. It is also very easy to implement in any language and, more importantly, super simple for end users to learn.
Seems like YAML tried too hard to be predictive of intent. I never got into YAML myself simply because it seemed like "JSON, but less ubiquitous and more hassle to find libraries that support it"
> difficult to pull off and requires productization, in other words not low-level tooling in a text file.
Is this supposed to be a feature? One of the great things about simple config files is that you can use standard GNU tools to view, edit, and diff them, you can put them in source control, you can be sure that you can edit them on a remote server no matter what's installed, etc.
Eliminating all those benefits would require an extraordinary jump in functionality as a tradeoff, a jump in functionality that most things frankly don't need.
You're right regarding the value of text files. ConfigNode is really for management of configurations in use cases where you have huge JSON/YAML files and you want to be able to manage them, facilitate dynamism (often using templates), support collaboration, etc. It is a gold-plated solution and not necessarily suitable for simpler needs.
Here's another example of ConfigNode used to manage Akamai configurations:
I worry that text files promote local minimums. Like, the standard GNU tools or whatever editor you use is "good enough" for what you expect to do with them, but there exist better tools for your specific task that you never even look for.
With a binary file, you of course have to be using something specific to work with them. And if lots of people were doing that, that would in turn drive a lot of specific, good options to choose from. The tools would be better at their specific task. They could provide actual _user interfaces._
But there's not enough standardization in user interfaces to reduce the cost of relearning each tool, and the tools would need both an interface and an API to automate them (we don't tend to get both for free), and we don't have anything great for chaining together APIs. So text files it is, which kind of provide these things, but they don't extend too well.
The trend in computing tools is to slowly invent what you could get easily with more specific formats... using text files. Automatic formatters, so you can pretend your project files are really the AST you care about. Smart IDEs with autocompletion, because you're not really typing arbitrary characters. IDEs that will collapse a lot of unnecessary information for you, like showing only the first snippet of JSDoc. Type systems that show you what's available and what you can plug together in a sane way. Version control that pretends it knows how to solve the problem of diffing/merging.
The answer is a marketing website with market-speak all over the page and 'Executable not found "/eula"' when trying to read Terms and Conditions?
Edit: I actually did download whatever this thing is. What is this thing? The README is Jetty's README. There are dozens of dirs with crap^W code in them.
I am afraid your own cynicism (warranted or not) might be really blinding you here...I am not sure whether to feel proud of myself when my writing is designated "marketing speak" :)
The documentation link LITERALLY shows how to use the product.
"Solvent is an integrated platform that combines an application container (jetty), a middle-ware and a developer environment to provide a complete solution for delivering web applications."
ConfigNode is something you'll put on a server as a config management environment; the output can be JSON/YAML/XML, etc.
I think you might find it quite unique if you just give it a chance (no marketing) :)
> Solvent is an integrated platform that combines an application container (jetty), a middle-ware and a developer environment to provide a complete solution for delivering web applications.
That link is pretty terse. I have no idea what "object graphs" are in this context nor how they solve the "endless iteration on the right config format" problem. Moreover, churn on config file formats is probably the least of my dev problems.
YAML and JSON already represent object graphs. This appears to be essentially a config file editor, not a superior configuration format. It has a GUI for creating or editing a config, and then it outputs YAML or JSON or XML.
It can be hard to fully convey how ConfigNode works without actually showing it in action. Yes, it does give you a UI environment to edit the object graph and outputs YAML/JSON/XML, etc.
However the output comes after evaluating the object graph. In other words it doesn't just reassemble a bunch of static values but rather actually executes objects (think POJOs) to produce the fields that make up each object.
It's true that you need to remove cycles before you can create a tree, but acyclic graphs are still graphs (and most relevantly they're probably the kind of graph you want your config to be). And YAML actually can contain cycles, so even if you only want to consider cyclic graphs, YAML still qualifies.