YAML: Probably not so great after all (arp242.net)
60 points by calpaterson on Sept 28, 2023 | 43 comments


Yes, it’s not perfect.

Still, totally fine for pretty much everything I’ve needed it for.

Maybe I'm just not that ambitious and haven’t hit any of the corner cases pointed out in the article.

Have been using it as an HTML forms definition language for 7 years, with great success.

So, yes, maybe not so great, but also not too bad.


> Still, totally fine for pretty much everything I’ve needed it for.

Try writing non-trivial multi-line bash snippets for Gitlab CI in yaml. These change your opinion very fast.


1. “writing non-trivial multi-line bash snippets for Gitlab CI” sounds intimidating even without the “in yaml” part. I don’t want to do it.

2. “In yaml” could be a fun way to end all kinds of sentences describing challenging tasks ;)


You don't want to do it, erm, but maybe people need to, because, huh, it's not an unreasonable thing to put in CI, right?


Every CI project I've maintained in the last 10 years has dispatched the logic to a script stored in version control.


YMMV.

Other parts of the world might not like too many levels of indirection, and would consider their own approach good practice.


I wrote some non-trivial multi-line bash snippets for Gitlab CI in yaml and didn't find anything annoying. Any examples that gave you problems?


I would suggest never doing that, because it makes it impossible, or at least very difficult, to:

1. Lint the code using ShellCheck, which IMO is essential for any shell

2. Run and test that code locally, or anywhere outside of the CI tool

It’s only marginally less convenient to externalise the code into a .sh file and load it in CI.


I have to look them up every. time. But YAML has indicators for multi-line strings (| for literal, > for folded), plus modifiers that let you choose whether to strip or preserve trailing newlines and leading whitespace.

I still don't like YAML though
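For reference, the part that usually needs looking up is the chomping modifier on a block scalar: "|-" strips trailing newlines, plain "|" clips to a single one, and "|+" keeps them all. The Python function below is a minimal sketch that mimics only those three rules on a block body whose indentation has already been removed; the name chomp is hypothetical, and this is not a YAML parser.

```python
def chomp(body: str, indicator: str = "") -> str:
    """Apply YAML-style chomping to the text of a literal block scalar.

    indicator: "-" (strip), "" (clip, the default) or "+" (keep).
    """
    if indicator not in ("-", "", "+"):
        raise ValueError(f"unknown chomping indicator: {indicator!r}")
    content = body.rstrip("\n")
    if indicator == "-":
        # Strip: drop the final line break and any trailing empty lines.
        return content
    if indicator == "+":
        # Keep: preserve the final line break and all trailing empty lines.
        return body
    # Clip: keep exactly one final line break, if the block had one.
    if content and body.endswith("\n"):
        return content + "\n"
    return content


# Block body as it might appear under "script: |-", "script: |" or "script: |+".
block = "echo one\necho two\n\n"
for indicator in ("-", "", "+"):
    print(f"|{indicator:<2}", repr(chomp(block, indicator)))
```

The distinction matters most for shell snippets in CI files, where a stray trailing newline is usually harmless but a missing one can glue two commands together when strings are concatenated.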


Because sane people would delegate the task to a separate script instead.


One big issue is that if a YAML file gets truncated for some reason, parsers generally don't complain.


Never thought about that, but indeed there is no termination string. I guess the same is true for Python.


Isn’t that true for many or maybe a majority of formats?


Not for JSON or XML which both require closing any opened bracket/quote or tag.
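Only the JSON half of that is easy to demonstrate with Python's standard library: a document cut off partway fails to parse instead of silently yielding partial data. (Showing the YAML half would need a third-party YAML library, so it is left out.) A minimal sketch; parse_or_report is a hypothetical helper:

```python
import json

full = '{"stages": ["build", "test", "deploy"], "retries": 2}'

# Simulate a file that was cut off partway through a write or transfer.
truncated = full[: len(full) // 2]


def parse_or_report(text: str):
    """Return the parsed value, or None if the text is not complete JSON."""
    try:
        return json.loads(text)
    except json.JSONDecodeError as err:
        print(f"rejected: {err.msg} at char {err.pos}")
        return None


print(parse_or_report(full))
print(parse_or_report(truncated))
```

Because the top-level object's closing brace is the last character, every proper prefix of this document is rejected, not just the halfway cut.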


> You’ll need to scroll up, but then you need to keep track of the indentation, which can be pretty hard even with indentation guides,

Or you can collapse all the groups in your text editor and usually easily see what level you're at
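What the folded view effectively gives you is the chain of parent keys for the current line. A minimal Python sketch of computing that "breadcrumb", assuming space indentation and plain "key:" / "key: value" lines only (no lists, flow collections, multi-line strings, or trailing comments on parent keys); key_path is a hypothetical helper, not an editor API.

```python
def key_path(lines: list[str], target: int) -> list[str]:
    """Return the parent keys enclosing lines[target] (0-based), outermost first."""

    def indent(line: str) -> int:
        return len(line) - len(line.lstrip(" "))

    path: list[str] = []
    current = indent(lines[target])
    # Walk upwards: the first less-indented "key:" line is the next ancestor.
    for line in reversed(lines[:target]):
        stripped = line.strip()
        if not stripped or stripped.startswith("#"):
            continue  # blank lines and comments carry no structure
        if indent(line) < current and stripped.endswith(":"):
            path.append(stripped[:-1])
            current = indent(line)
            if current == 0:
                break  # reached a top-level key
    return list(reversed(path))


doc = """\
jobs:
  build:
    image: python
    variables:
      DEBUG: "1"
  test:
    retries: 2
""".splitlines()

print(key_path(doc, 4))
print(key_path(doc, 6))
```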


Whitespace significance is awful. After some time questioning why I have fallen out of love with Python, I realized that is part of the answer.


The problem is that people use spaces and not TABs here.

The TABualator key was invented for this purpose in the age of the typewriter: to make things visually stand out by grouping them in the same column of a table. So all the lines that belong together are listed with the same indent (originally, all items belonging in the same column were listed under the same heading).

Example

    user:
      given_name: something
      family_name: something
Using a somewhat random number of spaces is just silly: the two name lines are one hierarchical level below "user", so add one tab character to represent that fact. Then you can vary the visual representation in your editor to your liking.

JSON was created to make it easy for machines to parse, not humans. Leave it to the machines.

Actually, one of the reasons TAB is misunderstood is that cheap typewriters didn't have adjustable tabulator positions, and when computers came along it was often implemented with fixed columns (typically every 8), so people think of TAB as 8 spaces, which often is not what you want.
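To make the "one TAB per level" idea concrete, here is a minimal Python sketch of a hypothetical YAML-like format where nesting depth is simply the number of leading tab characters. Note that real YAML does not work this way, since the YAML spec forbids tabs in indentation; parse_tab_tree is invented for illustration and understands only "key: value" and "key:" lines.

```python
def parse_tab_tree(text: str) -> dict:
    """Parse 'key: value' / 'key:' lines; nesting depth = number of leading TABs."""
    root: dict = {}
    stack = [root]  # stack[d] is the mapping that receives keys at depth d
    for lineno, line in enumerate(text.splitlines(), start=1):
        if not line.strip():
            continue  # ignore blank lines
        depth = len(line) - len(line.lstrip("\t"))
        if line[depth] == " ":
            raise ValueError(f"line {lineno}: use TABs, not spaces, for indentation")
        if depth >= len(stack):
            raise ValueError(f"line {lineno}: unexpected indentation")
        del stack[depth + 1:]  # close any deeper levels we just left
        key, _, value = line.strip().partition(":")
        key, value = key.strip(), value.strip()
        if value:
            stack[depth][key] = value
        else:
            child: dict = {}
            stack[depth][key] = child
            stack.append(child)
    return root


doc = "user:\n\tgiven_name: something\n\tfamily_name: something\n"
print(parse_tab_tree(doc))
```

Since depth is a count of characters rather than a column position, the parser never needs to know how wide anyone's editor draws a tab.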


There's a huge difference in desirability for white space significance depending on whether or not it is used for code or data storage. Because the best practice is to decompose your code in functions and classes, space significance isn't so much a problem in Python. YAML on the other hand, is a visually atrocious way to store tree-like/multi-layered data.


I somewhat agree in that it does feel like more of a problem in YAML than code, and infinitely more of a problem when doing both (template code generating yaml)

But I still don't think there's any desirability for whitespace significance in code. Is there any problem it solves that isn't better solved by having enough structure and standardized tooling for autoformatting like go fmt?


One of the nice things about Python compared to, say, C, is that what you see visually is what you get, provided you use TABs rather than spaces to indent lines, or are very careful about how many spaces you put in. With spaces, though, you lose the ability to quickly change the visual representation.

In C you actually use indentation when you write code to make it readable for humans, but the machine doesn't care, so there is a disconnect between the visual and how it is parsed.

You have to both use indentation and add a lot of curly braces to your code. In Python you don't have this double work.
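Python 3 also refuses to guess when the two styles are mixed: if a block's structure would depend on how wide a TAB is, compilation fails. A small check using the built-in compile(); indentation_ok is a hypothetical helper name.

```python
def indentation_ok(source: str) -> bool:
    """Return True if Python accepts the indentation of `source`."""
    try:
        compile(source, "<snippet>", "exec")
    except IndentationError:
        # TabError (a subclass of IndentationError) is raised when the block
        # structure would depend on the tab width.
        return False
    return True


consistent = "if True:\n\tx = 1\n\ty = 2\n"
# One line uses a TAB, the next uses 8 spaces: these only line up if a TAB
# counts as 8 columns, so Python rejects the mix.
mixed = "if True:\n\tx = 1\n        y = 2\n"

print(indentation_ok(consistent))
print(indentation_ok(mixed))
```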


This is more of a tooling problem than a language problem. I haven't thought about Python whitespace in ages (i.e. since I switched to modern code editing tools).


Yep, whitespace should be used to make things "pretty", and some kind of 'tidy' program should be able to do it automatically. Counting the spaces and/or tabs and/or not even knowing which one it is, really sucks. Some editors help a bit with that, but it still sucks.


TABs should be used for indentation, making it line up pretty.

SPace is for separating words.

Modern editors (and old ones too) should be able to show what is a TAB, so there is no reason to be confused.

The problem is that people think of TABs as a certain number of spaces, as if an indentation must be a certain number of spaces. In Python or YAML etc., the logical construct is to have "one indentation level" for things like the statements that belong to an if, or whatever. Requiring people to put in a somewhat randomly chosen number of spaces is just silly. It is "one level", so the logical choice would have been to put in one symbol to designate this fact, then use the settings of the editor to show it as you want (how many "spaces" to move), and this can change depending on personal choice, the monitor you use, etc.
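The "let the editor decide how wide a level looks" point can be shown with Python's str.expandtabs: the same tab-indented text renders at any width, while the logical depth of each line (its count of leading TABs) never changes. A minimal sketch; render and depths are hypothetical helper names.

```python
def render(text: str, width: int) -> str:
    """Show tab-indented text the way an editor would with the given tab width."""
    return text.expandtabs(width)


def depths(text: str) -> list[int]:
    """Logical nesting level of each line: one level per leading TAB."""
    return [len(line) - len(line.lstrip("\t")) for line in text.splitlines()]


doc = "if ready:\n\tstart()\n\tif verbose:\n\t\tlog()\n"

for width in (2, 4, 8):
    print(f"--- tab width {width} ---")
    print(render(doc, width), end="")
print(depths(doc))
```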


For simple, straightforward configs, it's probably better to use TOML. Anything more complicated can be shunted to XML. Just define a spec (schema) for it and you're good to go. Especially if that is going to be used as a data exchange for other applications or uses.


90% of the people using YAML are not using 90% of the features in the spec. Somebody should make a subset of YAML with just the essentials while still remaining compatible. Maybe fix some of the type ambiguity stuff as well.


May I suggest StrictYAML?

https://hitchdev.com/strictyaml/
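The type ambiguity in question is mostly YAML 1.1's implicit typing, where an unquoted no, off or y becomes a boolean (the country code NO for Norway being the classic casualty). StrictYAML's approach, as I understand it, is to treat every scalar as a string unless a schema says otherwise. A minimal Python sketch contrasting the two rules; both functions are hypothetical, far simpler than a real parser, and the true/false-only handling in resolve_strict is my own simplification, not StrictYAML's behavior.

```python
# Plain scalars that YAML 1.1's bool type resolves to True / False.
YAML11_TRUE = {"y", "Y", "yes", "Yes", "YES", "true", "True", "TRUE", "on", "On", "ON"}
YAML11_FALSE = {"n", "N", "no", "No", "NO", "false", "False", "FALSE", "off", "Off", "OFF"}


def resolve_implicit(scalar: str):
    """Guess a type from the text alone (booleans only), YAML 1.1 style."""
    if scalar in YAML11_TRUE:
        return True
    if scalar in YAML11_FALSE:
        return False
    return scalar


def resolve_strict(scalar: str, expected: type = str):
    """Keep scalars as strings unless the schema explicitly asks for another type."""
    if expected is bool:
        if scalar in ("true", "false"):
            return scalar == "true"
        raise ValueError(f"expected true/false, got {scalar!r}")
    return expected(scalar)


countries = ["GB", "NO", "SE"]
print([resolve_implicit(c) for c in countries])
print([resolve_strict(c) for c in countries])
print(resolve_strict("8080", int), resolve_strict("true", bool))
```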


StrictYAML looks great! Would love to see this become a standard and get implementations for other languages, not just Python.


StrictYAML also prohibits flow collections, which are used often.


After I dug into the related articles at the link, it seems like the holy grail to the config files dilemma is to just hard-code desired settings in the source code. However, it also means that users (especially those using binary packages) are stuck with the settings from the distro.

If you truly need user-configurable settings, maybe TOML is for you.


> it seems like the holy grail to the config files dilemma is to just hard-code desired settings in the source code

That is…not remotely the holy grail. You’re really throwing out the baby with the bathwater on that one. User-configurable settings are important for many, many applications.

In the narrow case that you’re working on, say, a monorepo for closed-source code that only runs on your own servers, maybe keeping settings in source code is fine, provided there are no host-level differences that would require a config file.

But for any other circumstance…yeah you need some type of configurability.


Even with a single company closed source monorepo, sometimes it’s useful to change settings without a restart and redeploy. And sometimes one runs more than one instance of the same code.


I wrote a "common-config" module for an internal python project, that works like this:

1. You hardcode settings in python using a multiline TOML string

2. Users may have multiple configs at the usual config folders

3. The configs override the hardcoded defaults wherever fields are present

4. Configs override each other in order of directories (when we are talking about xdg-directories) and within them in alphabetical order wherever fields are present

This allows you to have good defaults, override some fields at will and add something like 99-debug.toml that you can just rename to 99-debug.toml.invalid if you don't need it anymore. The module also provides an install/edit cli and functions for printing the final resulting config and which configs it read on the way to that final config. At some point I have to publish it.

If you need anything more than TOML because your use case is so crazy, then consider either writing a domain-specific language or using a scripting language like Lua, Python, JS, ...


Why not both? You can write config-looking Kotlin code so that user won't even realize it's Kotlin. But type-safe and with auto-completion.

Something like

    server {
      port = 8080
      addr = "127.0.0.1"
      development = true
    }
    
    db {
      url = env("DB_URL")  // env() and include() are assumed helpers of the DSL
      user = "app"
      password = include(".secrets/dbpass.txt")
    }


I'm not sure I would call hardcoded settings in the source code configuration; they are just constants. The primary benefit of configuration is the ability to change the behavior of software without having to rebuild, redeploy, redistribute or even restart, whether by the user, the developer or the sysadmin, depending on context.
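One common way to get "change behavior without a restart" is to re-read the config file when it changes. A minimal Python sketch, assuming a JSON file and simple modification-time polling on each lookup; ReloadingConfig is a hypothetical class, and real services often use file-watch notifications or an explicit reload signal (such as SIGHUP) instead.

```python
import json
import os
import tempfile
from pathlib import Path


class ReloadingConfig:
    """Re-read a JSON config file whenever its modification time changes."""

    def __init__(self, path: Path):
        self.path = path
        self._mtime_ns = None
        self._data: dict = {}

    def get(self, key, default=None):
        mtime_ns = self.path.stat().st_mtime_ns
        if mtime_ns != self._mtime_ns:
            # First access or the file changed: reload without restarting the process.
            self._data = json.loads(self.path.read_text())
            self._mtime_ns = mtime_ns
        return self._data.get(key, default)


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp, "app.json")
    path.write_text('{"log_level": "info"}')
    config = ReloadingConfig(path)
    before = config.get("log_level")

    path.write_text('{"log_level": "debug"}')
    # Bump the mtime explicitly: two quick writes can share one timestamp tick.
    st = path.stat()
    os.utime(path, ns=(st.st_atime_ns, st.st_mtime_ns + 1_000_000_000))
    after = config.get("log_level")

print(before, after)
```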


Everyone hates yaml and I mostly agree. Should we start using something else? At least in the context of CI, I find .INI a little lacking, and TOML has similar gotchas to yaml. At the very least, I find its single-line restrictions frustrating for readability. That might only apply to a subset of TOML, though; I'm not sure.

Off the top of my head there's StrictYAML, which likely solves most of the issues, but I'd be interested in a pseudo-language similar to Terraform HCL. It seems config-files-as-scripts are here to stay in the CI world, so why not make config files more scriptable?

Then there's dumb yaml calling smart bash or Makefiles, which has some benefits as well. But I don't see people wanting to split smallish configuration over multiple files. Bash scripts are harder to take in at a glance.


I cannot agree to the statement "Everyone hates yaml".

If that was true, it wouldn't have been used as much as it is.


At this point for my own stuff I just use Lua by default. Just Lua, no stdlib.

If I ever want something that is just literals and I'm sure I won't need conditionals or variables or fancy stuff, then I'd probably go for scfg[1].

[1]: https://git.sr.ht/~emersion/scfg


Off topic but the weird ligature in "especially" really threw me off. Strange choice.


Isn't this just a dupe of this story from yesterday?

That's a Lot of YAML - https://news.ycombinator.com/item?id=37687060 - Sep 2023 (457 comments)


Significant whitespace is always a mistake.


Inconsistent formatting is a worse mistake, because people actually do read whitespace as structure, even when you wish they wouldn't.


I think we should use X12, personally.


I wrote a not very performant parser for X12 once, if you mean the medical data format.



