> Config-lint is a promising framework that lets you write custom checks for Kubernetes YAML manifests using a YAML DSL. But what if you want to express more complex logic and checks? Isn't YAML too limiting for that? What if you could express those checks with a real programming language?
Having recently worked a little bit with YAML for Kubernetes and HCL for Terraform, I really wish they had both just used "a real programming language" right from the start. I'll choose Racket because I know it best, but there are probably many languages that would work well. You could expose very nearly the same configuration language, but backed by a real programming language. I bet this would make some of the tools the author lists at the end (eg copper, config-lint) much easier to write, or perhaps not necessary at all.
And the author didn't mention Helm, but I will. The part of Helm I saw seemed to be a lot of work just to add "functions with parameters" to Kubernetes YAML, something we could have had for free using "a real programming language" from the start: https://helm.sh/docs/chart_template_guide/functions_and_pipe...
Why are so few configuration languages backed by a real language?
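To make the "functions with parameters" point concrete, here is a minimal sketch in Python rather than Racket. The `deployment` helper, the service names, and the image names are all made up for illustration; the output is JSON because kubectl accepts JSON as well as YAML, which keeps the sketch free of third-party libraries.

```python
import json


def deployment(name: str, image: str, port: int, replicas: int = 1) -> dict:
    """Build a Kubernetes Deployment manifest as plain data (hypothetical helper)."""
    labels = {"app": name}
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name, "labels": labels},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": labels},
            "template": {
                "metadata": {"labels": labels},
                "spec": {
                    "containers": [
                        {
                            "name": name,
                            "image": image,
                            "ports": [{"containerPort": port}],
                        }
                    ]
                },
            },
        },
    }


# Two services from one function; names and images are invented.
manifests = [
    deployment("web", "example.com/web:1.0", 8080, replicas=3),
    deployment("worker", "example.com/worker:1.0", 9090),
]

if __name__ == "__main__":
    # Wrap the manifests in a List object so they can be applied together.
    bundle = {"apiVersion": "v1", "kind": "List", "items": manifests}
    print(json.dumps(bundle, indent=2))
```

The parameterization that a template layer has to bolt on is just an ordinary function call here, and the result is still inert data that other tools can inspect.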
> Why are so few configuration languages backed by a real language?
In many cases, not having a full-featured language is helpful, because a non-Turing-complete language comes with additional guarantees, such as guaranteed completion.
Not OP, but: tags (custom data types), more built-in data types (dates, etc.), more syntax variations (here-documents, multiple ways to write a string, a boolean, etc.), comments, and references + aliases.
The language is vast, and most people use it without knowing it.
I love YAML, but I wish there was a "strict&sane yaml subset".
All of those are good examples of YAML's complexity, but none of it is anywhere near what even a simple programming language can unleash (variables, loops, functions, recursion, etc.).
Cool, I hadn't seen CDK for Terraform before. But doesn't this support my point? If they had just used TypeScript from the start, they wouldn't have to add TypeScript support later.
I suppose guaranteed completion matters if you are running untrusted code, but wouldn't sandboxing solve that? Are there any other guarantees that sandboxing wouldn't solve?
> I suppose guaranteed completion matters if you are running untrusted code, but wouldn't sandboxing solve that? Are there any other guarantees that sandboxing wouldn't solve?
There are more benefits than just that. By restricting the possibilities, you know there won't be unbounded loops; analysis and code review are easier (and infrastructure teams are often seriously lagging in this regard); and it can be easier to maintain, update, and test.
In some cases, though, this can be seriously limiting, because you have some very complex and specific thing you need to express. For this you can use CDK. I would say both approaches are complementary, not exclusive.
In my experience I would say nearly all infrastructure projects can be expressed as Terraform rather easily, but YMMV.
Can a config mechanism where shell commands are embedded as strings still be considered as offering these additional guarantees?
Perhaps certain things are easier: it's easier to parse, to isolate the code portions, and to read data portions. But then validating what's inside those code portions is a challenge.
It seems like there's an opportunity for a programming language purpose built for the task of configuration. Its chief feature would be the ability to provide a lot of information when read in "inert data" mode, yet also provide full programming language power when read in "run" mode.
ETA: Perhaps embed Python inside something similar to YAML, and support active code via a "lambda" type. Use indentation for delimiting.
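A rough sketch of what the two reading modes might look like, using a plain Python dict instead of a new YAML-like syntax. The keys, the `read_inert` and `run` names, and the rule inside the lambda are all invented for illustration.

```python
from typing import Any

config: dict[str, Any] = {
    "replicas": 3,
    "image": "example.com/web:1.0",
    # An "active" value: computed from the static settings when run.
    "max_surge": lambda cfg: max(1, cfg["replicas"] // 2),
}


def read_inert(cfg: dict[str, Any]) -> dict[str, Any]:
    """'Inert data' mode: describe the config without executing anything."""
    return {k: "<computed>" if callable(v) else v for k, v in cfg.items()}


def run(cfg: dict[str, Any]) -> dict[str, Any]:
    """'Run' mode: evaluate every lambda against the static values."""
    static = {k: v for k, v in cfg.items() if not callable(v)}
    return {k: v(static) if callable(v) else v for k, v in cfg.items()}


if __name__ == "__main__":
    print(read_inert(config))
    print(run(config))
```

In inert mode a reviewer or linter still sees every key and every static value; only the computed entries are opaque.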
> Why are so few configuration languages backed by a real language?
There are tons of tools that work with YAML/JSON, but I imagine you mean something more like a first-class programming language that is specific to, or good at, writing static configuration. The other side of the coin is that configuration often has a different audience than programmers (for example, end users or operations teams), and static key=value configuration that any human can (more or less) read easily, without having to run code in their head, is a good option. Plus programming, besides being harder to read and share, introduces bugs. So I suppose the preference between configuration written as code and stand-alone configuration depends on who the consumer is and how complex the configuration is.
Every time you have to use another configuration language, you have to learn all its quirks (e.g. "" == undef in Puppet), give up all your powerful tools (like a debugger), and learn new abstractions. It's extremely counterproductive, and it usually would have been better to have a simple data structure that you generate from a program written in a traditional language (which the team can artificially restrict to avoid recursion if they want).
Also, IMO there's no such thing as a "declarative language." These are just languages where almost everything has a side effect of mutating a data structure that you don't have an easy way of inspecting or debugging.
Everything mentioned in the article can be simply done by writing a kubernetes validation webhook in the language of your choice. Why would you specifically need the configuration to be a real language?
I'm having some arguments with other developers (devs) on whether or not this is important. I'm gonna finally try to implement this for my own pipeline this week, hopefully.
I would much rather have devs double check/validate things locally before they edit changes.
Modifying config files with the text edit feature in GitHub (GH) doesn't enable you to do that.
And devs are lazy. I'm lazy. They want things easy. Me too.
So let's make it easy. Modify your CI/CD pipeline to validate YAML configs on any file changes (use GH hooks, for example).
Now devs can do whatever they want - if their pre-deployment checks fail, go back and fix it!
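A minimal sketch of the kind of check such a pipeline step could run. To stay within the standard library it assumes the manifests are available as JSON files; for YAML you would swap in a parser such as PyYAML. The two rules (resource limits, no `latest` tag) are only illustrative, and `check_manifest` is a made-up name.

```python
import json
import sys


def check_manifest(manifest: dict) -> list[str]:
    """Return a list of problems found in one Deployment-like manifest."""
    problems = []
    name = manifest.get("metadata", {}).get("name", "<unnamed>")
    pod_spec = manifest.get("spec", {}).get("template", {}).get("spec", {})
    for container in pod_spec.get("containers", []):
        cname = container.get("name", "<unnamed>")
        if "limits" not in container.get("resources", {}):
            problems.append(f"{name}/{cname}: no resource limits")
        if container.get("image", "").endswith(":latest"):
            problems.append(f"{name}/{cname}: image uses the 'latest' tag")
    return problems


def main(paths: list[str]) -> int:
    """Check every file given on the command line; non-zero exit fails the job."""
    problems = []
    for path in paths:
        with open(path) as f:
            problems += check_manifest(json.load(f))
    for problem in problems:
        print(problem, file=sys.stderr)
    return 1 if problems else 0


if __name__ == "__main__":
    sys.exit(main(sys.argv[1:]))
```

The non-zero exit code is what makes the CI job fail, so nobody has to remember to run anything locally.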
This is a very sensible approach. One advantage of automating the checks, instead of relying on developers to check their changes carefully, is that onboarding a new developer is easier: you spend less time on very small and specific details, and you won't forget to mention one.
This is a good approach because it focuses on the desired outcome ('no invalid configs get deployed'), and doesn't try to use a proxy ('you have to validate locally') to get there.
You're basically describing Sentinel for Terraform (https://www.hashicorp.com/sentinel/) or Datree for Kubernetes (https://www.datree.io). There are also a bunch of tools popping up in this space that focus on catching security issues rather than misconfigurations.
You're currently being downvoted, but I agree, YAML is kinda terrible, not sure why anyone thought Python's syntactically relevant whitespace was ideal for a config file.
Classic example:
    - containerPort: 7173
      name: http
I think that's an object in a list? But it's not overly clear. And if I indented any of those lines wrong...
That was changed in YAML 1.2[0], which I think is many years old, but people don't specify a version in config files anyway, and I'm not even sure most parsers respect it, so it keeps popping up, which is sad.
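For what it's worth, here is how the two readings look as Python data. This is a sketch, not parser output, and it assumes `name` was meant to line up with `containerPort`; the enclosing `ports` key in the second case is added for illustration.

```python
# Case 1: "name" lines up with "containerPort".
#   - containerPort: 7173
#     name: http
# One list holding one map with two keys.
aligned = [{"containerPort": 7173, "name": "http"}]

# Case 2: the list sits under a hypothetical "ports" key and "name" slips
# back to the parent's column.
#   ports:
#   - containerPort: 7173
#   name: http
# Still valid YAML, but "name" now belongs to the parent map, not the list item.
slipped = {"ports": [{"containerPort": 7173}], "name": "http"}
```

The second case is the nasty one: nothing fails to parse, the key just ends up attached to a different object.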
That's a map as a list element, iirc the terminology. But check out this.
When studying the YAML spec I discovered that a map property (key: value) can have not only a string as its key, but any value. Even a list. (cue screams)
Also, the keywords of the underlying software mechanisms are decoupled from the format, so plain YAML tooling cannot check them. With XML you can at least infer a lot about the desired structure and keywords from the schema. Deeper checks are only possible with "real" programming languages, preferably statically typed ones. I wonder when that wisdom will trickle down to configuration languages.
Dhall is an example of a configuration language. Its programs must terminate. It aims for safety, claiming that it can support safe evaluation of untrusted code.
https://dhall-lang.org/
TBH, I have no experience with it. But, it sounds like if you need a configuration language with programmatic features, it would be more suited to the job than a general purpose programming language.
For me, XML is just too much, and when working with C or C++ (not sure about Rust) it's just a pain. Can your JSON schema file not serve as the documentation?
If something like k8s implemented comments as specific comment fields that would actually be pretty useful. The fields could be parsed and shown in GUIs.
That is the workaround: just add it as "comment":"comment text". Having one comment for each field in an object would get unwieldy, though. Another place for them is possibly the JSON schema file.
I recently had a bug that wouldn't have happened if I'd had these in place: https://dev.to/darklang/a-fun-bug-55cl. I added similar checks (and kube-score and polaris seem like good tools - I might try adding them).
I think this is a great approach for testing the files. Mistakes in those files can cause a production outage. I like running those tests once a PR is open, before it is merged into master and applied to the production cluster. (Disclaimer: I am a co-founder of datree.io.)
I find IntelliJ's K8s plugin really helpful for identifying issues within a single K8s YAML file. It won't find things like a deployment.yaml without a pdb.yaml, but it's a good start.
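A cross-file check like that is not much code once the manifests are loaded as data. Here is a rough sketch, assuming the manifests are already parsed into dicts. It only compares `matchLabels`, and it ignores namespaces, `matchExpressions`, and empty selectors, so treat it as a starting point. `deployments_without_pdb` is a made-up name.

```python
def deployments_without_pdb(manifests: list[dict]) -> list[str]:
    """Names of Deployments whose pod labels no PodDisruptionBudget selects."""
    pdb_selectors = [
        m.get("spec", {}).get("selector", {}).get("matchLabels", {})
        for m in manifests
        if m.get("kind") == "PodDisruptionBudget"
    ]
    missing = []
    for m in manifests:
        if m.get("kind") != "Deployment":
            continue
        template = m.get("spec", {}).get("template", {})
        labels = template.get("metadata", {}).get("labels", {})
        # A PDB covers the Deployment if every selector label matches a pod label.
        covered = any(
            sel and all(labels.get(k) == v for k, v in sel.items())
            for sel in pdb_selectors
        )
        if not covered:
            missing.append(m.get("metadata", {}).get("name", "<unnamed>"))
    return missing
```

Feed it every manifest in the repository at once and fail the build if the returned list is not empty.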