Building your own version of something is surely self-indulgent wheel-reinventing, but that’s what I’m currently doing with distributed configuration management.
It’s certainly been helpful in terms of understanding the boundaries between parts of the system, as this post also describes. The desire to auto-configure everything is strong — one day you’ll have a VLAN hard coded into the config, but the next day you’ll be trying to programmatically distribute VLAN ids based on function instead. The day after that, VLANs themselves are an artifact generated from a higher-level separation in your human-readable config. What was once a list of hosts with an attached VLAN id is now a group of hosts with a declared function that just happens to be programmatically assigned a VLAN id, but only as an implementation detail.
The same happens with IP address management — your root configuration moves closer and closer to being a document describing what you want to do, and less about how to go about doing it (which is implemented in your custom augmentations to the engine instead.)
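As a minimal sketch of that shift (the host names, functions, and id range here are hypothetical, and the assignment logic is only an illustration of the kind of augmentation I mean):

```python
# Minimal sketch: hosts declare *what* they are (a function); VLAN ids are
# derived from that function, not hand-maintained per host.
# Host names, functions, and the id range are hypothetical.
from itertools import count

hosts = {
    "db01": "database",
    "db02": "database",
    "web01": "frontend",
}

next_id = count(start=100)   # implementation detail, never written by a human
vlan_ids = {}                # function -> generated VLAN id

generated = {}
for host, function in hosts.items():
    if function not in vlan_ids:
        vlan_ids[function] = next(next_id)
    generated[host] = {"function": function, "vlan": vlan_ids[function]}

print(generated)
# {'db01': {'function': 'database', 'vlan': 100},
#  'db02': {'function': 'database', 'vlan': 100},
#  'web01': {'function': 'frontend', 'vlan': 101}}
```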
When you can justify it as an exercise in understanding a system, and you have time for it, building your own tool chain is incredibly rewarding.
> We could imagine resolving this tension if Terraform had two different convergence engines...The “create a new environment” engine, which always creates from scratch every resource it was given. This would excel at spinning up fresh environments as quickly as possible, since it would have to perform a minimum of introspection or logic and would just issue a series of “Create()” calls.
This just doesn't make sense; introspection usually allows you to apply changes more quickly. For example, it takes seconds to describe and update an existing AWS ELB; it takes minutes to delete and create a new one.
If you really want to forgo analysis and reuse of existing infrastructure, just do
terraform destroy
terraform apply
> Importantly, however, it by design will never issue a destructive operation, and will error out on changes that cannot be executed non-disruptively.
The notion of a "destructive operation" is not clear cut. Is it destructive to remove a file from S3? To update a file in S3? To delete a tag on an S3 bucket? To update a tag on an S3 bucket?
You can just manage this with permissions; that way you can specify exactly what is and isn't an allowable operation. In fact, this is best practice as it protects against bugs or misuse of the tool. Since Terraform already defaults to non-destructive, adding infrastructure-level permissions would cause it to work exactly as described.
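To make the permissions approach concrete, here is a minimal sketch assuming boto3 and suitable AWS credentials; the action list and policy name are illustrative, not a recommendation of which operations count as destructive:

```python
# Minimal sketch: an IAM policy that denies the operations you consider
# destructive, so any tool running under it (Terraform included) errors out
# instead of deleting things. The action list and policy name are illustrative.
import json
import boto3  # assumes AWS credentials are configured

deny_destructive = {
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Deny",
        "Action": [
            "s3:DeleteBucket",
            "s3:DeleteObject",
            "ec2:TerminateInstances",
            "elasticloadbalancing:DeleteLoadBalancer",
        ],
        "Resource": "*",
    }],
}

iam = boto3.client("iam")
iam.create_policy(
    PolicyName="deny-destructive-operations",   # hypothetical name
    PolicyDocument=json.dumps(deny_destructive),
)
```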
A better example of customizable convergence would be the lifecycle management options Terraform already has, such as create_before_destroy, which ensures the new resource exists before the old one is deleted.
It runs as a distributed system and is reactive to events, both in the engine and in the language (an FRP DSL), which allows you to build really fast, cool, closed-loop systems.
Exactly, and structural and functional constraints over properties and rules.
I understand the need to reinvent the wheel, but most of these efforts feel to me like customizations that most declarative languages can provide, albeit possibly in a non-intuitive syntax.
He's spot on about separating "configuration generation" from convergence. There is no reason for the two to be the same system, the same tool. As he says, Kubernetes is only concerned with the latter, whereas Puppet, Chef, and Terraform (insofar as it uses HCL) conflate the two.
And for all the talk of "declarative", there is no reason why the configuration generation stage cannot be imperative, a la Pulumi. It is the desired end state - the catalog that's being generated - that is declarative.
I mostly agree, with the caveat that in my experience, if the configuration generation stage is entirely imperative, it is harder to reason about. That might not be a problem for low-complexity setups, but it can become quite important (and painful) in more involved cases.
My experience too. I strongly believe there is room for some kind of tool to help with this process, whether it be a library, DSL, or framework. Something lightweight that places some order on the problem of generating configuration, nothing more.
Otherwise, writing raw Python and dumping to JSON (or using Python client libraries for whatever you're targeting, e.g. Kubernetes) quickly becomes an unmaintainable mess.
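A minimal sketch of what I mean by a lightweight layer, assuming a Kubernetes Deployment as the target; the Service dataclass and make_deployment helper are hypothetical, not an existing library:

```python
# Minimal sketch: a small typed description of *what* you want, and one
# function that renders it into the declarative end state (here, a
# Kubernetes Deployment manifest). Service and make_deployment are hypothetical.
import json
from dataclasses import dataclass

@dataclass
class Service:
    name: str
    image: str
    replicas: int = 2

def make_deployment(svc: Service) -> dict:
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": svc.name},
        "spec": {
            "replicas": svc.replicas,
            "selector": {"matchLabels": {"app": svc.name}},
            "template": {
                "metadata": {"labels": {"app": svc.name}},
                "spec": {"containers": [{"name": svc.name, "image": svc.image}]},
            },
        },
    }

# The generation step is imperative; the output handed to the converger is declarative.
print(json.dumps(make_deployment(Service("web", "nginx:1.25")), indent=2))
```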
Yes, agreed. I don't think Nix or Guix are there yet, in terms of usability (not that most current alternatives are much better, mind). But I could see a wrapping layer on top of either of them working quite well. It's difficult to come up with abstractions for the kind of complexity we're dealing with nowadays. I'm hopeful someone will eventually, though...
The "pluggable convergence engines" is what we've built in Gyro[1] for this very reason. We wanted to have more control over how changes are made in production.
An example is doing blue/green deployments where you want to build a new web/application layer, pause to validate it (or run some external validation), then switch to that layer and delete the old layer. All while having the ability to quickly roll back at any stage. In Gyro, we allow for this with workflows[2].
There are many other areas we allow to be extended. The language itself can be extended with directives[3]. In fact, some of the core features, like loops[4] and conditionals, are just that: extensions.
It's also possible to implement the article's concept of "non-destructive prod" by writing a plugin that hooks into the convergence engine's (we call it the diff engine) events and prevents deletions[5].
We envision folks using all these extension points to do creative things. For example, it's possible to write a directive such as "@protect: true" that can be applied to any resource and would prevent it from ever being destroyed using the extension points described above.
That is why immutable infra has become popular: you can easily destroy and rebuild the whole thing.
And for the prod environment, what you are discussing sounds like update behavior to me. In CloudFormation, you can choose between different update policies.
Compared with each cloud's own provisioning engine (CloudFormation, Google Cloud Deployment Manager, Azure Resource Manager), Terraform is lacking a lot of features. So unless you are dealing with a private cloud, using the cloud's default provisioning service is a no-brainer.
CloudFormation is often lacking support for resources that are new or less popular. Terraform is much better at this; as far as I know, it supports most resources from the start.
In CloudFormation, you can customize your resource types with AWS Lambda. It is like creating providers in Terraform.
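A minimal sketch of the Lambda side of such a custom resource, assuming the standard Create/Update/Delete request contract; the actual provisioning logic is stubbed out:

```python
# Minimal sketch: a Lambda handler backing a CloudFormation custom resource.
# CloudFormation sends Create/Update/Delete events and waits for a response
# PUT to the pre-signed ResponseURL. Provisioning logic is left as a stub.
import json
import urllib.request

def handler(event, context):
    request_type = event["RequestType"]  # "Create", "Update", or "Delete"

    # ... call whatever API this custom resource manages, based on request_type ...

    response = {
        "Status": "SUCCESS",  # or "FAILED", with a "Reason" field
        "PhysicalResourceId": event.get("PhysicalResourceId", "my-custom-resource"),
        "StackId": event["StackId"],
        "RequestId": event["RequestId"],
        "LogicalResourceId": event["LogicalResourceId"],
    }
    req = urllib.request.Request(
        event["ResponseURL"],
        data=json.dumps(response).encode(),
        method="PUT",
        headers={"Content-Type": ""},
    )
    urllib.request.urlopen(req)
```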
Plus, CloudFormation is a free, managed service. You do not need to maintain it, and if it goes wrong, you can yell at AWS. Unless you bought Terraform Enterprise, it is still a pain to maintain another possible failure point in your system.
My experience is that the production operations engine is hard to get right because your target environment can drift from the desired configurations for reasons that you did not anticipate.