Any notion of state that satisfies requirements like > Just dump state and later...

0xbadcafebee · 2025-04-23T10:29:09 1745404149

It depends on what you're talking about. Terraform specifically has a flawed model where it assumes nothing in the world exists that it didn't create itself. Other configuration management tools don't assume that; they assume that you just want an item to exist: if it does exist, great, and if it doesn't exist, you create it. But for a moment I'll assume you're talking about the other problem with configuration management tools, which is "which of the existing resources do I actually want to exist or modify?"

That's a solved problem. Anything you use on a computer that controls a resource can uniquely identify that resource, through either a single key or a composite key. This has to be the case, otherwise you could create things that you could never find again :) (Even if you created an array of things with no name, each thing exists as an item in a list, so the list index is its unique identifier.)

Taking Terraform as an example again, the provider has code in it that specifies what the unique identifier is, per resource type. It might be a single key (like 'id', 'ASN', 'Name', etc) or a composite key ( {'id' + 'VPC' + 'Region'} ).
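To make the single-key vs. composite-key idea concrete, here is a minimal sketch. The `IDENTITY_KEYS` table, the resource type names, and the `identity` helper are all hypothetical illustrations; they are not taken from any real Terraform provider.

```python
# Hypothetical per-resource-type identifier definitions (illustration only).
IDENTITY_KEYS = {
    "vm": ("name",),                    # single key
    "subnet": ("id", "vpc", "region"),  # composite key
}


def identity(resource_type, attrs):
    """Return the tuple of values that identifies a resource of this type."""
    keys = IDENTITY_KEYS[resource_type]
    missing = [k for k in keys if k not in attrs]
    if missing:
        # The "error out that the unique identifier is missing" branch.
        raise KeyError(f"{resource_type} is missing identifier field(s): {missing}")
    return tuple(attrs[k] for k in keys)
```

Non-identifying properties (a CIDR block, a memory size) are simply ignored when computing the identity, which is what lets a resource keep its identity while its properties change.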

If the code you've dumped does not have the unique identifier for some reason, then the provider has to make a decision: either try to look up existing resources that match what you've provided and assume the closest one is the right one, or error out that the unique identifier is missing. Usually the unique identifier is not hard to look up in the first place (yours has a composite identifier: {VM:"foo1", Network:"mynet1"}). But it's also (usually) not fool-proof.

Imagine a filesystem. You actually have two unique identifiers: the fully-qualified file path, and the inode number. The inode number is the actual unique identifier in the filesystem, but we don't tend to reference it, because 1) an inode number isn't easy to remember or recognize, 2) it can be recycled for another file, and 3) it'll change across filesystems. We instead reference the file path. But file paths are subtly complex: we have symlinks, hard links and bind mounts, so two different paths can actually lead to the same file, and the same path can lead to different files! On top of that, you can remove the file and then create an identically-named file. Even if the new file has identical contents, removing a file and creating a new one is technically a whole new resource, and it has an impact on the system (permissions may be different, open filehandles to deleted files are a thing, etc).
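The path-vs-inode point can be checked with a short sketch. It assumes a POSIX filesystem that supports hard links and allows removing a file that is still open; the file names and the `identity_demo` function are made up for the demonstration.

```python
import os
import tempfile


def identity_demo():
    """Show that a path and an inode are two different notions of 'the same file'."""
    with tempfile.TemporaryDirectory() as d:
        original = os.path.join(d, "a.txt")
        hardlink = os.path.join(d, "b.txt")

        with open(original, "w") as f:
            f.write("hello")
        os.link(original, hardlink)  # second path to the same inode

        # samefile() compares device and inode number, not path names.
        two_paths_one_file = os.path.samefile(original, hardlink)

        handle = open(original)  # keep a handle to the old file
        os.remove(original)      # the path goes away, the inode lives on
        with open(original, "w") as f:
            f.write("hello")     # same path, same contents, new file

        same_after_recreate = os.path.samefile(original, hardlink)
        old_contents = handle.read()  # the old file is still readable
        handle.close()

        return two_paths_one_file, same_after_recreate, old_contents
```

The recreated `a.txt` has the same path and the same contents as before, yet it is no longer the same file as `b.txt`, because the hard link keeps the original inode alive.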

So what all of us do, all day, every day, is lie to ourselves. We pretend we can recognize files, that we have a unique identifier for them. But actually we don't. What we do is use a composite index and guess. We say, "well it looks like the right file, because it's in the right file path, with the right size, and right name, and right permissions, and (maybe) has the right inode". But actually there's no way to know for sure it's the same file we expect. We just hope it is. If it looks good enough, we go with it.

So that's how you automate managing resources. For each type of resource, you use whatever you can as a unique (or composite) identifier, guesstimate, and prompt the user if it's impossible to get a good enough guess. Because that's how humans do it anyway.
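A minimal sketch of that "guesstimate, then ask" approach follows. It is an illustration of the idea as described here, not the behavior of any particular tool; `closest_match`, its scoring rule, and `min_score` are assumptions.

```python
def closest_match(wanted, candidates, min_score=1):
    """Return the candidate sharing the most attribute values with `wanted`.

    Returns None when nothing matches well enough or when the best score is
    tied, which is the signal to stop guessing and prompt the user instead.
    """
    scores = [
        sum(1 for key, value in wanted.items() if candidate.get(key) == value)
        for candidate in candidates
    ]
    if not scores:
        return None
    best = max(scores)
    if best < min_score or scores.count(best) > 1:
        return None  # no good guess, or an ambiguous one
    return candidates[scores.index(best)]
```

With `{"vm": "foo1", "network": "mynet1"}` as the wanted identifier, a VM named foo1 on mynet1 wins over a foo1 on another network; with only `{"vm": "foo1"}` the two tie and the function declines to guess.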

kiitos · 2025-04-23T14:28:19 1745418499

> Terraform specifically has a flawed model where it assumes nothing in the world exists that it didn't create itself.

I don't think this is accurate. Terraform operates against a state snapshot, which is usually local but can also be remote. But it has several mechanisms to update that state, based on the current status of any/all defined resources, see e.g. `terraform refresh` (https://developer.hashicorp.com/terraform/cli/commands/refre...) -- and there are other, similar, commands.

> But for a moment I'll assume you're talking about the other problem with configuration management tools, which is "which of the existing resources do I actually want to exist or modify?"

I'm not really talking about that specific thing, no. That problem is one of uncountably many other similar sub-problems that configuration management tools are designed to address. And, for what it's worth, it's not a particularly interesting or difficult problem to solve, among all problems in the space.

If you have a desired state X, and an actual state Y, then you just diff X and Y to figure out the operations you need to apply to Y in order to make it end up like X. Terraform does this in `terraform plan` via a 3-way reconciliation merge/diff. Pretty straightforward.
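As a rough sketch of the diff step, the function below compares desired state X with actual state Y and emits the operations that would turn Y into X. It is a plain two-way diff, deliberately simpler than the 3-way reconciliation mentioned above, and it assumes `actual` contains only resources the tool manages. The `diff_states` name and the operation tuples are hypothetical.

```python
def diff_states(desired, actual):
    """Both arguments map an identifier to a dict of properties.

    Returns a list of (operation, identifier, properties) tuples.
    """
    ops = []
    for ident in sorted(desired.keys() - actual.keys()):
        ops.append(("create", ident, desired[ident]))
    for ident in sorted(desired.keys() & actual.keys()):
        # Only the properties whose values differ need to be applied.
        changed = {
            key: value
            for key, value in desired[ident].items()
            if actual[ident].get(key) != value
        }
        if changed:
            ops.append(("update", ident, changed))
    for ident in sorted(actual.keys() - desired.keys()):
        ops.append(("delete", ident, {}))
    return ops
```

The declarations passed in as `desired` are the state; the function and the operations it returns are the code.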

> you just want an item to exist; if it does exist, great, if it doesn't exist, you create it

It's not as simple as whether or not an item should exist. Being able to uniquely identify a resource is step one for sure. But a single resource, with a stable identifier, can have different properties. The entire resource definition -- identifier, properties, and everything else -- is what you type and save and commit and push and ultimately declare as the thing you want to be true (X). That's not code, it's state (definitions). Code is what's executed to diff that declarative state (X) against actual state (Y) to produce a set of delta operations. Or, it's those delta operations themselves.

> If the code you've dumped does not have the unique identifier for some reason, then the provider has to make a decision: either try to look up existing resources that match what you've provided and assume the closest one is the right one...

First, you "dump" state, not code. More importantly, no configuration management system would ever take one identifier and "guesstimate" that it should match a different identifier, because it's "close", whatever that means.

> or error out that the unique identifier is missing. Usually the unique identifier is not hard to look up in the first place (yours has a composite identifier: {VM:"foo1", Network:"mynet1"}). But it's also (usually) not fool-proof.

I really don't understand what you mean here, nor do I understand your mental model of these systems. It's certainly not the case that my example VM has the composite identifier {vm:foo1 network:mynet1}. The identifier is, intuitively, just foo1. Even if we were to say the identifier were an object, the object you propose is missing the memory size. But more importantly, changing the foo1 VM from network:mynet1 to network:othernet2 probably should not have the effect of destroying the existing VM and re-provisioning a brand new VM with the new network. Sometimes configuration changes do require this kind of full teardown/spinup, but those conditions are generally rare, and all modern configuration management tools avoid this kind of destructive work whenever possible.
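The distinction between an in-place change and a teardown/spinup can be sketched as follows. Which attributes force a replacement depends on the resource type; the `FORCES_REPLACEMENT` table and the choice of `image` as such an attribute are purely hypothetical.

```python
# Hypothetical: attributes that cannot be changed on a live resource.
FORCES_REPLACEMENT = {"vm": {"image"}}


def change_kind(resource_type, current, desired):
    """Classify a change to one resource whose identifier stays the same."""
    changed = {key for key, value in desired.items() if current.get(key) != value}
    if not changed:
        return "no-op"
    if changed & FORCES_REPLACEMENT.get(resource_type, set()):
        return "replace"  # destroy and re-provision
    return "update-in-place"
```

Under these assumptions, moving foo1 from mynet1 to othernet2 is classified as `update-in-place`, while changing its image is classified as `replace`.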

> So that's how you automate managing resources. For each type of resource, you use whatever you can as a unique (or composite) identifier, guesstimate, and prompt the user if it's impossible to get a good enough guess. Because that's how humans do it anyway.

Just to reiterate, I'm not aware of any configuration management tool that "guesstimates" when making changes in this way. For good reason.

0xbadcafebee · 2025-04-23T14:56:53 1745420213

`terraform refresh` (which is now `terraform apply -refresh-only`) is an exception to the rule. Terraform doesn't know what's going on in the outside world. If you write configuration to create a Security Group named "foobar", and do a `terraform plan`, it will say it's about to create "foobar". When you go to apply, it will error out, saying "foobar already exists".
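The behavior described above can be modeled with a toy sketch. This is not Terraform's implementation; `plan_from_state`, `apply`, `plan_with_lookup`, and `AlreadyExists` are hypothetical names, and `remote` stands in for whatever already exists in the outside world.

```python
class AlreadyExists(Exception):
    pass


def plan_from_state(config, state):
    """Plan using only what the tool has recorded, not what really exists."""
    return sorted(set(config) - set(state))


def apply(creates, remote):
    """`remote` is a set of names that exist in the outside world."""
    for name in creates:
        if name in remote:
            raise AlreadyExists(name)  # the conflict only surfaces here
        remote.add(name)


def plan_with_lookup(config, state, remote):
    """Alternative: check the outside world at plan time."""
    unmanaged = set(config) - set(state)
    return {
        "create": sorted(unmanaged - set(remote)),
        "exists_unmanaged": sorted(unmanaged & set(remote)),
    }
```

With `config = ["foobar"]`, an empty state, and a remote that already contains "foobar", `plan_from_state` proposes a create that `apply` then rejects, whereas `plan_with_lookup` reports "foobar" as existing but unmanaged before anything is attempted.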

If Terraform wasn't completely idiotic, it could have just checked if it existed in the planning stage. If Terraform was even mildly helpful, it would have suggested to the user at either plan or apply time that the security group already exists, and do you want to manage that with your code? But it doesn't do those things, because it's a completely dumb-ass design.

> I'm not aware of any configuration management tool that "guesstimates" when making changes. Thank God.

Many of them do. Ansible does, Puppet does, Terraform does. They have to, for the same reason as in my filesystem example: it's often impossible to know that a resource is unique, because there aren't actually unique identifiers. My definition of "guesstimation" is specifically "using the identifiers you have available to select the closest-matching entry from a list of potential options". Ansible does this all the time. Puppet and Terraform do this for every provider that doesn't have a totally unique identifier (there basically are no totally unique identifiers, as I pointed out in my filesystem example).

kiitos · 2025-04-23T21:24:12 1745443452

Wow you really hate Terraform!

It seems to me that your frustration with Terraform being "completely idiotic" is ultimately frustration with the underlying design model.

> If you write configuration to create a Security Group named "foobar",

That configuration is a declaration: a security group named "foobar" should exist, with the declared properties.

> and do a `terraform plan`, it will say it's about to create "foobar".

That plan would be based on the most recent snapshot of the target "outside world" resources, which, if you haven't synced them recently (or at all) would probably be empty, resulting in `terraform plan` proposing to create foobar afresh.

> When you go to apply, it will error out, saying "foobar already exists"

Sure, which should hopefully make sense. You've declared a resource locally, and asked Terraform to "make it so", basically. But that resource is in conflict with an identically named remote resource. You can `terraform import` it, or otherwise sync, to pull the current state of the relevant remote resource into your local state, and then operate from there. Or you can manually blow away the remote foobar and retry. And so on.

But this kind of situation is not common. Terraform assumes and expects that the declarations (and state) it has access to locally are an authoritative source of truth for what the target remote system(s) should be. The config files define what should be running in AWS, not the other way around.

It's fine if this isn't a fit for your use cases, but I don't think that means the entire tool is stupid or whatever. It just means it's not for you.