These sorts of posts are fascinating "nerd snipes" to cryptids like me. On the surface, they look incredibly interesting and I want to learn more! Terraform isn't code? Please explain to me why not, you have my attention.
Then I get to the real meat of the issue, which is often along the lines of, "I'm a software developer who has to handle my own infrastructure and I hate it, because infrastructure doesn't behave like software." Which, fair! That is a fair critique! Infrastructure does not behave like software, and that's intentional!
It's almost certainly because I come from the Enterprise Tech world rather than the Software Dev world, where the default state of infrastructure is permanent and mutable, forever. Modern devs, who (rightly!) like immutable containers and block storage and build tools to support these deployments by default, just don't get why the Enterprise tech stack is so different, and weird, and...crufty compared to their nifty and efficient CI/CD pipeline, just like I cannot fully appreciate the point of such a pipeline when I'm basically deploying bespoke machines for internal teams on the regular because politics dictates customer service over enterprise efficiency. It's the difference between building an assembly line for Corollas and Camrys (DevOps), and building a Rolls-Royce Phantom to spec for a VIP client (BizTech). That's not to say there hasn't been immense pressure to make the latter more like the former, and I've been part of some of those buildouts and transitions in my career (with some admittedly excellent benefits - Showback! Tenancy! Lifecycles!), but these gripes about Terraform are admittedly lost on me, because I'll never really encounter them.
And if I did, I don't necessarily need to pick up programming to fix it. I just need to improve my existing system integrations so Ansible runbooks can handle the necessary automation for me.
Thanks for posting this, I favorited it - having carved out a weird niche in my career as an "infra" guy, inevitably I deal with a lot of IAC. I run into this attitude a lot from devs - they are indeed annoyed by managing infrastructure, because it innately is not like software! I know I'm reiterating what you said, but it is so important to understand this.
Here is a thing I run into a lot:
"Our infra is brittle and becoming a chore to manage, and is becoming a huge risk. We need IAC!" (At this point, I don't think it's a bad idea to reach for this)
But then -
"We need to manage all our IAC practices like dev ones, because this is code, so we will use software engineering practices!"
Now I don't entirely disagree with the above statement, but I have caveats. I try to treat my IAC like "software" as much as I can, but as you pointed out, this can break down. Example: managing large Terraform repositories that touch tons of things across an organization can become a real pain once you add state management + automation + normal CI/CD practices. I can push a Terraform PR and get it approved, but I won't actually know whether what I did was valid until I try to apply it live. As opposed to software, where you can be reasonably confident the code is going to mostly work how you intend before you deploy it. Often in infra, the only way to know is to try/apply it. Rollback procedures are entirely different, etc.
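A minimal sketch of the kind of thing I mean (the bucket name is hypothetical): terraform validate and terraform plan both pass here, but apply can still blow up with BucketAlreadyExists, because only AWS knows at apply time whether a globally-unique name is free.

    # plan has no way to know whether this globally-unique name is free;
    # the failure (BucketAlreadyExists) only surfaces at apply time,
    # which is exactly the "only way to know is to try it" problem
    resource "aws_s3_bucket" "reports" {
      bucket = "acme-team-reports" # hypothetical name, quite possibly taken
    }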
It also breaks down, as others have noted, when trying to use Terraform to manage dynamic resources that aren't supposed to be immutable (like Kubernetes). I still do it, but it's loaded with footguns I wouldn't recommend to someone who hasn't spent years doing this kind of thing.
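The classic example, sketched with the official kubernetes provider (the workload name and image are made up): a HorizontalPodAutoscaler mutates replicas out from under Terraform, so every plan wants to "fix" the drift unless you explicitly tell Terraform to look away.

    resource "kubernetes_deployment" "api" {
      metadata {
        name = "api" # hypothetical workload
      }

      spec {
        replicas = 3 # an HPA will fight Terraform over this value

        selector {
          match_labels = { app = "api" }
        }

        template {
          metadata {
            labels = { app = "api" }
          }
          spec {
            container {
              name  = "api"
              image = "registry.example.com/api:1.0" # hypothetical image
            }
          }
        }
      }

      lifecycle {
        # without this, every plan tries to revert whatever the autoscaler did
        ignore_changes = [spec[0].replicas]
      }
    }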
> I can push a Terraform PR and get it approved, but I won't actually know whether what I did was valid until I try to apply it live
Our concession to this risk was that once a merge request was approved, the automation was free to run the apply pipeline step, leaving open the very likely possibility that TF shit itself. However, since it wasn't actually merged yet, you could push fixes until TF stopped shitting itself.
I'm cognizant that this solution doesn't "scale," in that if you have a high-throughput repo those merge requests will almost certainly clash, but it worked for us because it meant less merge request overhead (context switching). It also, obviously, leveraged the "new pushes revoke merge request approval" setting, which I feel is good hygiene, but some places are "once approved, always approved".
>It's almost certainly because I come from the Enterprise Tech world rather than the Software Dev world, where the default state of infrastructure is permanent and mutable, forever. Modern devs, who (rightly!) like immutable containers and block storage and build tools to support these deployments by default, just don't get why the Enterprise tech stack is so different
This is generally true, but the interesting thing about Terraform is that it was created specifically to work in the world of "immutable by default." This is why Terraform automatically creates and destroys instead of mutating in many (most?) cases, shies away from using provisioners to mutate resources after creation, etc.
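A toy illustration of that design, with a made-up AMI ID: change the ami on an aws_instance and Terraform plans a replacement (-/+), not an in-place edit, and create_before_destroy is the stock idiom for making the replacement non-disruptive.

    resource "aws_instance" "web" {
      ami           = "ami-0abc1234567890def" # hypothetical; changing this forces replacement, not mutation
      instance_type = "t3.micro"

      lifecycle {
        # stand up the new instance before tearing down the old one
        create_before_destroy = true
      }
    }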
Yep, and that's why I only very recently picked it up in Enterprise world, where the AWS team used it to deploy resources. What used to take them ~45min by hand using prebuilt AMIs, now takes ~500 lines of Terraform "code" and several hours of troubleshooting every time Terraform (or whatever fork they're now using post-Hashicorp) updates/changes, because Enterprise architecture is mutable by default and cannot simply be torn down and replaced.
>What used to take them ~45min by hand using prebuilt AMIs, now takes ~500 lines of Terraform "code" and several hours of troubleshooting every time
This is just operational immaturity. No one should be building anything "by hand," everything should be automated. Deploying instances from prebuilt AMIs takes a dozen or so lines of Terraform code. Terraform can spin up dozens of instances in less than 5 minutes with a dozen lines of code: https://dev.to/bennyfmo_237/deploying-basic-infrastructure-o...
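To be concrete, a minimal sketch of that happy path (region, AMI ID, and names are all made up):

    provider "aws" {
      region = "us-east-1" # hypothetical region
    }

    resource "aws_instance" "app" {
      count         = 12                      # "dozens of instances" is one argument away
      ami           = "ami-0abc1234567890def" # prebuilt AMI; hypothetical ID
      instance_type = "t3.micro"

      tags = {
        Name = "app-${count.index}"
      }
    }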
If you're not operationally mature enough, the problem isn't the tool, it's you. This is basic Terraform usage.
>because Enterprise architecture is mutable by default and cannot simply be torn down and replaced.
This is no longer correct. Maybe for laggards it's true, but modern enterprises with modern ops teams using modern tooling are deploying most of everything with immutability in mind. Enterprise architecture is immutable by default now, and destroying and replacing is the norm.
> Enterprise architecture is immutable by default now, and destroying and replacing is the norm.
Real life is harder. If I have a cluster of 8 H200 machines running training, I can't really destroy it and redeploy. Technically I can, but I need to spend time with the data scientists to make sure they configured everything to continue training from checkpoints. And if this cluster is idle for a day, the amount of money wasted is around my monthly salary...
Hm, maybe more enterprisey clusters are used in such a way that any node can be replaced at any time.
And this gets into another complication of ET that doesn't happen with PT: with Product Tech, the onus is on the customers to modernize around a new update, whereas with ET, it's our responsibility to work around the customers, on their schedule, and their timeline, unless we want to be fired for "bad customer service".
We cannot simply rip and tear like Product can, placing trust in your orchestrators to rebuild from configs with brand new instances. We can't spool up Chaos Monkey and test-tank the ERP system, because the ERP team has no interest (or political benefit) in modernizing their infrastructure to support Configuration Management tools or pipelines.
> Deploying instances from prebuilt AMIs takes a dozen or so lines of Terraform code. Terraform can spin up dozens of instances in less than 5 minutes with a dozen lines of code
That's ignoring everything that goes into even deciding the "nitty-gritty" around the deployment, which is where the bulk of the code comes from. What security keys does the customer use? Do we use ASGs or one-offs? Is the underlying application fault tolerant or not? Does the customer require backups? What subnet does it go in? What security groups need to be added? What are the tags? Is it region-specific? Does it belong in a higher security zone? Does it need specific failover criteria?
500 lines later, you can deploy one VM with everything needed to meet the customer and organizational demands. That's not efficient, but that's how enterprise technology ultimately works.
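To make that concrete, here is a hedged sketch of how just three of those questions turn into code - every variable, name, and ID below is made up for illustration. Multiply by the full list above and 500 lines stops looking strange:

    # hypothetical variables standing in for per-customer decisions
    variable "customer_key_name" { type = string }
    variable "high_security_zone" { type = bool }
    variable "secure_subnet_id" { type = string }
    variable "default_subnet_id" { type = string }
    variable "security_group_ids" { type = list(string) }
    variable "cost_center" { type = string }
    variable "backups_required" { type = bool }
    variable "backup_role_arn" { type = string }
    variable "backup_plan_id" { type = string }

    resource "aws_instance" "bespoke" {
      ami           = "ami-0abc1234567890def" # hypothetical prebuilt AMI
      instance_type = "m5.large"
      key_name      = var.customer_key_name

      # "what subnet? higher security zone?" becomes conditionals
      subnet_id              = var.high_security_zone ? var.secure_subnet_id : var.default_subnet_id
      vpc_security_group_ids = var.security_group_ids

      tags = {
        Name       = "bespoke-vm"
        CostCenter = var.cost_center # "what are the tags?" alone can run to dozens of lines
      }
    }

    # "does the customer require backups?" becomes a whole conditional resource
    resource "aws_backup_selection" "bespoke" {
      count        = var.backups_required ? 1 : 0
      name         = "bespoke-vm"
      iam_role_arn = var.backup_role_arn
      plan_id      = var.backup_plan_id
      resources    = [aws_instance.bespoke.arn]
    }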
> Maybe for laggards it's true, but modern enterprises with modern ops teams using modern tooling are deploying most of everything with immutability in mind. Enterprise architecture is immutable by default now, and destroying and replacing is the norm.
So throwing insults isn't exactly helping here, because I'm literally coming from said modern ops teams, using said modern tooling, from a large enterprise. You can apply a universal standard to "all enterprise" all you want, but the cruel reality is that most Enterprise technology does not work in the way you are describing. ERP servers remain mutable, database clusters are mutable, Physical Security appliances are mutable, hypervisor ops appliances are mutable, VPN concentrators are - you guessed it - mutable. We have built the tooling to support immutable architecture, we have demonstrated its capabilities to the Enterprise, we are ready for Kubernetes and Containers both on-prem and in the cloud, but our customers and applications flatly do not use or support it.
This is something I have had to explain time and again to the Powers that Be (TM): Enterprise Technology and Product Technology have vastly different needs, pipelines, and customers, with different paces and operational goals. No amount of Terraform, Ansible, GitHub Actions, Argo Workflows, Puppet, or other pipeline add-ons is going to speed up Enterprise Technology, because the software providers do not care to do so. If your Enterprise application selection enables immutable architecture across the board, you are exceedingly lucky to have leaders who allow that to be the case, because in my experience - from small MSPs, to major publishers, to giant tech conglomerates, and everywhere in between - Enterprise Technology is mostly mutable infrastructure with old-but-custom software that will never, ever be modernized, and often with SLAs far superior to anything public customers are allowed to have.
As the username implies, the "dinosaur on the internet" kind. The classic trope of the IT person who live(d) in their windowless cave, surrounded by a cacophony of whirring fans and grinding hard drives, retired kit repurposed into a lab since the budget never allowed for a proper one. Graphic tees and blue jeans, an enigmatic mystery to the masses who complain stuff is broken but also that they don't know why I'm here since everything always works.
So just your average IT person, really. What we lack in social graces, we make up for with good humor, excellent media recommendations, and a genuine passion for what we create, because we like seeing our users smile at their own lives being made easier. I guess the "cryptid" part comes in because I'm actively trying to improve said sociability and round out my flaws, unlike the stereotypical portrayals of the BOFH or IT Crowd.