Cloud Infrastructure as SQL (iasql.com)
155 points by pombo on Sept 16, 2021 | hide | past | favorite | 110 comments


While there is a point in representing the state of the infrastructure as SQL to enforce type safety and other constraints, I still don’t understand what the _real_ advantages are over the current tooling. All the key advantages listed there, including type safety and cloud-provider-defined constraints, are already well covered by the actual battle-tested tools and frameworks.

Here are some areas where I identified some red flags regarding the present approach. These are key aspects of the everyday life of an infrastructure engineer.

Usability. The SQL abstractions simply don't scale to make the human<->machine mapping work. You would need another layer of abstraction to map that infrastructure into a state that is easily readable and understandable for our brain. On the other hand, the more established tools use building blocks that both map well onto infrastructure state _and_ are easily understood by our brain. Add some stored procedures to the party and you’re lost in the weeds.

Versioning. What would be the easiest way to extract from the SQL storage the difference between two versions of the infrastructure? With the traditional tools, that is an easy diff which reveals changes in a matter of several keystrokes with no extra layering.

Reusability. How easy is it to port infrastructure code to other cloud providers? How easy is it to define resource templates?


I see a lot of criticisms for not wanting to use SQL to do writes and I think that is misguided. The current state of your infrastructure is absolutely state, and SQL is a great language for working with state. While Terraform and all these other "declarative" infrastructure tools are better than what came before them, you're ultimately playing Relation Stitcher by needing to connect the various pieces together. There is nothing declarative about Terraform and others. Infrastructure is absolutely stateful and relational, so why not use SQL and relations to manage it?

There are mentions of other tools that address the read side, and that's useful for obvious reasons, but you've punted on the hard problem which is the writes. The key to getting writes right will be constraints and triggers. Constraints can absolutely help operators to not cause outages by creating guard rails around certain state mutations. Triggers are important because unlike data that never sees an update, infrastructure is living and being able to consume those changes is important.

I might just have confirmation bias because I have this idea written down and think it should exist. Regardless, good luck!


Infrastructure is quantum state. AWS APIs lie to you. Status pages lie to you. If I had a nickel for every time the solution to a failing HTTP call was "just send it again"...

Use whatever tool you like: SQL, Rust, carrier pigeon - but it won't be a panacea for cloud infrastructure.


Everything in the real world is quantum state but that doesn’t stop every SaaS application out there from using an RDBMS as their system of record. This is why we have reconciliation processes. Terraform state files are just an ad-hoc storage format without any of the features that SQL and RDBMS have had for decades (I don’t mean to pick on terraform so much, it’s just the one I’m most familiar with).


Unless it can also do state verification, planning, and a three-way diff of desired state, expected state, and actual state, it doesn't hold a candle to Terraform.

At the same time this SQL would be useful when used in combination with a classic database migration framework which essentially comes down to a bad version of terraform.

Inversely, you could simply use the Terraform CDK and an SQL adapter to write SQL into Terraform definitions. People who want to use SQL can use that, and in the background it's still terraform with all its benefits.


Like someone else says, SELECTs make sense, INSERT/UPDATE/DELETE to manage infrastructure state, rather than using “proper” infrastructure as code, sounds like a path to hell to me.


Why specifically is that worse than config files?


config files / templates are typically declarative: you tell some orchestrator “this is what I want my infrastructure to look like, please detect changes and make it happen”.

it makes it much easier to reason about things, as state is abstracted away.


What's the difference in declarativeness between:

- Opening a YAML file in a text editor, deleting some lines, then doing a Kubernetes or Terraform or whatever apply operation

- Creating a SQL migration to delete some rows from a table, checking the migration into the repo, and then running a database migration

?


If properly modelled so the modifying operations are consistent and logical, SQL for modification has an advantage over "proper code to update": in many cases the same query used to find something can, with only minor alteration, modify it.

Now the real magic would be to make changes atomic!
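As an illustration of that point, here is a sketch with a hypothetical `aws_instance` table (SQLite standing in for the state database): the query used to find resources is reused, with only the verb changed, to modify them, and a transaction makes the change atomic.

```python
import sqlite3

# Hypothetical table standing in for an infrastructure state database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE aws_instance (id TEXT PRIMARY KEY, az TEXT, state TEXT)")
conn.executemany("INSERT INTO aws_instance VALUES (?, ?, ?)",
                 [("i-1", "us-east-1a", "running"), ("i-2", "us-east-1b", "running")])

predicate = "az = ? AND state = ?"
args = ("us-east-1a", "running")

# The query used to *find* the resources...
found = conn.execute(f"SELECT id FROM aws_instance WHERE {predicate}", args).fetchall()

# ...is reused with only minor alteration to *modify* them, atomically.
with conn:  # transaction: commits on success, rolls back on error
    conn.execute(f"UPDATE aws_instance SET state = 'stopped' WHERE {predicate}", args)

print(found)  # [('i-1',)]
```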


TF has locks so that two people cannot modify infra at the same time.

There are also dependencies within the resources to consider, such as network creation before firewall rule before VM


Yeah, and Terraform providers can be buggy and actually mess up those dependencies. I ran into this very recently where Terraform planned my changes successfully but failed to delete something because it had a dependency on another thing which it was planning to delete later.

It would be very difficult to mess this up with SQL foreign keys.
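To make that concrete, here is a minimal sketch (hypothetical tables, SQLite for illustration) of a foreign key refusing to delete a resource that something else still depends on:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when asked
conn.executescript("""
    CREATE TABLE vpc (id TEXT PRIMARY KEY);
    CREATE TABLE firewall_rule (
        id TEXT PRIMARY KEY,
        vpc_id TEXT NOT NULL REFERENCES vpc(id) ON DELETE RESTRICT
    );
    INSERT INTO vpc VALUES ('vpc-1');
    INSERT INTO firewall_rule VALUES ('fw-1', 'vpc-1');
""")

# Deleting the VPC while a rule still references it is rejected up front,
# rather than discovered halfway through an apply.
try:
    conn.execute("DELETE FROM vpc WHERE id = 'vpc-1'")
    blocked = False
except sqlite3.IntegrityError:
    blocked = True

print(blocked)  # True: the dependency graph is enforced by the schema itself
```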


Everything is "quantum state". You do a database fetch and in the time that the result travels over the wire to the FE another node could have made a mutation.

The nice thing is that adapters to sql engines often have exactly the sort of protections you want baked in, e.g. "check to make sure the updated at timestamps match before committing an update"
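The timestamp check described above is the standard optimistic-locking pattern. A sketch with a hypothetical `instance` table:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE instance (id TEXT PRIMARY KEY, size TEXT, updated_at INTEGER)")
conn.execute("INSERT INTO instance VALUES ('i-1', 'm5.large', 100)")

def resize(conn, instance_id, new_size, seen_updated_at, now):
    # Only apply the write if nobody mutated the row since we read it.
    cur = conn.execute(
        "UPDATE instance SET size = ?, updated_at = ? "
        "WHERE id = ? AND updated_at = ?",
        (new_size, now, instance_id, seen_updated_at))
    return cur.rowcount == 1  # 0 rows touched means someone else got there first

print(resize(conn, "i-1", "m5.xlarge", seen_updated_at=100, now=101))   # True
print(resize(conn, "i-1", "m5.2xlarge", seen_updated_at=100, now=102))  # False: stale read
```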


I'll bite.

What infrastructure management tool do _you_ prefer then?


Every tool is flawed, and picking the right one for your problem is about weighing tradeoffs. That said, tools with large well-run communities, the right corporate sponsors and contributors, and enough tenure to round out the sharpest edge cases are generally "better".

So Terraform and Ansible come to mind. Both deeply flawed, but they become the least-worst current option.

People should still strive for better, test new ideas with new tools, but choices for production code should play it safe almost every time.


I feel like the purpose of your posts is more about bemoaning infrastructure tools and less about discussing the original post's product. I'm not really sure it's productive. I definitely do not find it productive.


Infrastructure is made up of many parts. A single S3 bucket may have multiple resources (a key, a policy, a notification queue). How is having several “insert into” statements for each of these any different from “resource” blocks in terraform?

If anything it would be much worse, because you either write some ungodly huge sql statement to create multiple resources or you lose the ability to know what resources depend on each other and form a graph of dependencies.

This results in much slower plans, as you don’t have a DAG and you need to potentially refresh state much more often, or something that looks like terraform but with way, way more boilerplate and irrelevant syntax.


We need to be creative and make types relevant to the resources being modeled. If the infrastructure database is just a bunch of string identifier fields then it isn't very helpful. You have to find good abstractions and model the resources in types that bring meaning to the data.


Sure, but that’s got nothing to do with SQL and could be modelled in terraform. Or better yet, indirectly using terraform via IAC providers in languages like TypeScript.

Show me a proper example of creating an actual s3 bucket that you’d use in production. KMS key, inventory configuration, resource policy, lifecycle policy, logging enabled. Created via SQL.

Now show me how you’d take this and make it a reusable module so we can have a standard template for buckets that everyone can use.


You’re focusing too much on the initial creation rather than on ongoing maintenance and evolution. SQL and relations are much better suited to handle evolution by enforcing constraints than a graph of stitched-together pseudo-JSON.


No, I’m not. There isn’t a difference between creating and ongoing maintenance: it’s the same thing. You describe your state, something reconciles that. How you describe your state and the dependencies between resources is absolutely key, and on the face of it this interface looks totally inadequate.

So, again, show me even a brief sketch of how you would describe what I said above with this model, and you’ll see it quickly falls apart.


You're coming off a bit worked up, but I'll humor you anyway.

I'm not going to type out a bunch of SQL as an example. SQL vs HCL isn't the point and they basically break even on expressiveness. After you've typed out your pseudo-JSON, what exactly are the existing tools saving you? From having to use some wrapper around the cloud API? That's the easy part.

By overly focusing on SQL you're missing the forest for the trees. The point is relations and RDBMS features such as constraints, triggers, stored procedures. Such a platform would be always online rather than just a tfstate file waiting for humans to munge it. It's also time to stop thinking about the cloud as literal resources like current tools do and start moving to more abstract concepts (more on that later).

> No, I’m not. There isn’t a difference between creating and ongoing maintenance- it’s the same thing.

I run very critical infrastructure for a living and there absolutely is a difference. Creating new resources is easy -- they aren't being used and the world doesn't have any expectations for their performance or reliability. The bacon is made in evolving existing infrastructure without impacting user experience negatively. Terraform and other such generators give you very little in guard rails or help and silly outages happen all of the time because of it.

Database engineers have been creating sophisticated execution engines for decades. The creator of SQLite cites treating every SQL query as its own program to be run by a bytecode VM as a key design decision. Writing Terraform or what have you is like programming in an AST (sorry Lispers). Today query execution engines figure out how to manage on-disk structures; there is no reason they couldn't be creating smart plans for infrastructure changes at a much higher abstraction level than "glue this KMS key to this bucket."


I think we're arguing two different points here. I'm not arguing that a theoretical better system exists that can plan infrastructure on a much more intelligent level than current public tooling does, or that you can't muck up terraform applies and cause downtime. I'm arguing that version controlling a bunch of SQL statements to describe your infrastructure feels wrong, outside of a few snappy short examples, and that this more intelligent system you're describing is in no way tied or related to using SQL as a DSL.

You have some infrastructure that you want to exist in a certain state. I used a bucket to describe something that seems simple but often needs a lot of other related infrastructure to exist that is configured via potentially a lot of different API calls. These need to be created in a particular order, and any changes to those resources need to be reconciled in a particular order. Replace "bucket" with anything else that fits that, which is absolutely everything.

Ok, so how do we describe that with SQL?

   INSERT INTO aws_kms_key VALUES (nothing?) RETURNING something?
   INSERT INTO aws_bucket (bucket_name, kms_key_id) VALUES ("bucket-name", (SELECT kms_key_arn FROM aws_kms_key WHERE what = ....?))
   INSERT INTO aws_bucket_policy (bucket_arn, policy) VALUES (???, make_policy_somehow?())

Looks terrible, not least of all because naively you'd assume it would create buckets every time this is "run" or because you'd need to add variables and suddenly you're writing TSQL. Great. Now the terraform would be quite verbose, sure, but you'd encapsulate it in a module so all you'd need is:

   module "s3_bucket" {
      name = "bucket-name"
      kms = true
   }
And that's way more descriptive than a large number of SQL statements to do the same thing. Sure, you could argue that you could write the following magic statement:

   INSERT INTO cool_custom_bucket (name, kms) VALUES ("bucket-name", true)
But how would a user create their own abstractions? What if I want a custom policy for all our buckets? Would we write some trigger that updates a policy for all buckets that are inserted? Would we write some incomprehensible stored procedure that does some of this?
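For reference, the trigger variant of that idea is at least expressible compactly. A sketch with hypothetical tables and a made-up default policy (SQLite syntax; a real system would presumably target Postgres):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE aws_bucket (name TEXT PRIMARY KEY);
    CREATE TABLE aws_bucket_policy (bucket_name TEXT PRIMARY KEY, policy TEXT);

    -- Every new bucket automatically gets the org-wide default policy.
    CREATE TRIGGER default_bucket_policy AFTER INSERT ON aws_bucket
    BEGIN
        INSERT INTO aws_bucket_policy VALUES (NEW.name, '{"deny_public_access": true}');
    END;

    INSERT INTO aws_bucket VALUES ('bucket-name');
""")

print(conn.execute("SELECT * FROM aws_bucket_policy").fetchall())
# [('bucket-name', '{"deny_public_access": true}')]
```

Whether that stays comprehensible once real policies and exceptions pile up is exactly the open question.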

Ok, but let's forget all that. Imagine it works and it's fluid and it's basically terraform but in SQL etc etc. So now we want to make some changes and roll them out. How does SQL help us here? Do we open a merge request and say "UPDATE s3_bucket SET kms = false WHERE name = 'bucket-name'"? Do we edit the insert statement above? Do we just ignore versioned infrastructure entirely and open some pseudo-terminal and smash in "update production set down=true"?

Seems horrible. If you're imagining something entirely different level of infrastructure management needed only by google-scale companies that can only be done with SQL then you might need to elaborate on why exactly that is, because I can't see the connection. Why couldn't this theoretical system work with infrastructure managed by a DSL like Terraform? Why couldn't you fork Terraform to make more intelligent decisions about large-scale updates, or manage state in a different way whilst keeping the DSL?

And, as a side note, I rather appreciate the statefile being a plaintext file. It's much more flexible. But it can store state in a database as well.


Why are Terraform modules convenient while stored procs are horrible though? They're doing the same job? E.g.

    call s3_bucket('bucket-name', true);
Is not really all that different from a Terraform module.

And the rest of your objections boil down to 'use in-code database migrations', which app developers have been doing as a best practice for a long time now.


There are two decoupled parts to IAC - describing the state you want and reconciling that with the actual state of the world. Reconciling can be advanced as you want: you could take into account SLAs, perform progressive updates, or just ignore that and remove/create resources at will. So even if you used SQL and inserted it into a database, it's just a representation of state that needs to be reconciled in a separate step.

The problem is that "use in-code database migrations" is very very different to this problem. With database migrations you have a previous state and you manually code the steps to progress that state: add a column, then copy the data, then delete the old column, then do this, then do that. You do the reconciliation yourself.

And this is obvious when you think about it - imagine you did structure IAC like migrations, when you wanted to bring up some fresh infrastructure what happens? You would create all these resources, then modify them, then delete them, then create some new resources as a linear history of operations. Not good, very brittle.

So you need to decouple "what you want" from "how you get it". You could indeed use SQL to describe "what you want", but what does that really give you over using a language like Python/TypeScript/HCL to do it? It's not expressive enough, even things like conditionals are hard, and let's not even talk about string templating. Let's try to model the `s3_bucket` procedure with Postgres:

   CREATE PROCEDURE s3_bucket(bucket_name text, kms boolean)
      LANGUAGE plpgsql AS
      $$
      DECLARE
         kms_arn text;
         bucket_arn text;
      BEGIN
        INSERT INTO aws_kms_key VALUES (nothing?) RETURNING arn INTO kms_arn;
        INSERT INTO aws_bucket (bucket_name, kms_key_id) VALUES (bucket_name, kms_arn) RETURNING arn INTO bucket_arn;
        INSERT INTO aws_bucket_policy (bucket_arn, policy) VALUES (bucket_arn, make_policy_somehow?());
      END
      $$;
Of course, because you're describing a graph these returned arn's wouldn't actually _be_ arns, just placeholders that need to be reconciled later. And therein lies a problem - how do you ensure each node in your graph has a unique order-independent identifier? How do you know that you shouldn't need to create the `aws_kms_key` because it already exists, but you do need to create the `aws_bucket_policy` because you just added that `INSERT INTO` line? So you'd need to add a unique identifier to every `INSERT` statement:

    INSERT INTO aws_kms_key (node_id) VALUES ("this has to be unique else everything goes to hell") RETURNING arn INTO kms_arn;
Ugly. What's better about this than:

    resource "aws_kms_key" key {}

    resource "aws_s3_bucket" bucket {
        name = "bucket-name"
        kms_key_id = aws_kms_key.key.arn
    }
    
    resource "aws_bucket_policy" policy {
        bucket_id = aws_s3_bucket.bucket.id
        policy = jsonencode({...})  # or use the IAM document resource
    }
You could also model it with Pulumi:

    const key = new aws.kms.Key()
    const bucket = new aws.s3.Bucket("my-bucket");
    const policy = new aws.s3.BucketPolicy(bucket, make_policy())
Both of these are _much_ easier to grok and more flexible than the stored-procedure soup you will immediately get into if you use SQL.

And at the end of the day... it doesn't matter. You're not even _using_ a database to store your state, you're just using SQL to produce some graph of things you need to reconcile. You could write some huge stored procedure to turn your database state into a terraform/pulumi plan and get it to apply it with no problems.

Given that... what's the point of using SQL when there are better alternatives? Or, if you really hate yourself, go write it in COBOL. Equally as suitable.


> You're not even _using_ a database to store your state, you're just using SQL to produce some graph of things you need to reconcile.

Well, that's one way of doing it. The better way, imho, is to actually have a database to store the state. That also solves:

> You would create all these resources, then modify them, then delete them, then create some new resources as a linear history of operations. Not good, very brittle.

When you have an actual database backing your SQL queries, the migrations would all run in this DB, produce a final state of the DB, then this final state would get applied to bring up your system in one shot.
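A rough sketch of that flow, with hypothetical table names and a stubbed-out cloud API: the migrations build the desired state in the database, and only the diff against the real world gets applied.

```python
import sqlite3

# Desired state: migrations applied in order to the state database.
conn = sqlite3.connect(":memory:")
migrations = [
    "CREATE TABLE aws_bucket (name TEXT PRIMARY KEY, kms INTEGER)",
    "INSERT INTO aws_bucket VALUES ('logs', 0)",
    "INSERT INTO aws_bucket VALUES ('assets', 1)",
    "UPDATE aws_bucket SET kms = 1 WHERE name = 'logs'",  # later migration evolves earlier state
]
for m in migrations:
    conn.execute(m)

desired = dict(conn.execute("SELECT name, kms FROM aws_bucket"))

# Actual state of the world, as reported by the cloud API (stubbed here).
actual = {"logs": 0, "old-bucket": 0}

# Only the final DB state is reconciled; intermediate migration steps never hit the cloud.
to_create = desired.keys() - actual.keys()
to_delete = actual.keys() - desired.keys()
to_update = {k for k in desired.keys() & actual.keys() if desired[k] != actual[k]}

print(sorted(to_create), sorted(to_delete), sorted(to_update))
# ['assets'] ['old-bucket'] ['logs']
```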

> So you'd need to add a unique identifier to every `INSERT` statement:

Sure, and that's fairly easy to derive, because we already have a unique identifier, the bucket name:

    INSERT INTO aws_kms_key (node_id) VALUES (concat('aws_kms_key', bucket_name)) RETURNING arn INTO kms_arn;


> Well, that's one way of doing it. The better way, imho, is to actually have a database to store the state

It doesn’t matter how the state is stored, you still need to reconcile it. So yeah you can write your target state into a database if you wish, but you need something else to compare that with the state of the actual world and progress that towards the state you requested.

> When you have an actual database backing your SQL queries, the migrations would all run in this DB, produce a final state of the DB, then this final state would get applied to bring up your system in one shot.

Yes, what you are describing is how Terraform works. Without SQL.

Except it’s never “one shot” because it’s a DAG of requirements where each resource creation can fail.

> Sure, and that's fairly easy to derive, because we already have a unique identifier, the bucket name

In this specific case, sure. But for a KMS key or in the general case, no you wouldn’t. You’d also be hard-coding provider specific information into your system - “a bucket ARN can be derived from the name attribute”. This is not always the case - consider any REST api that accepts a POST request to create something, but returns only an incrementing integer primary key to uniquely identify it.


SQL is great for querying (and steampipe exists just for that reason), but the point of Terraform is to be a desired state configuration system; SQL updates would remove its best feature. You could use the database as the desired state, but then it would no longer be readable text that you can version-control.

Terraform code is equivalent to the desired content of the database, and running plan generates the required update queries to make it happen.


Database migrations are equivalent to the desired content of the database, and if I'm not mistaken, Terraform state files which are the _actual_ database, are not version controlled.


Terraform state files are sort of a workaround for the fact that it's not really feasible to discover the complete state of Terraform-managed infrastructure via the APIs only because it's not always possible to have an unambiguous mapping from declared resources to real ones. The Terraform code still represents the desired state of reality, but the state is needed to track it in practice.


> SQL is a great language for working with state

SQL is a great* language for querying relational data..

(* Unless you need to pull relational data out in a tree like structure rather than flattened and denormalized set of rows, in which case you need to wrap a bunch of logic around it to get that type of output)

> you're ultimately playing Relation Stitcher by needing to connect the various pieces together.

I don't see how doing this in SQL does away with that.

SQL also runs into the same problem pointed out about Ansible: you, the "meat bio-processor", have to pre-calculate and declare the load order for dependencies (make VPC, then make firewall rule tied to VPC, then make VM tied to VPC and firewall rule). You can't leave the execution order or the decision of conditional execution up to the engine (VPC with that ID found, just get the ID, don't recreate it); it's on you to make each of these decisions. That's easy enough for a toy use case, but in any large production system it'll fall apart from too much context needed.


It kind of already exists... Libvirt is basically a database under the covers. If anything I'm surprised the implementation isn't "AWS backend for libvirt" > "sql frontend for libvirt"

In a past life I was managing virtual machines in Elixir, and not too long after writing a libvirt adapter I realized that what I should be doing was fronting libvirt with Ecto (Elixir's database tool)


That's interesting! I never thought of libvirt that way.


I am always surprised by this irrational need to build "generators" or alternatives for SQL.

SQL, to me, seems like a very straightforward solution to the relational querying problem. If performance is a concern, optimize under the hood (SQL is just a language after all).


Steampipe (https://steampipe.io) is an open source CLI to query cloud infrastructure using SQL (e.g. AWS, GitHub, Slack, k8s, etc). It also has a HCL language to define security benchmarks and controls (e.g. AWS CIS, etc).

We are a Postgres FDW under the hood with Go-based plugins, so writes would be possible, but we've chosen to focus on read-only so far. Definitely interested to see how you approach create, update and delete in the SQL model!

Note: Not related to iasql. I'm a lead on the Steampipe project.


We are using steampipe to connect 30 AWS accounts via SQL. In a single call we can query all our services. Using metabase to visualise it all. I highly recommend you give it a go


How are you making the metabase<-->steampipe connection? Are you able to connect metabase directly to a running steampipe/ postgres server instance? or are you doing some ETL into metabase. Thanks in advance (metabase is new to me, but steampipe has been really handy)


Use `steampipe service start` to run Steampipe in the background [1]. It will print out a standard Postgres connection string running on port 9193 of your local machine. You can then simply point Metabase (or any other DB visualization tool) at it using the standard Postgres setup.

1 - https://steampipe.io/docs/using-steampipe/service


Awesome! Would love to hear more about your use case and any suggestions for improvements if you want to DM me.


Neat. Does that use Steampipe services/APIs under the hood, or does it consume the AWS/GitHub/... APIs directly?


It talks directly to the cloud provider APIs. Plugins are written in Go (similar to Terraform providers).

OSS plugins - https://github.com/topics/steampipe-plugin

Docs - https://hub.steampipe.io/plugins


WOOT! That's awesome.


As someone who has to do lots of compliance activities, this is a fantastic tool, thanks!


Great to hear :-) Please check out our open source mods (e.g. AWS Compliance, GitHub Sherlock, DigitalOcean Thrifty) - we'd love your feedback & help! https://hub.steampipe.io/mods


Ooh, this looks really powerful. Any plans to add Hetzner and Backblaze?


Both seem to have reasonable Go SDK libraries, so it should be easy to create new plugins for them. I've added them to our internal backlog, but all our plugins are open source if you'd like to try building it yourself! The Linode plugin is a simple, similar example to use as a starting point [1]. We also have docs for writing plugins [2].

1 - https://github.com/turbot/steampipe-plugin-linode 2 - https://steampipe.io/docs/develop/writing-plugins


Will do! Thanks again


Interesting. I'm curious how it's implemented under the hood. Does it use cloudquery (https://github.com/cloudquery/cloudquery) or steampipe (https://github.com/turbot/steampipe), or does it implement everything from scratch?

Disclaimer: I'm the founder of CloudQuery.

I get why you would want to do "select * from infra", but I'm not sure I understand why you would want to do "insert * into infra" rather than use something like terraform? Interested in hearing the use case.


It does sound like one of those "because it's cool" features rather than "because anyone should ever actually use this".


Which part of what?


personally, I love terraform. I don't like statefiles though. It's annoying to have them in a vault system


I'm frankly really confused that Terraform is still so widely used for AWS IaC when CDK exists.


CDK still uses cloudformation under the hood right? Anything to do with CF gets a hard pass from me.


Credit where credit is due, CF is better than it used to be. Of course, unlike TF, which doesn't roll back, CF will roll back, and on an initial deploy you get into this weird state where you have to completely remove your stack to try again.

It is also significantly faster than it used to be. I still never want to use it, though.


They actually released support for retrying failed stack updates the other week: https://aws.amazon.com/blogs/aws/new-for-aws-cloudformation-...


Well, that’s only 5 years too late to make me consider it :P

The situation where anything goes wrong and you just sit there staring at CF being unable to delete some S3 bucket for hours, eventually going into the console and deleting it manually in one second, has left me really sour on the whole thing.


One reason is because TF supports many clouds and people like having skills that are not vendor specific


Half true. The true half is that terraform will work out of the box with basically all clouds and many SaaS vendors. The untrue half is that terraform does nothing to abstract away the underlying vendor. You can't treat GCP VMs the same as AWS EC2 instances. You do get to reuse HCL.


true, but the tooling and workflow remains the same.

Not sure of any tool that could abstract the details sufficiently to be widely adopted. There is just too much nuance in cloud config.

I'm exploring using CUE (https://cuelang.org) to define TF resources, exporting as JSON for TF. So far it's much nicer


What problem does this solve, as opposed to a git repository?

To me, declarative infrastructure management is incredibly important in order to reason about a deployment. Even though under the hood, yes, everything is stateful, it’s something I want abstracted away, not as a primary way of interacting with my infrastructure.

I guess what I’m asking is what the canonical use case / target user of this is.


I think most large companies have written similar tools to deal with various use cases.

As part of a security team I often have the need to query for what instance has been assigned what IP, what team owns what AWS account, which security groups have port X open. You can do all of this using API queries but it's tedious, slow, and you can run the risk of hitting API rate limits. Most of this information is not easily queryable via Terraform and git.

At my last place we had a custom designed tool that would regularly fetch information from the AWS API and our CM servers and store it in a database. At my current place we have a tool that can query from CM but doesn't integrate with the AWS API so we're still doing things manually there. Having this database available via SQL is a tremendous help.

Now, the _write_ side of this I do not have a use case for.


Just in case you don’t see it somewhere else in this thread. Someone posted something that does this: https://github.com/cloudquery/cloudquery



I find the idea quite interesting - and it could be valuable just for the read-only aspect of it alone - especially if it is queryable through an ODBC or JDBC connection.

It helps avoid some scripting / token / API overhead - but I'm wondering what the trade off is in terms of initial setup time.

One general concern I have is around Data (state) consistency of the data after DML operations (insert / update). If the data represents the state of a resource - how do you know if there's a pending operation on it as a result of an earlier Update operation? How is idempotency handled? How are conflicting concurrent state changes handled (one user forces a restart, another initiates a shutdown)? What happens if a change applied to multiple records/resources only successfully applies on some of the resources?

This isn't specific to just this approach though - it's going to apply to any kind of cloud / resource management platform. I'm not sure how they tend to handle it - but a basic single record per resource has limitations around it that mean more work is needed. Maybe you end up with change history records?


I agree with rubiquity. Without getting into a philosophical debate about programming paradigms: we have started an open source project, InfraQL (https://infraql.io/), which is a framework for all cloud operations (query and provisioning) using SQL. We support SELECT, INSERT and DELETE, with UPDATE, UPSERT and REPLACE coming. It takes a totally different architectural approach to the other solutions referenced in this thread, and supports config/data supplied via JSON or Jsonnet. If we look at cloud resources as being defined by configuration, and configuration as (simply) data, then a SQL dialect makes absolute sense - as opposed to continuously inventing new and complex frameworks and DSLs.


I can see SELECTs being useful here, but yeesh INSERT/UPDATE/DELETEs would scare the heck out of me.

Also not clear on how/why this would be a SaaS product - this seems like a library you would download, throw some keys at, and be done.


> yeesh INSERT/UPDATE/DELETEs would scare the heck out of me

Out of curiosity, why would they?


1. Idempotency. Accidentally run an INSERT twice and you get two VMs instead of one. Sure you could be careful and do upsert-queries but is everyone in your org equally careful?

2. Reverts. In Terraform it's as easy as removing the resource and re-applying, or even just running teardown - even if you've added other things in the meantime. With SQL, if you've INSERTed something you need to come up with the corresponding DELETE query. Rollbacks might be a thing, but can they be applied partially?

3. Version control and code reviews. And are you sure the code that was reviewed is what was actually run in your terminal before you committed, or did you depend on manually inserted side effects?

4. Dependencies between resources, this has already been covered a lot in this thread. Especially together with teardowns this becomes extra complicated.


1. Use a unique key check that will fail if there's already a VM with the given key.

2. The OP mentioned that you can snapshot the database and restore it to a known-good state if anything goes wrong. State databases are probably not going to be humongous in size–you can imagine snapshotting them hourly for quite a long time.

3. How do you solve this problem today? By running Terraform applies locally on your machine to test out the modifications and then approving the PR?

4. That's exactly where foreign keys come in, they are exactly what enforces correct handling of dependencies between resources. E.g. some time ago one of our PagerDuty service monitors was renamed in Terraform. The plan destroyed the old monitor and created a new one with the new name, losing the linkage to the old incidents and making it look like the service had no incidents. Now imagine modelling this relationship in IaSQL and failing to destroy the old monitor because of the incidents.
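Points 1 and 4 can be sketched with plain relational constraints. A minimal illustration using SQLite - the `vms`/`incidents` schema here is invented for the example, not IaSQL's actual tables:

```python
import sqlite3

# Hypothetical schema for illustration only.
db = sqlite3.connect(":memory:")
db.execute("PRAGMA foreign_keys = ON")
db.execute("CREATE TABLE vms (name TEXT PRIMARY KEY, type TEXT)")
db.execute("CREATE TABLE incidents (id INTEGER PRIMARY KEY, "
           "vm_name TEXT REFERENCES vms(name))")

db.execute("INSERT INTO vms VALUES ('web-1', 't2.micro')")

# Point 1 (idempotency): a second INSERT with the same key fails
# instead of silently provisioning a duplicate VM.
try:
    db.execute("INSERT INTO vms VALUES ('web-1', 't2.micro')")
except sqlite3.IntegrityError:
    print("duplicate insert rejected")

# Point 4 (dependencies): deleting a resource that others still
# reference fails on the foreign key.
db.execute("INSERT INTO incidents VALUES (1, 'web-1')")
try:
    db.execute("DELETE FROM vms WHERE name = 'web-1'")
except sqlite3.IntegrityError:
    print("delete blocked by dependent incident")
```

Both failure modes become hard errors at write time rather than surprises in the cloud console.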


I guess the point though is that all of these things are problems anyway with a programmatic interface like boto. SQL has battle-tested techniques, extracted into composable common libraries, that will build these safeties on top of SQL itself. So if there's an expectation that cloud SQL is a replacement for something like Kubernetes, Nomad, or Rancher, I think that's not quite the right level of abstraction -- as I see it, cloud SQL is a replacement for boto.


I guess it's mass administration -> one mistake can destroy everything.


Seems to me whether you like this or not probably boils down to whether you like SQL or not.

For someone who spends most of their time writing type-safe functional code and loving the expressive power of it, the idea of voluntarily injecting SQL into my life seems horribly retrograde. We write whole frameworks to try and avoid having to manually code SQL(!), and every time I have to do it I sigh at how clunky, verbose and repetitive it is.

But I'm guessing there's a whole crowd of people out there thinking "finally I can just use SQL and none of these ridiculous programming languages to manage my infra ..."


Instead of focusing on SQL, focus on what can be done with an RDBMS with ACID properties and properly constructed data relations, constraints, and triggers.

You can not only make queries/views/etc, but you can also write transactions that make the changes, (ie the INSERT/UPDATE/UPSERT/DELETE) that are applied atomically, relying on the SQL "engine" (ie what they're building) to make sure that the relationships (foreign keys, constraints etc) are maintained.
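A minimal sketch of that atomicity property, using SQLite and invented table names - the point being that a multi-resource "infra update" either fully commits or fully rolls back:

```python
import sqlite3

# Hypothetical schema: a VM plus a DNS record pointing at it.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE vms (name TEXT PRIMARY KEY)")
db.execute("CREATE TABLE dns_records (host TEXT PRIMARY KEY, vm TEXT NOT NULL)")

try:
    with db:  # transaction: commits on success, rolls back on error
        db.execute("INSERT INTO vms VALUES ('web-1')")
        # A NOT NULL violation here aborts the whole update...
        db.execute("INSERT INTO dns_records VALUES ('web.example.com', NULL)")
except sqlite3.IntegrityError:
    pass

# ...so the half-created VM is rolled back too.
print(db.execute("SELECT COUNT(*) FROM vms").fetchone()[0])  # 0
```

You never end up with a dangling VM that the rest of the infra doesn't know about.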

If it's done right, then you can have the equivalent of an ODBC/JDBC driver for it and then use the ORM/framework of your choice to work with the schema.


This is really interesting to me. I absolutely love SQL and the power it has for modeling complexity.

I also don't think this whole thing has to be absolutely pure either... I would have no problems seeing user-defined functions that have side-effects in this kind of scope. You can certainly wrap a declarative + gateway pattern around domain registration or other global one-time deals, but perhaps exposing that side-effect to the SQL user would encourage more robust application development. Exception handling is something that is very hard to perfectly abstract away in a declarative sense.


When will an ORM be available?

Still not sure whether this is serious or not, but it's not really infrastructure as SQL, it's infrastructure as database records which is stateful and defeats the point.


Is this sarcasm? I have a broken sarcasm detector.


If it doesn't support window functions or CTEs I'm out.


If I did infrastructure in SQL I’d accidentally leave a column out of a join and end up creating millions of dollars worth of machines I didn’t want.


Or, like some of today's tools it could ask you for confirmation of an execution plan before doing anything.


It's like regex: it doesn't solve the problem for you, it only makes solving it easier. Users are still responsible for every last bit.


This has absolutely no relevance to me, but that SQL snippet just bugs the hell out of me. I can't fathom why you would do a subquery in the FROM section and then not utilize a join. You're just bringing in 2 datasets and letting them sit side by side, so why not just move the subquery to the SELECT section? The execution plan should be the same, so I guess this is just a preference thing?


It's using an implied cross join so each image is deployed onto a t2.micro
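In other words, comma-separated FROM items are an implicit cross join: every row on the left pairs with every row on the right. A small sketch using SQLite, with invented table names standing in for the snippet on the site:

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE images (ami TEXT)")
db.executemany("INSERT INTO images VALUES (?)", [("ami-1",), ("ami-2",)])

# "FROM images, (subquery)" is an implicit cross join: each image row
# is paired with the single row of the subquery.
rows = db.execute("""
    SELECT images.ami, t.instance_type
    FROM images, (SELECT 't2.micro' AS instance_type) AS t
    ORDER BY images.ami
""").fetchall()
print(rows)  # [('ami-1', 't2.micro'), ('ami-2', 't2.micro')]
```

With a single-row right-hand side, the cross join just attaches the instance type to every image.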


Ahh, gotcha. I appreciate the response there as I wasn't aware of that notation and even then I can't think of any time I've used a cross join. Not sure which syntax I would use personally.


> I've used a cross join

They're good for getting rates on small datasets; think (select grouper, count(1) from data group by grouper) cross join (select count(1) from data). I think I've mostly used them in interviews, tbh.
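That rates pattern, spelled out as a runnable sketch in SQLite (the `data`/`grouper` schema is invented): the per-group counts are cross joined against the single-row overall count to get each group's share.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE data (grouper TEXT)")
db.executemany("INSERT INTO data VALUES (?)",
               [("a",), ("a",), ("a",), ("b",)])

# Cross join per-group counts with the one-row total to compute rates.
rows = db.execute("""
    SELECT g.grouper, 1.0 * g.n / total.n AS rate
    FROM (SELECT grouper, COUNT(1) AS n FROM data GROUP BY grouper) AS g
    CROSS JOIN (SELECT COUNT(1) AS n FROM data) AS total
    ORDER BY g.grouper
""").fetchall()
print(rows)  # [('a', 0.75), ('b', 0.25)]
```

Since the total subquery always produces exactly one row, the cross join can't blow up the row count here.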


Isn't the big advantage of infrastructure as code the fact that you can version control it? And isn't it notoriously difficult to version control SQL? Maybe I am missing something.


It's actually really easy to version control SQL migrations, they are just plain text. It's what application developers who use databases have been doing for a long, long time. Check out Ruby on Rails migrations.


version controlling SQL queries is difficult, version controlling tables and rows of data and relations (FKs, constraints) is what RDBMSs do :)


I feel like this has the most value for highly stateful services. If your primitives are declarative, then this feels like a step backwards compared to CDK (I don't have experience with Terraform). As it turns out, composition and abstraction are really powerful tools for building complex infrastructures. For reading infrastructure state, though, this sounds amazing.


Just imagine the Bobby Tables of iasql ...

In all seriousness, one issue I see in the comments here is that we're still not truly in the mindset of "infrastructure as code" mode.

If the existing infrastructure is so fragile and precious that DROP DATABASE is a non-starter as opposed to Chaos Engineering 101, then SQL paradigms are not the problem.


Being able to check in a SQL file to our repo to manage all of our infra sounds like a dream. Added myself to the early access.


Why would you prefer to do this with SQL rather than any of the existing frameworks that already exist?


Why not just a text file? Is this the main reason?

> Unlike IaC, IaSQL makes the relations between pieces of your infrastructure first-class citizens, enforcing type safety on the data and changes to it.

It seems easier to solve that problem via a source-control hook to check a text file, than move over to SQL. But maybe this proposal gets you other things too?


You get:

- CHECK constraints, e.g. you can make it impossible to misspell a VM type

- Type checks (depending on the database engine), e.g. you can't accidentally put a string where it wants an integer

- Foreign key checks so you can't delete some resource if others depend on it

- Atomic commits so the entire infra update happens in one shot or not at all.
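The first two bullets can be sketched in a few lines of SQLite - the `vms` table and the allowed type list are invented for the example:

```python
import sqlite3

# A CHECK constraint turns a misspelled VM type into a hard error
# at write time instead of a failed deploy later.
db = sqlite3.connect(":memory:")
db.execute("""
    CREATE TABLE vms (
        name TEXT PRIMARY KEY,
        type TEXT CHECK (type IN ('t2.micro', 't2.small', 't2.medium'))
    )
""")

db.execute("INSERT INTO vms VALUES ('web-1', 't2.micro')")   # ok
try:
    db.execute("INSERT INTO vms VALUES ('web-2', 't2.mirco')")  # typo
except sqlite3.IntegrityError:
    print("rejected misspelled VM type")
```

The typo never makes it into the state at all, which is exactly the class of error a text file plus a source-control hook has to re-implement by hand.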


We discussed writing a plugin for Trino to support Kubernetes operations. Was more like a fun thought experiment. Looks like someone built it...


Just why... why would anyone use / learn SQL to manage infrastructure? Also FYI, the people managing infra are usually not the ones doing SQL.


I feel like it's less about SQL and more about having an RDBMS holding the state of your infra. That then gets you all the nice relational + ACID stuff.


I wonder how this works. Maybe it’s similar to osquery with virtual tables on top of sqlite? But then, how would you market this as SaaS?


I uh, kind of like my IaC to tell me what is going to change before I accidentally run a ‘DELETE FROM ec2_instances’.


And there's nothing preventing an IaSQL tool from doing that. Just depends on the implementation.


I am 99% certain this is satire, but on the off-chance that I'm wrong - you know that Terraform, CDK, etc. exist, right?


Is the signup form broken?


No, just double checked. Why did you think so?


Hit the sign up button at the top, scrolls me to the bottom but there's nothing there. Using Brave on Mac, even tried with blockers disabled.


yeah the button scrolls you to the bottom but there's no way to sign up.


ah it was a bug in some screens. we think we just fixed it. were you guys on mobile? can you check again?


Yup fixed. I just got on the wait list.


I'm not sure if this is useful in any capacity. Sorry to be hard on it.


There is good and bad. You're giving up conventional text-based version control (assuming the 'database' behind this is anything like a conventional RDBMS). You could pick up a lot of flexibility: multiple high-feature frontends using anything that can deal with a relational schema, for example. Automation could benefit from the (loosely) standards-based API that is SQL.

I think it's worth exploring. I suspect that if it had been done this way on day one, people would be suspicious of any approach that looks like the hodgepodge of bespoke cloud APIs we wade through today.

I've been working with OCI lately. Oracle Cloud Infrastructure if that still needs to be called out. Ironic that the company that popularized RDBMS software didn't invent this. They've got a really nice, first class fully supported Terraform provider, but apparently it didn't occur to anyone at Oracle that they might use their raison d'être to solve the problem. They could have done both even; the Terraform provider could be a front end for SQL DQL/DML operations.

The iasql blog makes the rather good point that behind all the kooky Cloud APIs and their vast walls of documentation and third party abstractions everything you're doing is actually landing in a SQL database anyhow. What, exactly, is the value of all that stuff in the middle?

There isn't enough info on the iasql site to illuminate this, but I think one problem they'll have is with types. There are a lot of special types in the world of cloud provisioning and if they try to shoehorn all those types into varchar, number and uuid the result will not be pleasing. Obvious examples would be IP addresses or CIDR notation.


I mentioned Steampipe [1] above; it has an open source OCI plugin [2] to query resources (no support for write operations). It's a Postgres FDW under the hood, and uses a subset of Postgres types (bool, bigint, double, text, inet, cidr, timestamptz and jsonb) along with their standard operators. This approach has proven very simple and effective for us.

1 - https://steampipe.io

2 - https://github.com/turbot/steampipe-plugin-oci


DROP DATABASE ooops



