The way this works with AWS is similar to creating a GCP project.
At the top level you have an organization account, which is where billing occurs.
From this org account you create accounts for the following (typically):
1. Security - AKA the account your USERS are in
2. Ops - The account your monitoring, etc. lives in
Where a lot of people seem to deviate from here (I've been interviewing level 2-3 SREs for the last 3 weeks and have heard all about different AWS structures that I don't like) is in how they break up applications into their own accounts to keep the blast radius low.
What I DO, and what is widely regarded as best practice, is to create an AWS account for each environment of each application.
App1-sandbox
App1-staging
App1-production
Then your Terraform is also structured by application/environment/service. Each environment and application has its own state in S3 and DynamoDB.
And so on.
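To make the layout above concrete, here's a rough sketch of how the account names and per-environment state paths could be enumerated. The `terraform.tfstate` key format and the service names are my own assumptions for illustration, not something the scheme requires.

```python
from itertools import product

# Environment names taken from the example above.
ENVS = ["sandbox", "staging", "production"]


def account_name(app: str, env: str) -> str:
    """One AWS account per application environment, e.g. App1-staging."""
    return f"{app}-{env}"


def state_key(app: str, env: str, service: str) -> str:
    """Hypothetical state path mirroring application/environment/service."""
    return f"{app}/{env}/{service}/terraform.tfstate"


def plan_layout(apps, envs, services):
    """Map each account name to the state keys that would live under it."""
    layout = {}
    for app, env in product(apps, envs):
        layout[account_name(app, env)] = [state_key(app, env, s) for s in services]
    return layout


if __name__ == "__main__":
    # "api" and "worker" are made-up service names.
    for account, keys in plan_layout(["App1"], ENVS, ["api", "worker"]).items():
        print(account, keys)
```

Nothing here talks to AWS; it only shows that the account list and the directory/state layout can be derived from the same two lists.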
Is this unwieldy? I have 40-50 AWS accounts and no, it's not unwieldy at all IMO. Cross-account IAM and trust relationships are set up very early on and they don't need to be modified much, if at all, until you create another AWS account. Creating a new AWS account is kind of annoying, though. I need to automate that process better.
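For anyone wondering what the trust relationship part looks like, here's a minimal sketch that builds the trust policy document a role in an app account would carry so that principals in the security account can assume it. The account IDs and the `admin` role name are placeholders I made up; a real setup would usually narrow the principal and add conditions such as MFA.

```python
import json


def trust_policy(trusted_account_id: str) -> dict:
    """Trust policy letting principals in the trusted account call sts:AssumeRole."""
    if not (trusted_account_id.isdigit() and len(trusted_account_id) == 12):
        raise ValueError("AWS account IDs are 12 digits")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def role_arn(account_id: str, role_name: str) -> str:
    """ARN of the role that users in the security account would assume."""
    return f"arn:aws:iam::{account_id}:role/{role_name}"


if __name__ == "__main__":
    # Placeholder IDs: 111111111111 = security account, 222222222222 = app account.
    print(json.dumps(trust_policy("111111111111"), indent=2))
    print(role_arn("222222222222", "admin"))
```

The policy only changes when the set of trusted accounts changes, which is why it rarely needs touching once it's in place.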
FWIW I loathe GCP IAM and miss AWS IAM, CloudFormation, and not having to talk to any one single person or piece of software about "please enable this foundational API in your Project"
A rare opinion but one I share wholeheartedly. I started my career at Google Cloud but spent the rest of it working with AWS. AWS always feels like an uphill struggle: lots of micromanagement and resources that need to be duct-taped together. I'm lucky to have recently landed a Google Cloud gig and my God, things are so much easier and smoother now. It just seems better designed and integrated to me, albeit with far fewer services to choose from if you don't buy into their ecosystem.
I’m quite into learning a lot of cloud native security stuff and I have to say my first impression was that it seemed so much harder to think about creating a secure environment using AWS IAM. I couldn’t tell if it was just a case of familiarity or not.
I'm sure it's because of its age and them more or less creating their version of IAM from scratch (someone correct me if they copied this structure from elsewhere), but you have to do a lot of goofy, obtuse work with IAM automation. There are times I have to go into the console/CLI and grab some specific UID for an object instead of using its name, and things like that just make it annoying. Sometimes you can't use an account name and have to use the org ID... I could go on. You just kind of deal with it.
I haven't worked on GCP since maybe 2016-17 so I'm not sure how it's going over there anymore.
It really does sound like an entirely different level of complexity.
The GCP native API is basically the same thing as Knative in most ways. It's just a bunch of services and resources that you call, authenticate against, and often even provision the same way.
As an example of that, since we are talking about infrastructure management: at its “smoothest” level of integration there is a service you can use (or host yourself on Kubernetes, if that’s your thing for some reason) where, like any other Kubernetes resource, I just “declare” what I want.
So now I’m not messing around with complicated Terraform logic at all (Google got really good with automation; I don’t think there is anything close to an equivalent for this, is there?). I just declare, say, a BigQuery resource or a Project (AWS Account equivalent) resource and the service does all the hard work of making sure that’s the state my account is in at any given point.
I can also stick policy controls around it like I would with K8s so only certain people can create certain resources under certain conditions.
It’s really easy to just stick that into a git repo and still do all of the IaC stuff mentioned in this article, but it’s also easy to do the cross-environment stuff and manage the rollout between each of them.
Overall, it’s very predictable. The IAM is really intuitive but also incredibly granular, so it’s easy to model things on top of and to feel fairly confident that I’m not accidentally doing something stupid. I really like it from that point of view.
My number one bit of advice for GCP is to see how easily you can architect your way into using Cloud Run as much as possible, unless you have some really wild use case. You can get to a really sophisticated setup with only a tiny team. After that, read Google’s API guidelines (aip.dev) to understand how to build things in a way where you’re going to keep having a good time.
How do you deploy your apps? We exclusively use EKS, and having one account per env and app seems like quite an overhead when I think about managing/updating EKS clusters for each one. It also comes with the overhead of base applications that need to run in each cluster by default (like cert-manager, ExternalDNS, etc.).
Right now we’re using one account per env but also see downsides and thought of going the next step to do one account per env and tribe/team.
Each app/env has a pipeline that triggers a tf apply in its directory w/ its assumed AWS role, and deploys an env once someone gives it a manual approval after looking at the terraform plan/apply output. So it will start at /terraform/app1/staging, then once healthchecks succeed, another manual approval job for /terraform/app1/production will wait to be approved to deploy.
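Roughly, the promotion flow described above could be modeled like this. It's a toy sketch, not our actual CI config: `approve`, `apply` and `healthy` are hypothetical callbacks standing in for the manual approval job, the `terraform apply` step and the healthchecks, and the role names are made up.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Stage:
    directory: str  # Terraform directory for this app/env
    role: str       # role the pipeline assumes for this account (hypothetical name)


def promote(
    stages: List[Stage],
    approve: Callable[[Stage], bool],
    apply: Callable[[Stage], None],
    healthy: Callable[[Stage], bool],
) -> List[str]:
    """Walk the stages in order and return the directories that passed healthchecks.

    Promotion stops at the first stage that is not approved or not healthy.
    """
    passed = []
    for stage in stages:
        if not approve(stage):   # manual gate after reviewing the plan output
            break
        apply(stage)             # stands in for `terraform apply` in stage.directory
        if not healthy(stage):   # later stages never run if this fails
            break
        passed.append(stage.directory)
    return passed


if __name__ == "__main__":
    stages = [
        Stage("/terraform/app1/staging", "app1-staging-deploy"),
        Stage("/terraform/app1/production", "app1-production-deploy"),
    ]
    result = promote(stages, lambda s: True, lambda s: None, lambda s: True)
    print(result)
```

The point is just that production never gets applied unless staging was approved, applied and came back healthy first.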
For our EKS apps we do helm rollouts, but most of our services are on ECS so it's mostly just updating a task definition and forcing a redeployment of containers.
Each EKS cluster is set up exactly the same, aside from the usual things like VPC and IPs that differ between them. They all get a set of "base" apps like log chutes and cert-manager as soon as they're deployed.
Our app environments don't communicate with one another at all. The only relationship between them is that the IAM users in our security account can assume access into them as admin/etc.
It's not for the faint of heart, though. You need to allocate subnets to individual applications (with the relevant capacity planning concerns), plus support is sometimes spotty (e.g. EKS didn't support it last time I checked). Not worth doing until you have several teams trying to use the same VPC and stepping on each other's toes.
You can, but using Terraform to provision resources inside those accounts entails pulling generated/defined credentials from the org-level TF state and feeding that into the provider config for each app-env-level TF state. Vanilla Terraform doesn't support that very well (or at all, last I checked), but either some CI/CD pipeline creativity or Terragrunt (or both!) can work around it.
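As a sketch of the "CI/CD pipeline creativity" route: a pipeline step can read the org-level outputs (the JSON shape below is what `terraform output -json` prints) and turn them into the environment variables the AWS provider picks up for the app-env run. The output names and the `app1_staging` prefix are hypothetical; use whatever your org-level config actually exports.

```python
import json


def provider_env(outputs_json: str, prefix: str) -> dict:
    """Build AWS credential env vars from `terraform output -json` text.

    Each output is an object with a "value" field; the output names used
    here (<prefix>_access_key_id, <prefix>_secret_access_key) are assumptions.
    """
    outputs = json.loads(outputs_json)
    return {
        "AWS_ACCESS_KEY_ID": outputs[f"{prefix}_access_key_id"]["value"],
        "AWS_SECRET_ACCESS_KEY": outputs[f"{prefix}_secret_access_key"]["value"],
    }


if __name__ == "__main__":
    # Fake sample of what the org-level state might output.
    sample = json.dumps({
        "app1_staging_access_key_id": {
            "sensitive": False, "type": "string", "value": "AKIAEXAMPLE"},
        "app1_staging_secret_access_key": {
            "sensitive": True, "type": "string", "value": "not-a-real-secret"},
    })
    env = provider_env(sample, "app1_staging")
    print(sorted(env))
```

The pipeline would export these before running `terraform init`/`plan` in the app-env directory, so the provider config itself never has to reference another state.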
Can providers use the output from terraform_remote_state to set e.g. credentials? Last I checked, data sources get read during terraform plan/apply, whereas provider configs need to be known as early as terraform init.
https://aws.amazon.com/organizations/getting-started/best-pr...