The AWS Controllers for Kubernetes

solatic · on Aug 20, 2020

Currently no support for provisioning IAM permissions. ACK will be happy to construct an S3 bucket for you, that is then inaccessible, unless you use dangerous IAM wildcard permissions.

Team is concerned about the security ramifications of setting up IAM permissions from ACK: https://github.com/aws/aws-controllers-k8s/issues/22#issueco...

Look, it'll be great when it matures... but this is very much in the developer preview stage. Caveat emptor.

Legogris · on Aug 20, 2020

Am I the only one who's pessimistic about this? One of the big upsides of Kubernetes is having portable workloads and provisioning cloud-provider specific resources (whose lifecycles very likely outlive clusters!) in Kubernetes just seems wrong to me. Kubernetes is great for managing, orchestrating and directing traffic for containerized workloads but it really shouldn't be The One Tool For Everything.

Coupling everything together like this just seems to make things less manageable.

IMO infrastructure including managed services are better provisioned through tools like Terraform and Pulumi.

solatic · on Aug 20, 2020

The issue (or benefit, depending on your perspective) with Terraform is that it's a one-shot CLI binary. If you're not running the binary, then it's not doing anything. If you want a long-running daemon that responds to non-human-initiated events, then Terraform isn't a good tool.

Any time you try to declaratively define state, if you don't have a daemon enforcing the declarative state, then you will suffer from drift. One approach is the one Terraform has - assume that drift is possible, so ask the user to run a separate plan stage, and manually reconcile drift if needed. Another approach is the controller approach, where the controller tries to do the right thing and errors/alerts if it doesn't know what to do.

redwood · on Aug 20, 2020

This is why Hashicorp needs to accelerate their cloud offering.

Frankly I get the sense they got a little bit too addicted to central ops driven on-prem style deals for Vault but in the public cloud they need to be front and center with SaaS which is a long road. They have a rudimentary Terraform SaaS I believe but none for Vault as far as I'm aware. I see a lot of folks going straight to cloud provider services because of this.

You sum it up well... In these times you don't want to run a daemon

t3rabytes · on Aug 20, 2020

They used to have a managed Vault offering! But then it disappeared one day never to return.

013a · on Aug 20, 2020

At least with many Kubernetes controllers that have been built in the past (I haven't tried ACK), they don't respond to or attempt to "live-correct" changes in configuration which are made outside of Kubernetes (e.g. via the AWS UI). They act on events that are fired in the cluster when a resource they monitor changes in the cluster.

So, again I haven't tried ACK, but they don't help to correct the drift that you outline as a problem. They will take action to correct the drift if the internal resource changes, but how often do you update the desired configuration of an S3 bucket? For me, almost never.

In essence, the controller acts less like a live daemon always keeping things in sync, and more like Terraform in a CI pipeline. And, given you've probably got all your Kubernetes YAMLs inside a git repo anyway, all you've accomplished by deploying one of these things is trading a predictable, easy to debug step in a CI pipeline for a live running service inside your cluster that is in perpetual beta, could stop running, have bugs, could get evicted, uses cluster resources, doesn't talk to Github automatically, etc.

In fact, you'll often times even get drift upon internal resource updates, in situations where the controller is too spooked to make a change that could be destructive. We've seen this with the ALB Ingress Controller, where it never deletes ALBs, even if the entire underlying service, ingress, etc are deleted.

(Edit): To be clear: I think the direction of "specify everything in kubeyaml" could end up being a win for the infra world. If we could throw out a terraform+kubeyaml system for just "everything in kubeyaml", that feels like a simplification. But, I'm not convinced that the best way to get that kubeyaml into AWS is via live-running code inside clusters, especially since the complexity of AWS means its literally never going to work right (they can't even get CloudFormation to work right, and that's a managed service). A live-running controller is necessary for some things, like ALB Ingress, due to how quickly those changes need to be made (updating the ALB with new IP/ports of containers). But, for other things like S3/SQS/etc, I'm less sure.

kragniz · on Aug 20, 2020

I haven't checked the ACK code to see what they do, but this is what the resync period is for in controllers.

Here's a comment with some discussion: https://github.com/kubernetes/kubernetes/pull/75423/files

DenseComet · on Aug 20, 2020

I'm not 100% sure about the specifics for ACK, but I've been using Weave Flux to manage manifests in my cluster. It resyncs every 5 minutes or so and will correct any drift on the manifests that it can see in the git repo. Rerunning terraform on a set schedule can achieve the same thing with pieces of infrastructure that falls under terraform.

The issue with both of these is that if you create any new resources/manifests outside of git, they'll be invisible to the tools. If ACK can solve this problem, I can see it being quite useful, but if not, the problems that 013a mentioned apply.

multani · on Aug 20, 2020

I think this was the promise of Docker's InfraKit [1] (now DeployKit), to have your infrastructure constantly monitored and adjusted depending on the configuration you pushed into it.

Sadly it didn't go anywhere and it's now archived.

[1] : https://github.com/docker/deploykit

Legogris · on Aug 20, 2020

This issue is to a large extent addressed by Terraform Cloud.

solatic · on Aug 20, 2020

Terraform Cloud runs the executable for you on the basis of VCS triggers e.g. open pull requests, merging to master.

Terraform Cloud does not know to spontaneously run a plan and apply when a bucket described by your infrastructure code is surreptitiously deleted by somebody in a team on the other side of the ocean.

nhoughto · on Aug 20, 2020

Yeah always been my thinking with aws or similar operators.

Lifecycle of resources has to be stacked like a pyramid.. I don’t really want the definition of a bucket or a database(!) to live as a crd in a cluster. I’m much more likely to destructively update a cluster than a bucket. Coupling then doesn’t seem to make sense to me.

Some upsertable stateless resources would be nice tho, like having a crd to create an iam role for a deployment would be neat, so I can deploy a workload and define a role and policy attachments using the same mechanism, works where the crd defines the entire scope of the resources, not just create a bucket that can be filled with stuff.

Begs the question of the threat modeling tho, the aws controller has god privileges in the account? If I get to use my iam crd to provision iam resources for me (super helpful) then aws controller needs high IAM privs. So now kube cluster compromise could be escalated to aws account compromise, not really least privilege. Convenient though...

But then also iam isn’t supported so ️

vergessenmir · on Aug 20, 2020

You raise a good point on the chicken and egg scenario of having the K8S cluster created in AWS creating other AWS resources. Rebuilding everything from scratch will take a lot longer since you can't build concurrently.

The advantages of pulumi and terraform over this also come from HCL and the Pulumi DSL _and_ their respective state files.

This AWS controller only has the state through the API objects. Managing that is a pain because the organisation of those objects is left as an exercise for the user or other tools. Pulumi and Terraform constrain that to well thought-out DSLs.

If there was a sane unifying DSL for managing kubernetes objects it would certainly simplify things. It would not, though, solve the speed of rebuilding the infrastructure.

theptip · on Aug 20, 2020

Regarding lifecycle, most larger shops are going to have an infra team that owns “bedrock infrastructure” terraform config for things like VPCs, IAM, maybe project structure. Then each team will have their own terraform scripts for k8s cluster, DBaaS, queues, etc - the application infrastructure.

This tool is great for the latter case; give me a k8s cluster and I can stand up all of my services in a one-shot command, with a single tool chain.

TBD whether you’ll see infra teams moving their terraform scripts into an infra k8s cluster. I’m inclined to believe it’s less of a good fit there, as your bedrock should not change often, so the marginal value over terraform is less.

streetcat1 · on Aug 20, 2020

So the vision behind operators is not what is managed but who is managing. The idea is to move operations and provisioning from humans to machines.

What should exist, is a set of standards that defines generic cloud resources, and than each cloud provider would implement a translation operator from the standard to its own CRD. Of course, the leading cloud provider has no incentive to do that since this would reduce lock in.

For example, there should be a VirtualBucket CRD which would map to AWS Bucket, Azure Container etc.

Terraform was first into this area, but terraform is still manual in nature.

I think that Pulumi uses that same code generation as Arc.

thinkersilver · on Aug 20, 2020

Kubernetes is becoming the lingua-franca of building infrastructure. Through CRDs and the kube api spec I can

- start an single application

- deploy a set of interconnected apps

- define network topologies

- define traffic flows

- define vertical and horizontal scaling of resources

And now I can define AWS resources.

This creates an interesting scenario where infrastructure can be defined by the k8s API resources and not necessarily have k8s build it. For example podman starting containers off a K8S deployment spec. It's an API first approach and its great for interoperability. The only downside is managing the yaml and keeping it consistent across the interdependencies.

brown9-2 · on Aug 20, 2020

The Kubernetes resource model is the real lingua franca

soulnothing · on Aug 20, 2020

I really wish fabric8, and more specifically kotlin k8 dsl[2] was getting more traction.

It removes the down side of yaml all over the place. It's missing the package management features of helm. But I have several jars acting as baseline deployments, and provisioning. It works really well, and I have an entire language. So I can map over a list of services, instead of do templating. The other big down side is a java run takes a minute or two to kick off

I was resilient to k8 for a long time. Complexity was secondary to cost, but Digital ocean has a relatively cheap implementation now. This commonality and perseverance of tooling is great.

I want metrics, a simple annotation. I want a secret injected from vault, just add an annotation. It's also cloud agnostic, so this logic can be deployed any where some one provides a k8 offering.

EKS was very powerful. As running service accounts via non managed clusters. Removed the need to pass an access key pair to the application. That service account just ran with a corresponding iam role.

[1] https://github.com/fabric8io/kubernetes-client [2] https://github.com/fkorotkov/k8s-kotlin-dsl

thinkersilver · on Aug 20, 2020

It's been a while since I've looked at Fabric8 but it had good java -> k8s integration and was great for writing k8s tools.

It appears though that Fabric8 is useful for solo java projects without complex dependencies on non-java projects or small java shop. It overlaps with where jenkins-x is going, which has made major strides in the last 24 months.

The original team that worked on Fabric8 lead by James Strachan all moved on from Redhat and many of them are working on Jenkins-x.

vtkacenko · on Aug 20, 2020

I’d love an option for the yaml to be changed to something akin to gRPC .proto configs.

harpratap · on Aug 20, 2020

Glad to see AWS finally embracing Kubernetes too. Google did a similar thing a while back - https://cloud.google.com/config-connector/ So I guess this solidifies Kubernetes as the defacto standard of Cloud Platforms.

nuker · on Aug 20, 2020

> Glad to see AWS finally embracing Kubernetes too. ... So I guess this solidifies Kubernetes as the defacto standard of Cloud Platforms.

I don't think so. We switched most our workloads from k8s to ECS (mix of EC2/Fargate launch types) because clusters were hard to maintain. Now everything is visible in Console and we now have tech support down to the ECS Task level. Before doing that, management pestered AWS about future of ECS. AWS insisted that it is actively developed, and lots of their internal stuff runs on ECS, not k8s.

You need to understand why AWS churns out silly services every month. Tomorrow there will be new HypedTech, and days later new AWS ElasticHypedTech service. It's just good business.

zxienin · on Aug 20, 2020

Azure as well https://github.com/Azure/azure-service-operator

FridgeSeal · on Aug 20, 2020

Bold of you to assume that:

* the magic permissions ghost that runs in Azure whose job it is to inexplicably deny you resources to things won’t interfere

* Said Azure service will stay up long enough to be useful

* you finish writing the insane amount of config Azure services seemingly require before the heat death of the universe.

* Azure decides that it likes you and won’t arbitrarily block you from attaching things to your cluster/nodes because it’s the wrong moon cycle/day of the week/device type/etc

* you can somehow navigate Azures kafka-esque documentation got figure out which services you’re actually allowed to do this with.

It is only a slight exaggeration to say that Azure is the most painful and frustrating software/cloud product I’ve used in a long time, probably ever, and I earnestly look forward to having literally any excuse to migrate off it.

xiwenc · on Aug 20, 2020

I’m feel your pain also my friend. Azure quality is terrible compared to competitors:

* no good working examples in docs

* docs hard to read

* docs are not consistent with reality

* the web portal UX is inconsistent and outright weird (when you navigate through resource group, you can scroll back horizontally to previous context/screen; what a joke)

* there are a gazillion preview api versions that never gets released officially.

* and if you’re lucky to work with azure devops, it’s like building a house of cards with different card types and sizes

I’ve worked with AWS and GCP in the past. Indeed Azure is often chosen by CIO’s rather than people that have to work with the service every day.

FridgeSeal · on Aug 20, 2020

Oh my god the web UX, how did I forget about that: for the life of me I cannot figure out why they make all the interfaces scroll sideways. Why? Who does that?

Docs being hard to read and inconsistent with reality is a big point. My favourite mismatch is the storage classes one: it turns out there's actually 2 different grades of SSD available, but their examples and docs only mention premium SSD's. I only discovered "normal" SSD because they happen to auto-create a storage class with them in your Kubernetes cluster. The adventure to figure out whether you can attach a premium SSD to an instance is a whole new ball game - trying to find which instances _actually_ allow you to attach them is like looking for a needle in a haystack. Why are they so difficult about it? AWS is like "you want an EBS volume of type io1? There you go, done". Azure: "oh no, you _can't_ have premium ssd. Because reasons".

ahoka · on Aug 20, 2020

Actually there are three kind of SSD storage types in Azure: Standard, Premium and Ultra. I’m assuming that you need to provision an ‘s’ VM because the regular instances lack the connectivity for the faster storage, but that’s just guessing.

FridgeSeal · on Aug 20, 2020

Oh I forget about the ultra ones.

I found a few instance types when I went looking, but their interface does not make it easy to figure out which ones are premium eligible, but I do remember the price going up not-insignificantly for a premium capable machine, which feels about like double-dipping if you’re also paying extra for the SSD.

arethuza · on Aug 20, 2020

I thought the sideways scrolling (which was pretty weird) was only on the "old" Azure Portal - I'm not aware of it bothering me on the current Portal.

GordonS · on Aug 20, 2020

What's your beef with Azure DevOps? Personally, I love it, and find it far superior to anything else I've tried.

FridgeSeal · on Aug 20, 2020

Those configs feel like they're longer than a Russian literature novel.

Setting them up is a pain: why do I need a different set of permissions to set up a service connection for every single pipeline?

Why does my manager who has what we can only assume is a _marginally different_ set of permissions just see an approve button for the service connection but I get taken to page asking me to setup a service connection? (Which ultimately fails anyway)

Why does it want to create a Kubernetes secret on every deploy?

It does funny things with the output from Kustomise, and we discovered we had to add parameters in to make it stop adding characters to the end of the generated configs.

Did you once, look at a pipeline? Be prepared to get more emails from devops than an overly-eager marketing department.

If you have a Terraform in your Pipeline, Azure will lie to you about the status of you Apply's: it'll tell the pipeline that the resource was created, when it fact it was not successfully created, this will mess up your TF state.

If you deploy to a K8s cluster, and your application crashes during startup for some reason, the pipeline treats this as a failure, IMO that's beyond it's set of responsibilities: it should take care of making sure the deployment configs get to Kubernetes, application failing to start is the concern and responsibility of the developer, not the CI pipeline.

GordonS · on Aug 20, 2020

I can't comment on the k8s stuff specifically, but that does seem to be rather notorious for dubious documentation and features.

> Those configs feel like they're longer than a Russian literature novel

Assuming you mean pipelines YAML files, I can't agree - my pipelines are always quite terse, and you can also split them into multiple files if wanted. IME they are no longer than pipeline configs from AppVeyor, Travis etc.

> Setting them up is a pain: why do I need a different set of permissions to set up a service connection for every single pipeline?

IIRC, you don't - I'm sure there is an option when creating the connection to grant permission to all pipelines.

> Why does my manager who has what we can only assume is a _marginally different_ set of permissions just see an approve button for the service connection but I get taken to page asking me to setup a service connection? (Which ultimately fails anyway)

Working with service connections is definitely a PITA, but since you need to have the requisite permissions in Azure, I'm not sure there is a good way of handling things.

> Did you once, look at a pipeline? Be prepared to get more emails from devops than an overly-eager marketing department.

I work with pipelines all the time, and don't recall getting a single email from Azure DevOps marketing people.

FridgeSeal · on Aug 20, 2020

> I work with pipelines all the time, and don't recall getting a single email from Azure DevOps marketing people.

Oh sorry I worded that poorly - I mean my inbox is absolutely crowded with emails from devops itself. I'm sure I can turn it off, it's just absurdly chatty out of the box.

> Assuming you mean pipelines YAML files, I can't agree - my pipelines are always quite terse, and you can also split them into multiple files if wanted. IME they are no longer than pipeline configs from AppVeyor, Travis etc.

Yeah I was, how long are yours? I've got a couple here that are bordering on 300 lines which is pretty wild considering I only need to build a container and push it to K8s... Best CI/CD I've had was with Drone: equivalent build YAMLS were probably 20-30 lines, tops

> Working with service connections is definitely a PITA, but since you need to have the requisite permissions in Azure, I'm not sure there is a good way of handling things.

Compared to AWS' permission model, my experience is that it's distributed amongst throughout a number of locations, and needlessly complicated to ensure that team members have the permissions they need.

GordonS · on Aug 20, 2020

> Yeah I was, how long are yours? I've got a couple here that are bordering on 300 lines which is pretty wild considering I only need to build a container and push it to K8s... Best CI/CD I've had was with Drone: equivalent build YAMLS were probably 20-30 lines, tops

As a random example, I have a pipeline that builds a multi-arch container image for both x64 and ARM and pushes it to ACR, and also builds, tests and deploys a Function App, and publishes artifacts. The main pipeline file is about 75 lines, and it makes use of 3 other files that setup docker buildx, login to ACR, set the build version, that kind of thing. In practice, I find these kind of "setup" tasks often use as many lines as the main file, but it's often reusable stuff, and splitting them out into templates is really helpful, I think.

In total, it's around 120 lines for all the pipeline files. I am keen on makefiles though, so a lot of the actual build script is in them, ensuring we can also build locally too.

Another example is a web app, where it builds, runs unit and integration tests in a container, and publishes build artifacts for a release pipeline to take over deployment. It also has a 2nd job where it builds some Windows-specific stuff in a Windows agent. Whole thing is less than 100 lines.

The simplest pipelines will be less than 30 lines - for example, the web app I mentioned could easily be done in 30 lines if tests didn't run in a container.

I don't mind taking a look at your YAML file if you think another pair of eyes would help (my email is in my profile, and you can anonymise bits of your file if you want).

tracker1 · on Aug 20, 2020

On the email chatter, have to agree... we have the ones we mostly care about in a Teams channel (PR request). I need to get another filter for "mentions you" so that I can send the rest to the trash.

As to the pipeline length, most of my interactions are npm/node scripts via package.json within various projects . Often even other language projects I'll use node/npm as a task runner which makes for decent glue code. That tends to keep my pipeline files much smaller as it's usually...

    1: add tools
    2: configure environment injections
    3. run npm script(s)

At least then I have a little more control. Same for github, gitlab and others.

It seems Azure and AWS permissions seem to be at opposite ends of the spectrum. I will say I find dealing with Azure's Storage services (Blob, Table, Queue) to be really nice for a lot of common use cases. At least the tables and queues.

ithkuil · on Aug 20, 2020

Give it time, it will get better. When I was forced to use it in 2010 it was horrible; now ten years later is definitely better. In 2030 it will rock!

FridgeSeal · on Aug 20, 2020

That's like, a full 10 years I could spend not putting up with it though.

> In 2030 it will rock!

Apart from _maybe_ VSCode, what has MS made that makes you go "this rocks"?

GordonS · on Aug 20, 2020

Azure DevOps, dotnet core, WSL/WSL2, SQL Server on Linux, Typescript, Powershell Core (OK, so I honestly don't like Powershell, but many do), Visual Studio (I prefer Rider, but VS is great great), Windows 10 (yes, I love it), Azure DevOps (yes, I love it, unless I have to work with one of the poorly documented services), Xbox One...

FridgeSeal · on Aug 20, 2020

Fascinating.

Fair play, I'm either ambivalent or not a fan of most of those - except maybe the Xbox, I have a PS4, but the X1 is pretty nice.

> Azure DevOps...one of the poorly documented services

So like, all of them? Hahahaha

> SQL Server on Linux

Next to Azure, this is the next biggest cause of issues at work for me, and I haven't seen any wild advantage of this over Postgres, with the disadvantage that pretty much every linux/macOS and open source tool I use is much happier to plug in to Postgres instead.

tracker1 · on Aug 20, 2020

>> SQL Server on Linux

> Next to Azure, this is the next biggest cause of issues at work for me, and I haven't seen any wild advantage of this over Postgres, with the disadvantage that pretty much every linux/macOS and open source tool I use is much happier to plug in to Postgres instead.

It's been a pretty great thing for me, if only to be able to get a better local experience with Docker. A lot of our applications are centered around MSSQL Server (for better or worse), I'll say that despite the costs, replication in MS SQL is night and day better than Postgres.

It really depends on how you use SQL Server because it doesn't support all the features of the Windows version and if software uses those features, it gets pretty ugly. That said, if you develop against the Linux version, you're unlikely to have issues deploying to the full version, or Azure SQL.

Of course on other projects we're using PostgreSQL and I tend to prefer it as binary json support is huge imo. Getting other devs to understand that you don't need to break off child records into separate tables that require series of joins, that's another story.

GordonS · on Aug 20, 2020

> So like, all of them? Hahahaha

I really think that's unfair, untrue even. I work with a lot of services on a regular basis, and while it very frustrating to find something with missing documentation, it seems to be confined to certain services, and mostly then for new'ish features. In those cases, they are quite responsive on GitHub.

> Next to Azure, this is the next biggest cause of issues at work for me, and I haven't seen any wild advantage of this over Postgres, with the disadvantage that pretty much every linux/macOS and open source tool I use is much happier to plug in to Postgres instead.

I also personally prefer Postgres, and by a margin. But I have also worked with SQL Server on Linux, and haven't found any differences over SQL Server on Windows - it just works.

FridgeSeal · on Aug 20, 2020

> I really think that's unfair, untrue even. I work with a lot of services on a regular basis, and while it very frustrating to find something with missing documentation, it seems to be confined to certain services, and mostly then for new'ish features. In those cases, they are quite responsive on GitHub.

I was being a bit tongue-in-cheek here, but that is good to know they're responsive on GitHub.

> But I have also worked with SQL Server on Linux, and haven't found any differences over SQL Server on Windows - it just works.

Sorry I meant SQL Server in general, not specifically the linux one. You're pretty much relegated to using ODBC and messing around with all the requisite configs there, which is fine once it's working, but it's never as nice and straightforward as other db's ime.

tracker1 · on Aug 20, 2020

> Sorry I meant SQL Server in general, not specifically the linux one. You're pretty much relegated to using ODBC and messing around with all the requisite configs there, which is fine once it's working, but it's never as nice and straightforward as other db's ime.

So never dealth with Oracle outside Java?

I can't speak too broadly but mssql/tedious modules for node don't use the ODBC interface and work pretty well regardless of platform with minimal issues.

My biggest issues with MS SQL Server for Linux vs Windows comes down to some of the features that most dbms don't necessarily support in the first place.

throwaway13337 · on Aug 20, 2020

I'm in the process of switching a backend from django to dotnet core. As much as I love django's ORM, efcore is certainly feeling competitive.

A big reason is that C# has always 'rocked' and now they've got a platform that can be deployed on linux that feels complete. That's super awesome of them. And their documentation for dotnet core is top-notch.

That said, the last windows 10 auto-update managed to cripple the UI and (for real) network speed of the computer it ran on. That definitely didn't make me feel great about MS.

It's a mixed bag. It's almost as if the company is huge and we cannot say much about it as a whole.

brutal_boi · on Aug 20, 2020

Wind- gets shot by distant sniper

hestefisk · on Aug 20, 2020

WSL, .Net Core?

FridgeSeal · on Aug 20, 2020

I'm ambivalent about .net as a whole, but I'll pay that one.

zxienin · on Aug 20, 2020

> and I earnestly look forward to having literally any excuse to migrate off it

Azure’s enterprise sales machine will make sure your wait is long. They sell bulk block deals to CIO, not to devs directly.

lifty · on Aug 20, 2020

I have to agree with you, I find it painful to work with Azure.

brown9-2 · on Aug 20, 2020

It’s a shame Google didn’t open source their connector, feels like they missed out on a chance to make a community around this stuff.

StreamBright · on Aug 20, 2020

> Glad to see AWS finally embracing Kubernetes too

Yeah, they are embracing more income. Not really that surprising.

sytringy05 · on Aug 20, 2020

I can't decide if I think this is a good idea or not. Conceptually I like that I can get a s3 bucket/rds db/sqs queue by using kubectl but I'm not sure if that's the best way to manage the lifecycle, especially on something like a container registry, that likely outlives any given k8s cluster.

dmlittle · on Aug 20, 2020

S3 buckets and RDS instances probably also outlive your cluster if you're treating them as cattle. I want to make sure I don't delete my RDS instance while I'm doing a blue/green cluster upgrade. I can see defining SQS next to your application being handy (they're somewhat stateful while there are messages in the queue but I'm sure you can do something to ensure they're drained before destroying them).

throwayws · on Aug 20, 2020

Maybe it makes sense for development, short-lived resources or cattle. Don't start about cross-cluster use though, having to think about where a resource is defined/used is a mess.

The same issue is with Service Catalog and the open service broker model. AWS ACK is way better in its expressiveness, in comparison. (one crd per resource) It is more difficult to generate them though, compared to simply adding existing brokers.

monus · on Aug 20, 2020

I agree. There are also cases where many Kubernetes clusters in different regions consume the same cloud service like RDS Cluster. So, tight coupling of cloud service (assumed to be stateful in most cases) and workloads doesn't work in some scenarios.

In Crossplane, our primary scenario is to have a dedicated small cluster for all the orchestration of infrastructure and many app clusters that will consume resources form that central cluster. So, clusters with workloads come and go but the one with your infra stays there and serves as control plane. See the latest design about this workflow: https://github.com/crossplane/crossplane/blob/master/design/... I'd be happy to hear your feedback on the design.

Disclaimer: I'm a maintainer at Crossplane.

closeparen · on Aug 20, 2020

Why are these clusters going away?

sytringy05 · on Aug 20, 2020

We rebuild ours all the time. New config, k8s version upgrade, node OS patching.

closeparen · on Aug 20, 2020

Interesting. I'm only familiar with Mesos/Aurora, which is often considered outdated next to Kubernetes, but it can do all those things in place.

Do you end up with a "meta-kubernetes" to deploy kubernetes clusters and migrate services between them?

harpratap · on Aug 20, 2020

You definitely can do the same with Kubernetes too, just that the scope is too large and it doesn't have a good reputation with rolling updates of controlplane.

> Do you end up with a "meta-kubernetes" to deploy kubernetes clusters and migrate services between them?

Congratulations, you just discovered the ClusterAPI

algorithmmonkey · on Aug 20, 2020

The management cluster is definitely a pattern. Cluster API is designed to have a management cluster to build workload clusters across region and provider.

https://cluster-api.sigs.k8s.io/

joshuak · on Aug 20, 2020

People work in different ways. We do in place upgrades across all components. In fact that's sort of why we use k8s in the first place.

Also, I have an allergy to complexity, so I need fewer moving parts for health reasons.

harpratap · on Aug 20, 2020

Most of the times it is because of fear of breaking something during a rolling update. So people usually create a new cluster from scratch and slowly migrate one by one.

ransom1538 · on Aug 20, 2020

Here is my container: run it. Where is my url? The end.

No, I don't want Terraforms, puppets, yaml files, load balancers, nodes, pods, k8s, chaos monkeys, Pulumies, pumas, unicorns, trees, portobilities, or shards.

I love cloudrun and fargate. Cloudrun has like 5 settings, I wish it had like 2.

throwaway894345 · on Aug 20, 2020

I too want simplicity, but Fargate still requires a load balancer in most cases. Further, you’ll probably need a database (we’ll assume something like Aurora so you needn’t think about sharding or scale so much) and S3 buckets at some point, and security obligates you to create good IAM roles and policies. You’ll need secret storage and probably third-party services to configure. Things are starting to get complex and you’re going to want to be able to know that you can recreate all of this stuff if your app goes down or if you simply want to stand up other environments and keep them in sync with prod as your infra changes, so you’re going to want some infra-as-code solution (Terraform or CloudFormation or Pulumi etc). Further, you’ll probably want to do some async work at some point, and you can’t just fork an async task from your Fargate container (because the load balancer isn’t aware of these async tasks and will happily kill the container in which the async task is running because the load balancer only cares that the connections have drained) so now you need something like async containers, lambdas, AWS stepfunctions, AWS Batch, etc.

While serverless can address a lot of this stuff (the load balancer, DNS, cert management, etc configuration could be much easier or builtin to Fargate services), some of it you can’t just wave away (IAM policies, third party servic configuration, database configuration and management, etc). You need some of this complexity and you need something to help you manage it, namely infra-as-code.

nojvek · on Aug 20, 2020

Cloud run is one of my favorite cloud services. It’s so easy to use and cheap for low traffic things. I set one up last year. GCP bills me 5 cents a month (they have no shame billing in cents)

https://issoseva.org hasn’t ever gone down.

bdcravens · on Aug 20, 2020

So basically, Heroku :-)

ashconnor · on Aug 20, 2020

Everyone wants Heroku but at AWS prices.

jakemalachowski · on Aug 20, 2020

That's a problem we're trying to solve at Render. (https://render.com/) It provides much of the flexibility you get from AWS with the ease of use of Heroku, at a significantly lower cost.

Disclaimer: I work for Render

bdcravens · on Aug 20, 2020

In the past I've used Cloud66, which was similar to what you described, with a Heroku-like API where you own the underlying infrastructure.

throwayws · on Aug 20, 2020

What do you use to provision managed services like rds? Or are you describing your dream UI :)

ransom1538 · on Aug 20, 2020

Literally that. The actual RDS ui (aws or gcp). I can setup a new db instance in 3 clicks. Slap in the dbs ip into the container.

marcinzm · on Aug 20, 2020

That works when you have one container. When you have 200 and fifty databases and four identical environments (prod, qa, staging, dev) and so on it becomes a lot more error prone. These systems exist for those use cases.

Granted even with one container and a single DB I found terraform useful as I don't have to fiddle with whatever DB, container, etc. settings there are every time. I also prefer to not have to memorize or figure out a UI I use once every other month. Too many UIs across too many things.

throwayws · on Aug 20, 2020

Your point makes totally sense. Though I also like to be able to bring up (and destroy) my servers, with a single command, when needed. Going through the same ui every time is a little annoying.

i_love_limes · on Aug 20, 2020

Eh, they have elastic beanstalk which serves that purpose. This is for Kubernetes, which almost explicitly states is for more fine tuned control of a system that needs scalability.

I think it's both unfair and ill advised to want that type of simplicity with Kubernetes.

That being said, I don't really understand why this is necessary when there are already a lot of tools out there to manage clusters (other than AWS wanting everything to live in their console)

lazyant · on Aug 20, 2020

Yes, life is way easier if you don't have any state to worry about and you only process recoverable jobs on cattle containers and it's a team of one person taking care of it, then by all means use simple solutions.

ransom1538 · on Aug 20, 2020

"if you don't have any state to worry about"

You just add a db ip address to your app config? like any other solution. States work great.

zeroxfe · on Aug 20, 2020

This is exactly why I love Cloud Run! Combined with Firestore, it's all I need for most things I build.

hardwaresofton · on Aug 20, 2020

At the risk of being early, RIP CloudFormation.

I posited that this was the benefit in knowing Kubernetes all along, and possibly the ace up GCP's sleeve -- soon no cloud provider will have to offer their own interface, they'll all just offer the one invented by Kubernetes.

delhanty · on Aug 20, 2020

GCP successfully executing a strategy described by Joel Spolsky in 2002 [0]:

>Smart companies try to commoditize their products’ complements.

[0] Strategy Letter V https://www.joelonsoftware.com/2002/06/12/strategy-letter-v/ (edited)

EwanToo · on Aug 20, 2020

You're very early (though maybe not wrong eventually), quoting from the article

"ACK is available as a developer preview, supporting the following AWS services:

Amazon API Gateway V2 Amazon DynamoDB Amazon ECR Amazon S3 Amazon SNS Amazon SQS"

And:

"Specifically, in the coming months we plan to focus on:

Amazon Relational Database Service (RDS), track via the RDS label. Amazon ElastiCache offers fully managed Redis and Memcached, track via the Elasticache label."

In 5 years maybe this will be comparable to Cloudformation, but AWS will likely launch services faster than this adds them.

weiming · on Aug 20, 2020

There is also the AWS CDK (https://aws.amazon.com/cdk/) which is essentially lets you use your favorite language like Typescript or Python to generate CloudFormation, with an experience similar to Terraform. We've been experimenting with instead of TF, hoping it's here to stay.

Normal_gaussian · on Aug 20, 2020

Take a look at pulumi; it provides a programmatic interface and related tooling on top of terraform.

hardwaresofton · on Aug 20, 2020

Took the comment right out of my keyboard (?) -- these days whenever I talk about devops with people, I bring up pulumi. HCL and and almost all config-languages-but-really-DSLs are a death sentence.

I am very unlikely to pick terraform for any personal projects ever again, imagine being able to literally drop down to AWS SDK or CDK in the middle of a script and then go back to Pulumi land? AFAIK this is basically not possible with terraform (and terraform charges for API access via terraform cloud? or it's like a premium feature?)

mcintyre1994 · on Aug 20, 2020

I'm really liking CDK so far - it feels like Typescript which I'm using it with fits is really well. Things like being able to pass variables around (like which branch to deploy!) is really nice - makes some things that would be really horrible in plain CloudFormation really easy. I do worry a bit about TypeScript (or really any 'complete' programming language) for defining infrastructure though, I could easily see loads of ways to make it really hard and complex to follow. I wonder if an eslint plugin could help keep things on the right path.

StreamBright · on Aug 20, 2020

You are assuming a lot here. Kubernetes is not nearly as popular and used as much to justify that we only need that interface. CloudFormation is replaced with Terraform, Ansible and Pulumi much more than it is replaced with Kubernetes.

hardwaresofton · on Aug 20, 2020

I am assuming a lot, very possibly super early but I do want to point out that it's terraform/pulumi, and separately ansible (unless you're talking about ansible for provisioning/cloud modules[0] which no one ever seems to know about).

But what I'm trying to get at is that the kubernetes interface basically is the terraform/pulumi interface, just with a different configuration language (and less built-in templating macros, which is a GOOD thing, see CloudFormation), but better, because it's continuous. You write a configuration (no matter how dynamic you make things, there's one version that's sent out to AWS/GCP/whatever) with terraform/pulumi, and terraform/pulumi make it true once. You write a relatively similar configuration with k8s and it is made to be true once, but also continuously.

As far as popularity goes, kubernetes is insanely popular -- kubernetes was adopted by just about every cloud provider with a speed I've never seen before. Terraform/Pulumi basically took on the work of adapting to cloud provider APIs, not the other way around.

[0]: https://docs.ansible.com/ansible/latest/modules/list_of_clou...

jacques_chester · on Aug 20, 2020

> You write a relatively similar configuration with k8s and it is made to be true once, but also continuously.

This is true if and only if there is a controller written to achieve reconciliation. Defining a CRD doesn't do anything, it's just a reservation of a path and, maybe, a schema attached.

A controller can be written that listens but does nothing. It can be written to do something, but not listen. The API Server does not care. There is no magic logic engine or pathfinder or query planner. Someone has to write a controller.

All that bare Kubernetes gives you for CRDs is the equivalent of MongoDB.

hardwaresofton · on Aug 21, 2020

You're right technically, but who writes a controller that does nothing? Also, just the existence of the CRD can be enough to signal state to other controllers, so there are some knock-on benefits to just having it there in storage.

Regardless of corner cases, this is a step change difference between tools like terraform/pulumi that reach out and call APIs and finish executing compared to k8s which supports reconciliation. A tool with reconciliation is strictly more capable than one without (for better or for worse).

bogomipz · on Aug 20, 2020

Well except that EKS uses CloudFormation on the backend during cluster provisioning process. So even if you use Terraform to provision an AWS EKS cluster it will ultimately result in rendering Cloudformation templates at AWS. The eksctl tool works the same way i.e it builds CloudFormation templates.

hardwaresofton · on Aug 20, 2020

If CloudFormation is an AWS-internal tool, it might as well not exist to those of us outside AWS. I couldn't care less what AWS uses to provision the machines I order with a different interface -- it's their birthday/funeral.

nuker · on Aug 20, 2020

> At the risk of being early, RIP CloudFormation.

At the risk of audience running out of pop corn, Cloudformation if the only thing you really need for Infra as a Code in AWS. Nothing else. ECS Cluster, ECS Service, TaskDefinition, ALB - it's all that's needed to orchestrate containers.

ponderingfish · on Aug 20, 2020

Orchestration tools are the way forward especially when it comes to on-demand video compression - it's helpful to have the tools to be able to spin up 100s of servers to handle peak loads and then go down to nothing. Kubernetes is so helpful in this.

Scaevolus · on Aug 20, 2020

Are you talking about on-demand compression for viewers, or on-demand for uploads? The economics of on-demand transcoding are usually pretty bad in the cloud.

jtsiskin · on Aug 20, 2020

Would AWS spot instances be useful here?

big-malloc · on Aug 20, 2020

Currently the cluster autoscaler supports using a pool of spot instances based on pricing, which is super helpful for test clusters, and there are some other tools available to ensure that you can evict your spot nodes when amazon needs them back

zxienin · on Aug 20, 2020

There is now secular push towards use of custom operators instead of OSB. I wonder what finally caused this.

jacques_chester · on Aug 20, 2020

A mix of factors, I think.

1. OSBAPI is not widely known outside of the Cloud Foundry community it came from. In turn that's because Cloud Foundry is not widely known either. Its backers never bothered to market Cloud Foundry or OSBAPI to a wider audience.

2. It imposes a relatively high barrier to entry for implementers. You need to fill in a lot of capabilities before your service can appear in a conformant marketplace. With CRDs you can have a prototype by lunchtime. It might be crappy and you will reinvent a whole bunch of wheels, but the first attempt is easy.

3. Fashion trends. The first appearance of OSBAPI in Kubernetes-land used API aggregation, which was supplanted by CRDs. Later implementations switched to CRDs but by then the ship was already sailing.

4. RDD. You get more points for writing your own freeform controller than for implementing a standard that isn't the latest, coolest, hippest thing.

It's very frustrating as an observer without any way to influence events. OSBAPI was an important attempt to save a great deal of pain. It provided a uniform model, so that improvements could be shared amongst all implementations, so that tools could work against standard interfaces in predictable ways, so that end-users had one and only one set of concepts, terms and tools to learn. It also made a crisp division between marketplaces, provisioning and binding.

What we have instead is a mess. Everyone writing their own things, their own way. No standards, no uniformity, different terms, different assumptions, different levels of functionality. No meaningful marketplace concept. Provisioning conflated with binding and vice versa.

It is a medium-sized disaster, diffuse but very real. And thanks to the marketing genius of enterprise vendors who never saw a short-term buck in broad spectrum developer awareness, it is basically an invisible disaster. What we're heading towards now is seen as normal and ordinary. And it drives me bonkers.

zxienin · on Aug 21, 2020

I’d agree on the mess. I also find it on over-engineered side. Do I really need service discovery of services that I already know of, from AWS GCP...?

jacques_chester · on Aug 21, 2020

If you want a little from column A and a little from column G, having a single interface is pretty helpful. It's easier to automate and manage.

wavesquid · on Aug 20, 2020

This is great!

Are other companies doing similar things? e.g. I would love to be able to set up Cloudflare Access for services in k8s

levi_b · on Aug 20, 2020

We just released an operator with similar goals, but it works across clouds. More details here: https://www.pulumi.com/blog/pulumi-kubernetes-operator/

algorithmmonkey · on Aug 20, 2020

Azure has an incubation project which is building a Golang CRD generator and a generic Azure resource controller [k8s-infra]. It will likely end up being used in Azure Service Operator [ASO].

[k8s-infra]: https://github.com/Azure/k8s-infra

[ASO]: https://github.com/Azure/azure-service-operator

sytringy05 · on Aug 20, 2020

GCP (Config Connector) and Azure (Service something) both have similar things. I've not heard of it happening outside a managed k8s env.

harpratap · on Aug 20, 2020

https://crossplane.io is doing a multi-cloud one

zxienin · on Aug 20, 2020

I like their work, but their OAM centricity is too heavy an opiniation.

bassamtabbara · on Aug 20, 2020

disclaimer: I'm a maintainer on Crossplane.

OAM is an optional feature of crossplane - you don't have to use it if you don't want to

zxienin · on Aug 21, 2020

Good to know, at least that warm me up to crossplane further. The messaging might need update, including within the docs. I mean - crossplane is the OAM implement - coupled with OAM sprinkled all over docs, gave me very different impression.

This aside, I think crossplane work is interesting.

Niksko · on Aug 20, 2020

The approach of generating the code from the existing Golang API bindings means that hopefully this project will get support for lots of resources pretty quickly.

Excited about this, though you do wonder whether it'll suffer the same fate as Cloudformation: the Cloudformation team finds out about new feature launches the same time that the general public does. If the Kubernetes operator lags behind, you're going to have to fall back to something else if you need cutting edge features.

moondev · on Aug 20, 2020

Seems odd there is no controller for EC2 or even planned on the roadmap https://github.com/aws/aws-controllers-k8s/projects/1

alexeldeib · on Aug 20, 2020

It's not weird at all. A prime use case for this is to use Kubernetes itself for the compute layer and orchestrating peripheral AWS components using Kubernetes as the common control plane.

You can orchestrate entire application stacks (pods, persistent storage, cloud resources as CRDs) using this approach.

harpratap · on Aug 20, 2020

There is a fairly decent demand for orchestrating VMs using kubernetes (kubervirt), many legacy apps are too expensive to be rewritten in a cloud native way

paxys · on Aug 20, 2020

There is no point in including services that are already directly mapped to a Kubernetes resource. EC2 instances are nodes, EBS/EFS can be mounted as volumes etc. This project adds Custom Resource Definitions for the rest.

What I really want is RDS/Aurora though.

moondev · on Aug 21, 2020

Not all ec2 instances are k8s nodes though. My thought is using this to manage all ec2 instances via k8s. Instead of the aws console

coryodaniel · on Aug 20, 2020

This is nice from the app manifest perspective because you can declare your database right along side your deployment.

The provisioning time of a deployment and an RDS instance is very different though. This is probably most useful when you’re starting a service up for the first time. This is also when it’s not going to work as expected due to that latency of RDS starting up while your app crashed repeatedly waiting for that connection string.

This would be really nice for buckets and near instant provisioned resources, but also kinda scary that someone could nuke a whole data store because they got a trigger finger with a weird deployment and deleted and reapplied it.

My feelings, they are mixed. :D

MichaelMoser123 · on Aug 21, 2020

Kubernetes is supposed to be cloud vendor agnostic; the cloud vendors counter that by having extension operators to create some tie in to the kubernetes deployment of their making.

I guess the 'kubernetes' way would be to create a generalized object for 'object store', that would be implemented by means of s3 on aws and on azure it would be done as blob storage.

Now with this approach you can only do the common features between all platforms, you would have a problem with features exclusive to aws for instance, or you would need some mapping from a generalized CRD object to specific implementation of each platform.

1-KB-OK · on Aug 20, 2020

Interesting how they enforce the namespaced scope for ACK custom resources. This is a logical design choice but makes it trickier for operators to use.

Say I have an operator watching all namespaces on the cluster. Since operator CRDs are global in scope it makes sense for some operators to be installed as singletons. A CR for this operator gets created in some namespace, and it wants to talk to s3 -- it has to bring along its own s3 credentials and only that CR is allowed to use the s3 bucket? You can imagine a scenario where multiple CRs across namespaces want access to the same s3 bucket.

la6471 · on Aug 20, 2020

Everything from DNS to AWS SDKs gets reinvented in Kubernetes. It is the most anal approach of Infrastructure design I have ever seen in the last three decades. A good design builds on the things that are already there and does not goes around trying to change every well established protocol in the world.KISS.

sytse · on Aug 20, 2020

This feels like Crossplane.io but limited to only AWS. Kelsey seems to think the same https://twitter.com/kelseyhightower/status/12963213771342315...

toumorokoshi · on Aug 20, 2020

This has been posted a couple times, but GCP has an equivalent that's been around for a while:

https://cloud.google.com/config-connector/docs/overview

disclaimer: I work at GCP.

sunilkumarc · on Aug 20, 2020

Wow. Now we can directly manage AWS services from Kubernetes.

Github: https://aws.github.io/aws-controllers-k8s/

On a different note, recently I was looking to learn AWS concepts through online courses. After so much of research I finally found this e-book on Gumroad which is written by Daniel Vassallo who has worked in AWS team for 10+ years. I found this e-book very helpful as a beginner.

This book covers most of the topics that you need to learn to get started:

If someone is interested, here is the link :)

https://gumroad.com/a/238777459/MsVlG