Nomad vs. Kubernetes (nomadproject.io)
547 points by capableweb on Oct 15, 2021 | 361 comments



Despite its reputation, Kubernetes is actually quite easy to master for simple use cases. And affordable enough for more complex ones.

The fundamental abstractions are as simple as they can be, representing concepts that you'd already be familiar with in a datacenter environment. A cluster has nodes (machines), and you can run multiple pods (which is the smallest deployable unit on the cluster) on each node. A pod runs various types of workloads such as web services, daemons, jobs and recurring jobs which are made available (to the cluster) as docker/container images. You can attach various types of storage to pods, front your services with load-balancers etc.

All of the nouns in the previous paragraph are available as building blocks in Kubernetes. You build your complex system declaratively from these simpler parts.
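To make that concrete, the smallest unit, a Pod, is just a short manifest; a rough sketch (name and image are placeholders):

    apiVersion: v1
    kind: Pod
    metadata:
      name: hello-web            # placeholder name
    spec:
      containers:
        - name: web
          image: nginx:1.21      # any container image
          ports:
            - containerPort: 80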

When I look at Nomad's documentation, I see complexity. It mixes these basic blocks with those from HashiCorp's product suite.


I couldn't disagree with this more. I manage and run Kubernetes clusters as part of my job, and I can tell you that installing, configuring and running these clusters is no small feat. I don't know much about Nomad, but I would highly encourage most users not to assume K8s is simple by any standard. Multiple cloud providers now offer ways to run your code, or your containers, directly. Unless you have a reason to manage that complexity, save yourself some heartburn and make your developers happy: they only need to worry about running their code, not about what happens when a seemingly innocuous change causes a cascading failure.

This has a great collection of war stories - https://k8s.af/


Honestly, if you use a managed Kubernetes provider it's pretty simple once you nail the concepts. You'll get hit with cloud-provider issues every now and then, but it's really not terrible.

I'd manage services in Kubernetes before I start deploying VMs again, that's for sure.


Sure, if you pay someone else to keep k8s alive it's not so bad, but it's expensive to do that. You generally need a full-time team of people to keep a k8s deployment alive if you are running it yourself.

I keep Nomad alive part-time. It's operationally simple and easy to wrap one's head around and understand how it works.


A lot of the cloud providers don't bill directly for Kubernetes management; you just pay for the node resources.

Either way, as another comment points out, Rancher and many other solutions make the orchestration of creating your own Kubernetes cluster really boring.

We run a few Kubernetes clusters on premise, and for the longest time it was just one person running them. We even have other teams in QA/Engineering running their own with it.


Which cloud providers don't? I'm only familiar with AWS and GCP, and they both have a base hourly charge per-cluster.


Azure, as far as I know, doesn't have base charges unless you want a paid SLA. It's Azure, though.


DigitalOcean and Linode, that I'm aware of; there may be others.


DigitalOcean charges extra for k8s nodes (compared to plain VPSes), and so does Linode, last I bothered to check.


Scaleway


You can also use something like Rancher or k3s to keep it alive part-time.


RKE2 is the closest thing we've found to a turnkey, on prem, production ready Kubernetes distribution. Though we were last looking about a year ago so I concede there might be better or comparable options now.


nope. have a GKE cluster running "unattended" for months now. looks fine. ;)


GKE is the industry's first fully managed Kubernetes service...

I.e. you don't run k8s, Google does it for you. I'm talking about where YOU run k8s, not run ON k8s. I agree running ON k8s is pretty easy.


Almost everything is pretty simple "once you nail the concepts". It's getting to the point where you have the concepts nailed that is the measuring stick for complexity.


> Despite its reputation, Kubernetes is actually quite easy to master for simple use cases.

This has so many assumptions rolled up in it. When we move from niche assumptions to the masses of app operators and cluster operators (two different roles) it tends to fall apart.

Which ultimately leads to things like...

> This has some great account of war stories - https://k8s.af/

We shouldn't assume other people are like us. It's worth understanding who folks are. Who is that ops person who wants to run Kubernetes at that insurance company or bank? Who is the person setting up apps to run for that auto company? What do they know, need, and expect?

When I answer these questions I find someone far different from myself and often someone who finds Kubernetes to be complicated.


I also maintained several kubernetes clusters and never found it very difficult. We never had any major outages, and the only downtime was trying new features.


What's the scale of your clusters? If you're running anywhere between 5 and 20 nodes with 10 pods on each node, I think that's a very small deployment and you'll be able to breeze by with most of the defaults. You still need to configure the following though - logs, certificates, authentication, authorization, OS upgrades, Kubernetes upgrades, etc.

I'm not sure all of this is worth it if you're running a small footprint. You're better off using the more managed solutions available today.


5 clusters of around 100 nodes each on average


I dunno man, k8s is pretty simple, but only if you make it so. Early on I built complicated setups with all kinds of functionality, but the complexity was rarely used. I now manage clusters and deployments on those clusters with code, Terraform to be exact (Tcl/Expect for the instances when Terraform can't hang yet). I built a GUI interface for Terraform to manage deployments of Kubernetes on various clouds and bare metal, and to manage the deployments to Kubernetes.

Maybe one day this will get too complex and we'll do something else, but so far so good.


From a system architecture perspective, kubernetes is very complex since it handles a multitude of complexities, but that's why it _can_ be very simple from a user's perspective. Most of the complexity can be ignored (until you want it).

It's the same reason I still like to use postgres when I can versus NoSQL until I know I need the one feature I may not be able to achieve with postgres: automatic sharding for massive global scale. The rest of the features postgres (and friends) give easily (ACID etc) are very tricky to get right in a distributed system.

It's also the same reason bash is great for tasks and small automated programs, but kind of terrible for more complex needs. The primitives are super simple, easy to grok, and crazy productive at that scale, but other languages give tools to more easily handle the complexities that start to emerge when things get more complicated.


> From a system architecture perspective, kubernetes is very complex since it handles a multitude of complexities, but that's why it _can_ be very simple from a user's perspective.

I’ve been doing software for 20 years in one form or another. One of the things I’ve learned is that the simpler and more polished something seems to the user, the more likely it is that there's a hell of a lot of complexity under the covers making it that way.

Making something that handles the 80% is easy. Every step closer to 100% becomes non-linear in a hurry. All that polish and ease of use took months of “fit & finish” timeboxes. It took tons of input from UX and product. It involved making very hard decisions so you, the user, don’t have to.

A good example is TurboTax online. People love to hate on their business practices (for good cause) but their UX handles like 98% of your common tax scenarios in an incredibly easy to use, highly polished way. Robinhood does a pretty good job too, in my opinion—there is a lot of polish in that app that abstracts away some pretty complex stuff.


It doesn't take long working with Nomad to hit the cases where you need to augment it. Now I know some people enjoy being able to plug and play the various layers that you get in the complicated kitchen-sink Kubernetes.

We already had something that just ran containers, and that was Mesos. They had the opinion that all the stuff like service discovery could be implemented and handled by other services like Marathon. But it did not work well. Most people that want to deploy containers in an orchestrated manner want service discovery.

At least these parts are being provided by the same company (Hashicorp) so it probably won't suffer the same lack of coordination between separate projects that Mesos did.

The benefit of the opinionated, kitchen-sink approach that K8s takes is that your deployments and descriptors are not very difficult to understand and can be shared widely. I do not think the comparison of massively sharded NoSQL to Postgres is the same, because most people will not need massive sharding, but almost everyone is going to need the service discovery and other things, like secrets management, that K8s provides.


One of the things that I don't like about Nomad is HCL. It is a language that is mainly limited to HashiCorp tools, and there's no wider adoption outside of that, at least not to my knowledge.

From the documentation:

> Nomad HCL is parsed in the command line and sent to Nomad in JSON format via the HTTP API.

So why not just JSON? Or why JSON at all and not MsgPack, or straight-up HCL, given that HCL is introduced over and over as being machine-readable and human-friendly at the same time?


I've only used Terraform, but I absolutely love HCL as a configuration language. I know I'm in the minority about this, but it's so much less fiddly and easier to read than json or yaml. I do wish there were more things that used it.

JSON is fine for serialization, but I hate typing it out. There are too many quotes and commas - all keys and values have to be quoted. The last item in a list or object can't have a trailing comma, which makes re-ordering items a pain. Comments aren't supported (IMO the biggest issue).

YAML is too whitespace dependent. This is fiddly and makes copy-pasting a pain. I'm also not sure how it affects serialization, like if you want to send yaml over the network, do you also have to send all of the whitespace? That sounds like a pain. OTOH I like that quotes are usually optional and that all valid json is also valid YAML. Comments are a godsend.

HCL has the same basic structure of objects and lists, but it's much more user friendly. Keys don't need to be quoted. Commas aren't usually needed unless you compress items onto the same line. It supports functions, list/object comprehensions, interpolations, and variable references which all lead to more powerful and DRY configuration. Granted I'm not sure if these are specific to TF's HCL implementation, but I hope not.
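As a rough illustration (Terraform-flavored HCL; the values are placeholders):

    # comments are allowed, keys are unquoted
    variable "region" {
      default = "us-east-1"
    }

    resource "aws_instance" "web" {
      ami           = "ami-123456"      # placeholder value
      instance_type = "t3.micro"
      tags = {
        Name = "web-${var.region}"      # interpolation
      }
    }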

For serialization, HCL doesn't have any advantage over JSON. Sure it's machine-readable but probably much harder to write code that works on HCL specifically than to convert to JSON and use one of the zillions of JSON libraries out there.


JSON was designed for machine readability, HCL was designed for human readability.

HCL requires a lot more code to parse and many more resources to keep in memory vs JSON. I think it completely makes sense to do it this way. K8s is the same. On the server it does everything in JSON. Your YAML gets converted prior to sending to K8s.


Parsing JSON Is a Minefield (2016):

https://news.ycombinator.com/item?id=28826600


I don’t think JSON was designed; it is just JavaScript objects plus Douglas Crockford's spec. Having said that, HCL really doesn’t click with me.


Crockford himself says JSON was not invented but rather discovered.


Didn’t know he did but it is somewhat obvious. Lispish roots of js do shine through sometimes


Syntax and data structures are very similar to LPC serialization. Obvious but very useful.

https://en.m.wikipedia.org/wiki/LPMud


To my understanding, you can write most (all?) of the job/config files in JSON if you wish. At my company, we have a ton of HCL files because in the beginning it was easier to hand-write them that way, but we're now getting to the point where we're going to be templating them and going to switch to JSON. In other words, I believe HCL is optional.
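If memory serves, the CLI can also print the JSON it would submit to the API, which helps when you start templating (sketch; the file name is made up):

    nomad job run -output example.nomad > example.json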


OctopusDeploy selected HCL for its pipeline config as code.

https://octopus.com/blog/state-of-config-file-formats


The important difference with k8s, from my experience, is that from the very early days it modeled a common IPC for any future interfaces, even if TPR/CRD took some time to hash out. This means that any extra service can simply be added and then used with the same general approach as all other resources.

This means you get to build upon what you already have, instead of doing everything from scratch again because your new infrastructure-layer service needs to integrate with 5 different APIs that have slightly different structure.


> The rest of the features postgres (and friends) give easily (ACID etc) are very tricky to get right in a distributed system.

But that's just basically a calculated tradeoff of Postgres (and several CP databases) trading Availability for Consistency.


Probably less calculated and more "that's what's available to offer stably right now that we can feasibly deliver, so that's what we'll do." Distributed RDBMS were not exactly cheap or common in open source a couple decades back. I don't think there was much of a choice to make.


I mean it is a trade off though. You cannot beat the speed of light. The further apart your database servers are, the more lag you get between them.

If you want a transactional, consistent datastore you are gonna have to put a lock on something while writes happen. And if you want consistency it means those locks need to be on all systems in the cluster. And the entire cluster needs to hold that lock until the transaction ends. If your DB’s are 100ms apart… that is a pretty large, non negotiable overhead on all transactions.

If you toss out being fully consistent as a requirement, things get much easier in replication-land. In that case you just fucking write locally and let that change propagate out. The complexity then becomes sorting out what happens when writes to the same record on different nodes conflict… but that is a solvable problem. There will be trade offs in the solution, but it isn’t going against the laws of physics.


> If you want a transactional, consistent datastore you are gonna have to put a lock on something while writes happen. And if you want consistency it means those locks need to be on all systems in the cluster.

FWIW, it's not as bad as that sounds. There are traditional locks, and there is optimistic locking. If there are two conflicting transactions, a traditional lock detects this before it happens (by insisting a lock is obtained before any updates are done), and if there is any chance of conflict the updates are run serially (meaning one is stopped while the other runs).

Optimistic locks let updates run without locking or blocking at all, but then at the end they check whether the data they depended on (i.e., data that would have been locked by the traditional mechanism) has changed. If it has, they throw it all away. (Well, perhaps not quite - they may apply one of the conflicting updates to ensure forward progress is made.) The upside is that if there are no conflicting updates everything runs at full speed, because there is no expensive communication about who holds which lock. The downside is that a lot of work may be thrown away by what amounts to speculative execution.

Most monolithic databases use traditional locking. Two CPUs in the same data centre (or more likely on the same board) can rapidly decide who owns what lock, but cycles and I/O on a high-end server are precious. Distributed ACID databases like Spanner, CockroachDB and YugabyteDB favour optimistic locking, because sending messages halfway across the planet to decide who owns what lock before allowing things to proceed takes a lot of time, whereas the CPU cycles and I/Os on the low-end replicated hardware are cheap.

While optimistic locks allow an almost unlimited number of non-conflicting updates to happen concurrently, their clients still have to pay a time penalty. The decision about whether there was a conflicting update still has to be made, and it still requires packets to cross the planet, and while all this happens the client can't be sure whether their data has been committed. But unlike the traditional model, they are never blocked by what any other client is doing - provided it doesn't conflict.


Yes, but my point was there wasn't really a choice to make at that time, therefore no trade off.

Even if I won $100 in the lotto today and had the money in hand, I wouldn't describe my choice of which house I bought years ago as a calculated trade off between what I bought and some $10 million mansion. That wasn't a feasible choice at that time. Neither was making a distributed RDBMS as an open source project decades ago, IMO.


Wasn’t MySQL (pre-Oracle) an open source distributed RDBMS decades ago? At least I remember running it using replication in early 2000’s


MySQL replication isn't really what I would consider a distributed RDBMS in the sense we're talking about, but it is in some senses. The main distinction being that you can't actually use it as a regular SQL interface. You have to have a primary/secondary and a secondary can't accept writes (if you did dual primary you had to be very careful about updates), etc. Mainly that you had to put rules and procedures in place for how it was used in your particular environment to allow for sharding or multiple masters, etc, because the underlying SQL system wasn't deterministic otherwise (also, the only replication available then was statement based replication, IIRC).

More closely matching would be MySQL's NDB clustered storage engine, which was released in late 2004.[1] Given that Postgres and MySQL both started about 1996, that's quite a while after their initial releases.

I spent a while in the early to mid 2000's researching and implementing dual master/slave or DRBD-backed MySQL HA systems as a consultant, and the options available were very limited from what I remember. There's also probably a lot better tooling these days for developers to make use of separated read/write environments, whereas it seemed fairly limited back then.

1: https://en.wikipedia.org/wiki/MySQL_Cluster


By the time you need massive sharding with Postgres, you've got a lot of options. I'd assume that by the time you get there, you can probably budget for them pretty easily as well.

Citus is specifically designed for it.

Timescale appears to have the use case covered too.

Ultimately though, sharding is one of the easier ways to scale a database out. NoSQL just does it by eliminating joins. You can do that pretty easily with almost any solution, PG or otherwise.


The comparison is a bit misleading. As someone that has used both nomad and k8s at scale --

- Nomad is a scheduler. Clean and focused. It is very fast. I was an early user and encountered a number of bumps, but that's software. The people at Hashicorp are super sharp and lovely.

- K8s is a lot more. It includes a scheduler, but in the simplest sense, it is a complete control-plane based on the control-loop pattern. You have an API, a scheduler, a db, various controllers, etc. Forget for a moment that most people use it to orchestrate containers -- it's really designed to orchestrate anything. Its API is extensible, you can add and compose controllers -- there are many possibilities once you wrap your head around it.

This is all opinionated and includes a lot of capability. It's just very different.

You can stitch together nomad, consul, vault, and various glue to create a container orchestration system... but when you start wanting to manage the control-planes as though they are the "kind" (the container for example) with meta-control-planes, and you start wanting to orchestrate network, storage, and other dependencies... all while doing this in a multi-tenant environment, then things get interesting.

-charles.


I'm not sure it's a good idea to use Google's "monolithic cloud operation system".

Such a monoculture has the same issues as MS Windows had, even for the same reasons.

The "Unix Way" of simple tools interacting seems more reasonable. Especially when it comes to lock-in effects.


K8s itself is divided in multiple parts, where you can customize to your own liking, and you can swap parts if you'd like as long as the APIs are similar.

It's very much built the UNIX way.


Where can I find those alternative elements that can be swapped? If this is true there should be a lot of them, right?


If your complaint about Kubernetes is that it doesn’t provide you enough choice/extensibility, you’re probably not looking hard enough.

Runtimes (CRI) - Docker shim (deprecated), containerd, CRI-O, kata

Networking (CNI) - Flannel, Calico, Cilium, cloud specific ones, many more

Storage (CSI) - way too many

Device plugins - GPU, TPU, RDMA/SRIOV NICs

Data store - etcd, dqlite in microk8s, SQLite/postgresql/mysql in k3s

DNS - kube-dns (somewhat deprecated), CoreDNS

Ingress - NGINX (multiple), HAProxy, Envoy (many), etc

Kubelet - https://github.com/virtual-kubelet/virtual-kubelet

Kube-proxy - Cilium can act as a replacement

Cloud-controller-manager for each cloud provider

kube-scheduler - https://kubernetes.io/docs/tasks/extend-kubernetes/configure...

kube-apiserver and kube-controller-manager are two parts where I’m not aware of any other implementations but a) they are kind of the heart of k8s and b) can be easily extended with CRDs/operators.


This is so far from the truth, I'm having a hard time imagining why you even think this. Kubernetes is practically the distributed embodiment of the Unix philosophy. You have a core set of interfaces and components that need to offer a particular API, and other than that, whether it is one program or many, written by one developer or hundreds, by a private company or via volunteers contributing to open source projects, is totally up to how you want to do it. You're free to use the original reference implementation that used to be owned by Google a decade ago before they open sourced it and donated it to the CNCF, but you certainly don't have to. Others have mentioned k3s, which is the busybox to the reference kubernetes GNU coreutils, all Kubernetes, plus ingress and network overlay, in a single binary, with an embedded sqlite db as the backing store instead of etcd. But k3s is still "Kubernetes." Kubernetes is a standard, much like POSIX. It's maybe unfortunate that the original reference implementation is also named "kubernetes" because a lot of people seem to think that one is the only one you can use, and it has historically been complex to set up, but the reason for the complexity is it doesn't make any choices for you.

Imagine if you wanted to use a Unix operating system, but instead of choosing a Linux distro, you just read the POSIX standard and went out and found every required utility, plus a kernel, and had to figure out on your own how to get those to work together and create a system that can run application-level software. If you just go to kubernetes.io and follow the instructions on how to get up and running with the reference implementation, that is what you're doing. It makes no decisions at all for you. You can run external etcd, or use kubeadm to set it up for you. You can run it HA or on a single node. You can add whatever overlay network you want. You can use whatever container runtime engine you want. You can use whatever ingress controller you want, or none at all, and not have any external networking, just as you can install Linux From Scratch and not even bother to include networking if you want a disconnected system for some reason.

You have pretty much complete user freedom, and that is, in fact, the source and reason for a whole lot of complaints. Application developers and even most system administrators don't want to have to make that many decisions before they can even get to hello world. I believe Kelsey Hightower commented on this a while back, saying something to the effect that Kubernetes is not meant to be a developer platform. It's a framework for creating platforms.

Application developers, startups, and small business should almost never be using Kubernetes directly unless they're actually developing a platform product. Whether you use a "distro" like RKE2 or k3s or a managed service from a cloud provider, building out your own cluster using the reference kubernetes is the modern day equivalent of deploying a LAMP stack but doing it on top of Linux From Scratch.


> Despite its reputation, Kubernetes is actually quite easy to master for simple use cases. And affordable enough for more complex ones.

Are you referring to actually spinning up and operating your own clusters here or utilizing managed k8s (e.g. GKE/EKS)?

In my understanding, sure - using the managed services and deploying simple use cases on it might not be that big of a deal, but running and maintaining your own k8s cluster is likely far more of a challenge [1] than Nomad as I understand it.

[1] https://github.com/kelseyhightower/kubernetes-the-hard-way


Kubernetes The Hard Way is an educational resource, meant to serve those interested in deep-diving into the platform, similar to Linux From Scratch. As the statement on the label says:

> Kubernetes The Hard Way is optimized for learning, which means taking the long route to ensure you understand each task required to bootstrap a Kubernetes cluster.

> The results of this tutorial should not be viewed as production ready, and may receive limited support from the community, but don't let that stop you from learning!

If you want to spin up a cluster that you actually want to use, you'll pick one of the many available free or paid distros, and spinning up something like k3s with Rancher, microk8s or even the pretty vanilla option of kubeadm is pretty simple.
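For instance, the k3s quick start is basically the one-liner from their docs (try it on a throwaway box first):

    curl -sfL https://get.k3s.io | sh -
    sudo k3s kubectl get nodes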


Agreed. Compared with k3s and friends, Nomad is more complicated and requires other components (Consul, some kind of ingress) to match what you get out of the box.


Running a production ready, high availability Kubernetes cluster with proper authn, authz, and resource controls is about the farthest thing from "simple" that I can imagine.


Profoundly disagree. In fact, your statements are not true, since before you can run a pod you will most probably need to create a deployment. And then you'll need a service to expose your workload.

Anything but the most trivial workloads will also lead you into questions of how to mount volumes, configure and use ConfigMaps and Secrets, etc.

And that's not even touching the cluster configuration, which you can skip over if you are using a cloud provider that can provision it for you.
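Even a minimal 'hello world' ends up as a Deployment plus a Service, something roughly like this (sketch; names and image are placeholders):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hello
    spec:
      replicas: 2
      selector:
        matchLabels: { app: hello }
      template:
        metadata:
          labels: { app: hello }
        spec:
          containers:
            - name: hello
              image: nginx:1.21        # placeholder image
              ports:
                - containerPort: 80
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: hello
    spec:
      selector: { app: hello }
      ports:
        - port: 80
          targetPort: 80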


It wouldn't have that reputation if it were easy. There are too many things that can go wrong for it to be easy.


Definitely the contrary in my experience


They're both complex. But one of them has 10 times the components of the other, and requires you to use them. One of them is very difficult to install - so much so that there are a dozen different projects intended just to get it running - while the other is a single binary. And while one of them is built around containers (and all of the complexity that comes with interacting with them / between them), the other one doesn't have to use containers at all.


> But one of them has 10 times the components than the other

I've said this before. Kubernetes gives you a lot more too. For example in Nomad you don't have secrets management, so you need to set up Vault. Both Nomad and Vault need Consul for Enterprise set ups, of which Vault needs 2 Consul clusters for Enterprise setups. So now you have 3 separate Consul clusters, a Vault cluster, and a Nomad cluster. So what did you gain really?


Kubernetes' secrets management is nominal at best. It's basically just another data type that has K8S' standard ACL management around it. With K8S, the cluster admin has access to everything, including secrets objects. It's not encrypted at rest by default, and putting all the eggs in one basket (namely, etcd) means they're mixed in with all other control plane data. Most security practitioners believe secrets should be stored in a separate system, encrypted at rest, with strong auditing, authorization, and authentication mechanisms.
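For anyone who hasn't looked, a Secret is literally just base64-encoded data in an ordinary object, roughly:

    apiVersion: v1
    kind: Secret
    metadata:
      name: db-credentials         # placeholder name
    type: Opaque
    data:
      password: aHVudGVyMg==       # base64 of "hunter2" - encoding, not encryption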


It's "good enough" for most and extension points allow for filling the gaps.

This also dodges the crux of GP's argument -- instead of running 1 cluster with 10 components, you now need a half dozen clusters with 1 component each, but oops they all need to talk to each other with all the same fun TLS/authn/authz setup as k8s components.


I'm a little confused. Why does the problem with K8S secrets necessitate having multiple clusters? One could take advantage of a more secure secrets system instead, such as Hashicorp Vault or AWS Secrets Manager.


The point is that once you're talking about comparable setups, you need all of Vault/Nomad/Consul and the complexity of the setup is much more than just "one binary" as hashi likes to put it.

> So now you have 3 separate Consul clusters, a Vault cluster, and a Nomad cluster. So what did you gain really?

GP's point was already talking about running Vault clusters, not sure you realized we aren't only talking about nomad.


The only thing I was trying to say is that although K8S offers secrets "for free," it's not best practice to consider the control plane to be a secure secrets store.


That's false. Vault has integrated storage and no longer needs Consul.

If you want to have the Enterprise versions (which aren't required), you just need 1 each of Nomad, Consul, Vault. Considering many people use Vault with Kubernetes anyway (due to the joke that is Kubernetes "secrets"), and Consul provides some very nice features and is quite popular itself, that's okay IMHO. Unix philosophy and all.


This is just false. I've run Vault in an Enterprise and unless something has changed in the last 12 months, Hashicorp's recommendation for Vault has been 1 Consul cluster for Vault's data store, and 1 for its (and other applications') service discovery.

Sure, Kubernetes's secrets are a joke by default, but they're easily substituted by something that one actually considers a secret store.


https://www.vaultproject.io/docs/configuration/storage/raft

It's new, but I think it's quickly becoming the preferred option. I found that trying to set up nomad/consul/vault as described in the hashi docs creates some circular dependencies tbh (e.g. the steps to set up Nomad reference a Consul setup, the steps for Vault mention Nomad integration, but there's no clear path outside the dev-server examples of getting there without reading ALL the docs/references). There's little good documentation on bootstrapping everything in one shot from scratch the way most Kubernetes bootstrapping tools do.

Setting up an HA Vault/Consul/Nomad setup from scratch isn't crazy, but I'd say it's comparable level to bootstrapping k8s in many ways.
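For reference, the integrated storage side of a Vault config is just a stanza along these lines (minimal sketch; paths and addresses are placeholders, not a hardened config):

    storage "raft" {
      path    = "/opt/vault/data"
      node_id = "vault-1"
    }

    listener "tcp" {
      address       = "0.0.0.0:8200"
      tls_cert_file = "/etc/vault/tls/vault.crt"
      tls_key_file  = "/etc/vault/tls/vault.key"
    }

    api_addr     = "https://vault-1.example.internal:8200"
    cluster_addr = "https://vault-1.example.internal:8201"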


Cool, so that's certainly new. But even then, you're dealing with the Raft protocol. The difference is it's built into Nomad, compared to Kubernetes where it's a separate service. I just don't see Nomad and co. being that much easier to run, if at all.

I think Nomad's biggest selling point is that it can run more than just containers. I'm still not convinced that it's much better. At best it's equal.


> you're dealing with the Raft protocol. The difference is it's built into Nomad, compared to Kubernetes where it's a separate service

I don't really follow this. etcd uses raft for consensus, yes, and it's built in. Kubernetes components don't use raft across independent services. Etcd is the only component that requires consensus through raft. In hashi stack, vault and nomad (at least) both require consensus through raft. So the effect is much bigger in that sense.

> I think Nomad's biggest selling point is that it can run more than just containers. I'm still not convinced that it's much better. At best it's equal.

Totally agree. The driver model was very forward looking compared to k8s. CRDs help, but it's putting a square peg in a round hole when you want to swap out Pods/containers.


It's not that circular - you start with Consul, add Vault and then Nomad, clustering them through Consul and configuring Nomad to use Vault and Consul for secrets and KV/SD respectively. And of course it can be done incrementally ( you can deploy Nomad without pointing it to Consul or Vault, and just adding that configuration later).


I don't mean a literal circular dependency. I mean the documentation doesn't clearly articulate how to get to having all 3x in a production ready configuration without bouncing around and piecing it together yourself.

For example, you mention starting with consul. But here's a doc on using Vault to bootstrap the Consul CA and server certificates: https://learn.hashicorp.com/tutorials/consul/vault-pki-consu...

So I need vault first. Which, oops, the recommended storage until recently for that was Consul. So you need to decide how you're going to bootstrap.

Vault's integrated Raft storage makes this a lot nicer, because you can start there and bootstrap Consul and Nomad after, and rely on Vault for production secret management, if you desire.


> This is just false.

No it isn’t.

> I've run Vault in an Enterprise

At this point I am starting to doubt that claim.


It has been longer than 12 months that Vault has had integrated storage.


Kubernetes native secrets management is not very good, so you're going to end up using Vault either way.


Also, Kubernetes can be just a single binary if you use k0s or k3s. And if you don't want to run it yourself you can use a managed k8s from AWS, Google, Digital Ocean, Oracle...


> Both Nomad and Vault need Consul for Enterprise set ups, of which Vault needs 2 Consul clusters for Enterprise setups. So now you have 3 separate Consul clusters, a Vault cluster, and a Nomad cluster.

This is incorrect. You don’t need consul for enterprise. Vault doesn’t need two consul clusters (it doesn’t need consul at all, if you don’t want it)


That surprises me. Does Google have a more complete secrets-management system for its in-house services?


IIUC, despite K8s having been started at Google by Go enthusiasts who had good knowledge of Borg, the goal has never been to write a Borg clone, even less a replacement for Borg.

And after so many years of independent development, I see no reason to believe that K8s resembles Borg any more than superficially.

This seems to be very much assumed by kubernetes authors. Current borg users please correct me if I'm wrong.


Thanks.


You gained the suffering of dealing with split-brains in Consul and Vault ;-)


Kubernetes has been a single binary with hyperkube for over 5 years. This argument is really tiring.


Which is which?


I believe that the one that requires containers is Kubernetes. Nomad doesn't require containers, it has a number of execution backends, some of which are container engines, some of which aren't.

Nomad is the single binary one, however this is a little disingenuous as Nomad alone has far fewer features than Kubernetes. You would need to install Nomad+Consul+Vault to match the featureset of Kubernetes, at which point there is less of a difference. Notwithstanding that, Kubernetes is very much harder to install on bare metal than Nomad, and realistically almost everyone without a dedicated operations team using Kubernetes does so via a managed Kubernetes service from a cloud provider.


From parent's comment:

k8s = 10x the components & difficult to install.

Nomad = single binary, works with but doesn't require containers.


k0s is a single binary.


I've been running a production-grade Nomad cluster on Hetzner for the past 1 1/2 years and it's fantastic. It was amazingly easy to set up compared to Kubernetes (which I also did), the UI is awesome, updates haven't broken anything yet (as long as you follow the changelog) and it's stable. I really like the separation of concerns the HashiStack offers. You can start out just using Consul for your service meshing, and then add Nomad + Vault to get a similar experience to Kubernetes.

Yes, it doesn't cover as many features as Kubernetes, but it should be good enough for most software and you can still make the switch later. I would never go back.

You can read more on our blog if you're interested: https://pirsch.io/blog/techstack/


Hi, I see you mention the tiniest nodes in Hetzner there, whereas the Nomad documentation [0] talks about 3-5 server nodes in the 2-digit GiB memory range, which is what has kept me from trying Nomad as I find it insane. How much truth is there in the docs?

[0] https://www.nomadproject.io/docs/install/production/requirem...


This very much depends on your workload, number of jobs and complexity of scheduling. Our Nomad servers have 4GB of memory in the VM and are using about 0.5 - 1G at a low three-digit number of jobs.

Hashicorp is doing a smart but normal thing for on-prem sizing there - they are recommending specs which ensure you have no problems for a lot of workload sizes. And "workload" can grow very large there, since a single DC can handle up to 5k clients and such.


I think the minimum number of nodes is high because they are recommending the minimum requirements for a fault tolerant setup. It is entirely possible to install either k8s or nomad on a single node, but clearly that is not a fault tolerant situation.

IIRC both k8s and nomad rely on the Raft algorithm for consensus, and so both of them inherit the requirement of a minimum of 3 nodes from that algorithm.

If you want something to run some containers in the cloud, and you aren't concerned by the occasional failure, then there is no issue running a single Nomad (or k8s) on a single machine and later adding more if and when you require it.


Yes, I did not even consider that angle.

And yes, nomad recommends 3 - 9 nodes in prod, maybe 11 or 13. The 3 are necessary to be able to tolerate one broken/maintained node and maintain the ability to schedule workloads.

You can increase the number of tolerated failures by increasing the number of nodes - 5 tolerate 2, 7 tolerate 3, 9 tolerate 4, 11 tolerate 5, 13 tolerate 6 - but Raft becomes slower with more nodes, because Raft is a synchronous protocol.

However, it is entirely possible to run a single node nomad system, if the downtime in scheduling is fine.


PSA: In practice, choose 3 or 5 nodes. Going higher exposes the system to increased communications overhead. See also: all of the literature on this.


I am currently running over 13000 nodes with a 5 server cluster. These are large boxes (90gb memory, 64 cores) but cpu usage is under 20% and memory usage is under 30%.


I'm sorry, what is the exact benefit of running multiple nodes on one piece of hardware? Just software failure resilience?


I assume they meant a 5 server control plane supporting 13k worker nodes, not that they partitioned 5 large hosts into 13k smaller ones. It's a counterpoint to GP's "raft gets slow with more servers", I think.


This is correct.

We run a Raft cluster of 5 voting members with 13,000 physical Nomad nodes mixed across OS's with a bunch of random workloads using docker, exec, raw_exec, java, and some in-house drivers. I'll clarify that (and I can't... because I can't edit my post anymore :( ).


I don't see where you see multiple nodes. I'd read the parent post as: there are 5 identically sized, beefy VMs. Each of these 5 beefy VMs runs 1 instance of the Nomad server. And then there are 13k other VMs or physical systems, which run Nomad in client mode. These clients connect to the servers to register and get allocated allocations / task groups / practically containers or VMs or binaries to run.

We're nowhere near that size, but the architecture is probably identical.


Sorry for the delayed response.

We started out with a three-node cluster using the smallest (1 vCPU/2GB RAM) VM you can get. Our initial requirement was to be able to do zero-downtime deployments and have a nice web UI to perform them (alongside the other advantages you get from a distributed system). We have now rescaled them to the next tier (2 vCPU/4GB RAM).

The hardware requirements depend on your workload. We process about 200 requests/sec right now and 600-700 per second on Saturdays (due to a larger client) and the nodes handle this load perfectly fine. All our services are written in Go and there is a single central Redis instance caching sessions.

Our database server is not part of the cluster and has 32 physical cores (64 threads) and 256GB RAM.

I say start out small and scale as you go and try to estimate the RAM usage of your workloads. The HashiStack itself basically runs on a calculator.


For ~20 jobs on my local network, for the longest time I was running a single VM with 2 CPUs and 1GB RAM. Opening htop was taking more resources than rescheduling a job. For local development just use `nomad agent -dev`.


More anecdata:

It, of course, depends entirely on your use case and workload.

I have a few dozen jobs running with what I think is a pretty decent mix/coverage. I've got a bunch of services, some system jobs, half dozen scheduled jobs, etc, etc.

I'm running my cluster with 3x "master" nodes which act as the control plane for nomad and consul as well as run vault and fabio (for ingress/load balancing). I have three worker nodes running against this that are running consul and nomad agents as well as my workloads.

The control plane nodes are currently sitting on AWS's t3.nano instances. `free` shows 462MB total RAM with 100MB available. Load average and AWS monitoring show CPU usage within a rounding error of 0%.

If this were in a professional capacity I probably wouldn't run it this close to the wire, but for personal use this is fine--it will be quite easy to upgrade the control plane when or if the time comes.


Their (stated) minimum requirements are what stopped me from using it for projects too. My projects were pretty simple, so it was total overkill, but for any bigger projects, it was too risky to try something new. So I ended up never trying Nomad out, even though it always looked really nice.


I'm really curious about the local development story for Nomad.

I'm using Docker Swarm in production, which lets me easily spin up the same services locally as I do for production - it's really nice to have one set of YAML files that can be used across all environments, and of course it gives more confidence before shipping to prod.

How do you handle local development? Do you have Docker Compose files specifically for that, or something else?


Sorry, didn't see your reply because... well HN has no notification thing I think? Anyways, we don't run Nomad locally. All of our services can be started locally without additional tools other than Go.


I've been getting into Ansible and Terraform lately. It seems like Nomad would fit better in an environment where you weren't going to be putting everything in K8s, so a mixture where you have Consul and Vault handling standard VMs alongside your Nomad cluster would make a lot of sense.


Great read, thank you.


I'm glad you like it :)


If someone's interested, I wrote a deeper dive into the matter a few months back, with concrete examples:

https://atodorov.me/2021/02/27/why-you-should-take-a-look-at...


Thank you for that post. I found it a couple of months ago and it helped a lot in making my decision to go with Nomad over Kubernetes!


nicely covered.


> Flexible Workload Support

This is Nomad's most underrated feature, IMHO. You don't have to use containers for everything if you don't want to. For example, if you're a golang shop you can run everything as native binaries and cut out docker completely.
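A bare-bones job for a native binary looks something like this (sketch; the binary path, ports and sizes are made up):

    job "api" {
      datacenters = ["dc1"]

      group "api" {
        count = 2

        network {
          port "http" {
            static = 8080
          }
        }

        task "api" {
          driver = "exec"    # or "raw_exec" if you don't want isolation

          config {
            command = "/usr/local/bin/my-api"   # plain Go binary, no image
          }

          resources {
            cpu    = 200
            memory = 128
          }
        }
      }
    }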

Nomad has much simpler networking, i.e. no web of iptables rules to figure out. You can add Consul connect as a service mesh if you need it, but if you don't, you can keep things very simple. Simple = easy to understand, run, and debug.

The main downside for me is a lack of plug and play pieces, e.g. a helm chart for Grafana or Prometheus. You'll have to write your own job for those, though it's very easy to learn. I'd love to see a series of Nomad recipes that people could use.

I think it's the ideal choice for on-prem, bare-metal, or 'weird' deployments where you need a bit more control. You can build the exact stack you need with the different HashiCorp projects with minimal overhead.

I can't recommend it enough! I help people move to Nomad, my email is in my profile if you want to chat :)


> For example, if you're a golang shop you can run everything as native binaries and cut out docker completely.

Ironically you can also just deploy a go executable into an empty Docker container and it's basically the same as the raw executable, but with all the config abstracted to be the same as other containers'.
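e.g. something like this (sketch, assuming a statically linked binary):

    FROM scratch
    COPY my-api /my-api          # statically linked Go binary
    ENTRYPOINT ["/my-api"]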


Good point! Although Docker does add a networking layer on top. I'd prefer to run something like HAProxy without Docker if possible.


You can run it on the host network as well.


Adding the networking layer is a good thing since it's centrally managed


Flexible workload support doesn't sound great to me.

Good tools have very specific use-cases so as soon as you start talking about IIS on windows, VMs, containers etc. it just sounds like lots of the dependencies you are trying to remove in a scalable system.

Containers are relatively restrictive but they also enforce good discipline and isolation of problems which is a good idea imho. I would not want to continue to support our Windows servers running IIS, just too much to sort out.


I understand what you're saying. Fewer runtimes = fewer problems.

It's not going to force your hand. You can disable all the task drivers except the docker driver if you want a container-only cluster. The drivers themselves are lightweight.

In an ideal world, every company is two years old and only produces software in perfect docker containers, but in reality there's always some service that doesn't work in a container but could benefit from job scheduling.

I think it's great that we can add scheduling to different runtimes. Some folks want or need to use those different runtimes, and I like that Nomad lets you do that.


The question often isn't what we as individuals want, though. It's what's good for the business given operational, budgetary, and personnel constraints.

Many people still have mission-critical Windows applications. Windows has nominal container support, but the footprint of a basic Windows container image is extremely high and the limitations of how you can use them are pretty onerous.


> The main downside for me is a lack of plug and play pieces,

Hashicorp just announced Nomad Packs filling that precise niche. Still in beta, and it was a long time coming but IMHO it was the main thing missing and is honestly awesome.


Here's the repo link in case you need it for Nomad Pack. https://github.com/hashicorp/nomad-pack


I looked into Nomad a while back, and finding good examples was a problem - Packs looks fantastic!


This has been my biggest pain point - world+dog shares their helm charts, but blog posts or git*.com repos with non-trivial cluster recipes are rare.

For example, we're exploring cortexmetrics currently, and spinning the full-stack version up on k8s (OpenShift) was straightforward. Porting all that spaghetti onto Nomad would be a huge job, though part of the frustration is knowing someone, tucked away on a private network, has already done this.


> the frustration is knowing someone, tucked away on a private network, has already done this.

Hard agree. I know this person and have been this person before.

I've toyed with the idea of writing a book of Nomad recipes and tips, I wonder if anyone would read it?

Also, watch this space, helm for Nomad may be coming soon: https://github.com/hashicorp/nomad-pack


A book - probably not so much.

The big value of sites (mostly GitHub master repos) that offer recipes for saltstack/chef/ansible/puppet, or the Helm collective, etc., is that they're continually being tweaked as new versions of (upstream) software are released.

They usually all require a fair bit of localisation before they 'just work', at least in my experience, but the template taken from a proven functioning system, and then abstracted & shared, is worth its weight in gold.

I've set up a few nomad jobs, but nothing anywhere as complex as, say, this cortexmetrics monstrosity. Even our k8s & nomad guru baulks at such an undertaking.


I liked Mesos when I worked on it, and it's been replaced by more modern tools like Nomad, but every time I have to work on k8s it's... well, it's being promoted like a cult, it has a cult following, and I think the whole thing is set up to suck up complexity and layering.

How do you deploy a thing to run on k8s?

One would think you deploy a manifest to it and that's it. Like yaml or json or hcl or whatever.

No. The built in thing is not good so someone wrote Helm.

So you deploy helm manifests on there?

No. Helm is not good either, you need helmsman or helmfile to further abstract things.

How do you do networking?

Layer a service mesh with custom overlay networking that abstract upon k8s clusterip.

jeesh. why?


> How do you deploy a thing to run on k8s?

kubectl apply -f ~/git/infra/secretproject/prod.{json,yaml}

One JSON/YAML file too unwieldy? Generate it using jsonnet/CUE/dhall/your favourite programming language. Or just talk directly to the Kubernetes API. You don't have to use Helm - in fact, you probably shouldn't be using Helm (as the whole idea of text templating YAML is... thoroughly ignorant in understanding what Kubernetes actually is).

> Layer a service mesh with custom overlay networking that abstract upon k8s clusterip.

You don't have to, and you probably don't need to. I'm happily running bare-metal k8s in production without anything more than Calico, MetalLB and nginx-ingress-controller. That effectively gives me 'standard Kubernetes networking', i.e. working Services and Ingresses, which is plenty. And if you're using a cluster deployed by a cloud provider, you don't have to worry about any of this; all the hard decisions are made and the components are deployed and working.
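For what it's worth, an Ingress in that setup is just the stock resource, nothing exotic; roughly (hostnames and names are placeholders):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: hello
    spec:
      ingressClassName: nginx
      rules:
        - host: hello.example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: hello
                    port:
                      number: 80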


> kubectl apply -f ~/git/infra/secretproject/prod.{json,yaml}

This is fine for the first deploy, but if you delete a resource from your manifests then kubectl doesn’t try to delete it from the cluster, so in practice you need something like Terraform (or perhaps ArgoCD) which actually tracks state. And of course as you mention, you probably want to generate these manifests to DRY up your YAML so you can actually maintain the thing.

I would love to hear from others how they solve these problems.

> as the whole idea of text templating YAML is... thoroughly ignorant

1000%


I've been using the following command to great effect:

kubectl apply --kustomize manifests/ --prune --selector app=myapp

It cleans up old stuff from the cluster and also allows you to split your manifests across multiple files.


Whoa. I never knew about --prune. <mind-blown emoji>


Yeah... I'm not really sure where this idea of not being able to use the tools available out of the box to deploy apps and do networking comes from. I deploy YAML. If I feel like my YAML is too big, I DRY it using kustomize. I can use --prune if I'm worried about stuff sticking around in the cluster. For networking, I... don't do anything? We get DNS built in. Just use the service name. What else is there to do?
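e.g. a base plus a tiny overlay, roughly (names are placeholders):

    # base/kustomization.yaml
    resources:
      - deployment.yaml
      - service.yaml

    # overlays/prod/kustomization.yaml
    resources:
      - ../../base
    namePrefix: prod-
    replicas:
      - name: hello
        count: 5

and then kubectl apply -k overlays/prod.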


External DNS, certificate management, and a whole bunch of other stuff if you're not using a cloud provider's managed Kubernetes (e.g., network attached storage, load balancers, ingress controller, etc).


Label selectors are hard. They might require knowledge as advanced as high-school geometry to understand. The ability to draw a Venn diagram isn't free, you know!


Before there was Helm, I wrote my own tool and framework in Ruby (called Matsuri) to generate and manage manifests. I still use it. However, the source of truth is still on the kube api server. I did write in functionality to diff what is in the git repo and what is in the source of truth. There is an additional abstraction that bundles together resources so that a single command can converge them all together. The underlying mechanism is still kubectl, and the code merely generates the manifests to send it to kubectl via stdin.

I did not think about a “delete”, and I usually don’t want it to be automatically deleted. But that’s a great idea for that convergence mechanism if I have something that explicitly adds a deleted resource line to make sure it stays deleted until otherwise specified.

The Ruby code can access anything Ruby to generate the manifests, so I have a Turing-complete language and I can use class inheritance, mixins, method overrides. I was also able to write a suite of helpers to take a raw yaml manifest from an upstream project, and use a functional programming style to transform the manifest to something else. If I ever have time, I’d write the ability to import templates from Helm.

This was written for a one-person devops role, intended to be able to manage almost-similar things across different environments (dev, staging, prod), which works great for small, early stage startups, for our level of complexity. At this level, I don’t need it to track state.

When our team size grows, I'll have to think up some other patterns to make it easier for multiple people to work on this.

The other thing is that for much more complex things, I would just write operator code using Elixir, and bootstrap that into k8s with the Ruby framework.


> Before there was Helm, I wrote my own tool and framework in Ruby (called Matsuri) to generate and manage manifests.

I've built these kinds of things too in Python (or Starlark). It's an improvement over YAML, but I often feel the pain of dynamic typing.

> The Ruby code can access anything Ruby to generate the manifests, so I have a Turing-complete language and I can use class inheritance, mixins, method overrides.

I actually don't want any of these things in a dynamic configuration language. I kind of just want something that can evaluate expressions (including functions, objects, arrays, scalars, and pattern matching). If it's Turing complete that's fine but unnecessary. I explicitly don't want I/O (not in your list) or inheritance, mixins, method overrides, etc--unnecessary complexity.

Dhall probably comes the closest, but it has a tragic syntax and I'm not a good enough salesman to persuade real software teams to learn it. I've also heard good things about Cue, but my experiences playing with it have only been painful (steep learning curve, don't really see the upsides).


I don't really see this tooling as a dynamic configuration language so much as a manifest generator. That means I also have tooling for diffing, debugging, and converging manifests. If you are just focused on configuration, I can see why Turing-completeness will seem like unnessary complexity.

I use classes and mixins to be able to generate similar manifests with slight differences across different clusters or environments. I sometimes use imports (I/O) for manifests provided by an upstream (such as from AWS docs), and do transforms to get the manifest I want.

I modelled the design off of Chef, which will also declaratively define and converge a set of systems towards a desired state. Well-known paths and conventions help keep things organized.

I haven't had a problem with dynamic typing, but I have also been using Ruby in application development for over 10 years. You might see it as unneeded complexity, but I have been able to use the flexibility for years now. This tooling was designed for a team that uses Ruby as the primary language, and uses designs and idioms that would be familiar to a Ruby dev team. It is definitely opinionated and I don't expect it to be universally useful for everyone.


You can use the --prune flag with apply to delete resources removed from your configs.


Which is still in alpha


> kubectl apply -f ~/git/infra/secretproject/prod.{json,yaml}

> One JSON/YAML file too unwieldy? Generate it using jsonnet/CUE/dhall/your favourite programming language. Or just talk directly to the Kubernetes API. You don't have to use Helm - in fact, you probably shouldn't be using Helm (as the whole idea of text templating YAML is... thoroughly ignorant in understanding what Kubernetes actually is).

Agreed 99%, but there is one useful thing that Helm provides over Kubectl+structured templating (Nix/dhall/whatever): pruning objects that are no longer defined in your manifest.

However, that's solvable without taking on all of Helm's complexity. For example, Thruster[0] (disclaimer: an old prototype of mine) provides a pruning variant of kubectl apply.

[0]: https://gitlab.com/teozkr/thruster


I really struggle with seeing the value add of helm. I use it, but mostly with a "less is more" approach. It is semi handy in a dev/staging/prod sort of environment, but really not all that much.

What I don't get is some people in my company thought it was a good idea to make a "docker helm chart"... That is, a helm chart capable of deploying arbitrary containers, ingresses, etc... Like, a true WTF, since the values file ends up looking awfully similar to a k8s manifest :D.


> the whole idea of text templating YAML is... thoroughly ignorant in understanding what Kubernetes actually is

Could you expand on that? It sounds like an interesting position (bordering on the philosophical), but I don't know enough about Kubernetes to gauge its accuracy for myself.


There's two things:

1) Text templating YAML is just bad. It's the serialized format of some structured data - instead of templating its text representation (and dealing with quoting, nindent, stringly-typed variables, and general YAML badness), just manipulate the structures directly before serializing them. For example, write some Python code that composes these structures in memory and then just serializes them to YAML before passing it over to kubectl. You can also use ready-made tooling that uses domain-specific languages well suited to manipulating and emitting such structured configuration, like jsonnet/kubecfg (or CUE/Dhall/Nix combined with kubectl apply -f). Eschewing text templating is not only more productive by letting you build abstractions (eg. to deal with Kubernetes' verbosity with labels/selectors, etc.), it also allows you to actually compose more complex deployments together, like 'this wordpress library uses this mariadb library' or 'this mariadb library can take a list of objects that will be upserted as users on startup'.
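As a minimal sketch of that first point (assumes PyYAML and kubectl on the PATH; the names and image are made up): build plain data structures, serialize them once, and pipe the result to kubectl - no string templating anywhere:

    import subprocess
    import yaml  # PyYAML

    def deployment(name, image, replicas=1):
        # compose the manifest as plain Python data, not text
        labels = {"app": name}
        return {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {"name": name, "labels": labels},
            "spec": {
                "replicas": replicas,
                "selector": {"matchLabels": labels},
                "template": {
                    "metadata": {"labels": labels},
                    "spec": {"containers": [{"name": name, "image": image}]},
                },
            },
        }

    manifests = yaml.safe_dump_all([
        deployment("web", "nginx:1.21", replicas=3),
        deployment("worker", "example/worker:latest"),  # hypothetical image
    ])
    subprocess.run(["kubectl", "apply", "-f", "-"], input=manifests.encode(), check=True)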

2) Those YAML/JSON manifests aren't even that native to Kubernetes. A lot of things go behind the scenes to actually upsert a manifest, as Kubernetes' resource/object model isn't nearly as straightforward as 'apply this YAML document full of stuff' would indicate (there's state to mix in, API/schema changes, optional/default fields, changes/annotations by other systems...). With k8s' Server-Side-Apply this can now be fairly transparent and you can pretend this is the case, but earlier tooling definitely had to be smarter in order to apply changes. Things like doing structure-level diffs between the previously applied intent and the current intent and the current state in order to build a set of mutations to apply. What this means is that Helm's entire 'serialized as YAML, manipulated at text level' stage is not only harmful and a pain in the neck to work with (see 1.) but also unnecessary (as 'flat YAML file' isn't any kind of canonical, ready-to-use representation that Helm had to use).


That is a very helpful summary. I generally favor declarative configuration (like CloudFormation) and I think it's mostly due to my work being infrastructure focused: "I need VPC, LB's, ASG with those immutable Images, a queue, buckets etc". But in my recent work with a customer where the infrastructure is "EKS with datastores, and Gitlab CI"... most of the complexity is people creating abstractions on top of abstractions of stuff in helm (and also .gitlab-ci.yaml with tons of severely nested includes). And in this case the text templated yaml is really painful. Something that would be like CDK for k8s could actually be amazingly useful. Lots to ponder, thank you.


> Something that would be like CDK for k8s could actually be amazingly useful.

Such a thing does indeed exist: https://cdk8s.io. And here's the relevant announcement blog post: https://aws.amazon.com/blogs/containers/introducing-cdk-for-....


The tool that you want is probably Jsonnet. Take a look at Tanka or one of the other Jsonnet options. There is also Dhall and a few others but Jsonnet has a lot more traction.


> Could you expand on that? It sounds like an interesting position (bordering on the philosophical)

I'll make the philosophical case: text-level templating of a computer-readable format is almost always the wrong approach. It becomes unnecessarily hard for someone who later needs to read the code to ensure that it generates valid (both syntax and semantics) markup. It's also harder for tooling to verify the validity, because the syntactic validity may depend on inputs to the program.

Compare the approaches of PHP and JSX. They both accomplish a similar goal (interleaved HTML and control flow), but JSX works at the element tree level and makes it impossible to generate poorly-formed HTML syntax -- it becomes a compile-time error (though it doesn't ensure semantics, e.g. a tbody outside of a table is legal). Compare with PHP, which very much allows you to generate invalid HTML, because it's just a text templating language.

(From what I can tell, Helm works more like PHP; if I'm wrong my philosophical position stands but might not apply here)


k8s is built for extension with validation of the extensions you add ( https://kubernetes.io/docs/concepts/extend-kubernetes/api-ex... )
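For instance, a CRD can carry an OpenAPI schema so the API server itself rejects invalid objects - something text templates can't give you. A rough sketch (the group and field names are made up):

    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: widgets.example.com
    spec:
      group: example.com
      scope: Namespaced
      names:
        kind: Widget
        plural: widgets
        singular: widget
      versions:
        - name: v1
          served: true
          storage: true
          schema:
            openAPIV3Schema:
              type: object
              properties:
                spec:
                  type: object
                  required: ["replicas"]
                  properties:
                    replicas:
                      type: integer
                      minimum: 1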

Helm is just sort of dumb text manipulation with a TINY bit of deployment management built on top of it. There isn't really a whole lot that helm buys over extending k8s.


I’m a big fan of kapp[1] for this sort of thing, it’s basically the same as kubectl apply, but with an app name and state tracking.

    kapp deploy -a my-app -f ./examples/simple-app-example/config-1.yml
It was created by folks at my employer but I use it because I just like its simplicity.

[1] https://carvel.dev/kapp/


> in fact, you probably shouldn't be using Helm

Funny that the thing you shouldn't use is what the entire Kubernetes ecosystem uses for deployment. It's almost like there's no good way to do it.


Except it's... Not what the entire ecosystem uses. Yes, it's popular because it made for "app store"/"package repository" style operation, but I have yet to actually use it for longer than a few weeks despite running k8s projects across multiple companies since 2016.


You don't need any of that stuff. When I started with k8s, all I used were Yaml manifest files and kubectl apply.

As I started using it more, I eventually moved up to using helm. I've been running production k8s for a few years now and haven't used helmsman or anything but helm yet.


> When I started with k8s

But that's the problem right there. Shop after shop has invested in kubernetes with a team that was just learning it. And then they layered tool after tool to help massage pain points -- of which kubernetes has plenty. That leaves us at today where every shop effectively has a custom stack of kubernetes infrastructure cobbled together of various tools and philosophies.

It's the same problem as JavaScript. There's nothing terrible about the technology -- on the contrary it's amazing stuff! But the context of how it gets used leads to a type of gridlock of opinionated tooling published just last week.


At its core, Kubernetes is a controller manager. It's true that most Kubernetes systems include a lot of the Hashicorp equivalents but you could in theory remove the controllers and associated CRDs. Kubernetes has gradually moved to what I think is a relatively sensible set of defaults which you can get in different flavors (k0s, k3s, microkube, kubespray, etc.)

The comment about development and packaging tools can indeed present a problem. I tend to favor Helm for applications that need to be deployed more than once and Kustomize to differentiate between environments but I've definitely seen what I would consider horror stories. Even if you add a bit of GitOps, it's not too hard to understand. The horror stories seem to occur when teams try to map their previous workflows directly to k8s. Many times the worst k8s development workflows are created to enable teams that had broken workflows without k8s.


> How do you deploy a thing to run on k8s?

    kubectl apply -f deploy.yaml
should work, no? What forces you to use the sugar-coating?


Maybe because no one uses that in reality and uses helm instead?


That's a very odd and factually wrong generalisation.

I don't use helm at all, and I manage a large scale platform built on Kubernetes. Everything is either declared directly in YAMLs and deployed with `kubectl apply -f <YAMLs or directory containing them>`, or rendered using Kustomize and, again, deployed using `kubectl apply -f -`.

Kustomize can be rough around the edges, but it's predictable and I can easily render the YAMLs locally for troubleshooting and investigation.
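For anyone unfamiliar, that workflow is roughly this (the overlay path is made up); the same rendered output can be inspected locally or applied directly:

    # render locally to see exactly what would be applied
    kubectl kustomize overlays/prod

    # apply via the rendered stream, or directly with -k
    kubectl kustomize overlays/prod | kubectl apply -f -
    kubectl apply -k overlays/prod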


Helm is useful if you need your software to run in many different places, and it is widely known. This is why you see so many projects offering Helm charts; the projects you see are exactly the ones that need to run in many environments.

There is no reason to use it for your own software if you just have a single cluster.


Helm's a pile of garbage, but this isn't really the fault of Helm. This is an issue with the chart or a failure to read the documentation of the chart.

People have got to stop just blindly running stuff off the internet.


At a previous employer, they were building k8s clusters not for developers but for their infrastructure teams. In the past, where a vendor might have supplied an OVF file as the distributable product, they now provide Helm charts.


The first time I used helm was to set up JenkinsCI on a k8s cluster on AWS, and in the default configuration, it set up a public-internet ELB listener (with type=LoadBalancer) for Jenkins' internal JNLP port. Which pretty much means the public internet has root access to your jenkins cluster, by default.

I had crypto miners using my k8s cluster within a couple of hours.

That was also the last time I used helm.


You mean the helm-package you installed without reading the documentation? Hate the player, not the game.


No. No, no no.

This was legitimately a bug that they immediately fixed when I reported it... There is no legitimate reason to expose the master JNLP port to the internet, ever. The chart did not have a configuration option for this, it just exposed the JNLP port as part of the same k8s Service that exposed the web port. (The JNLP port is for jenkins workers to phone back home to the main instance, it's not for external use.)

"Just read the docs" is not an answer to a chart setup which is just plain broken.


I'll take that "no one" badge. I've never used Helm, always used kubectl.

Well, until now, when I'm using Terraform (particularly helpful relative to kubectl when I need to provision cloud resources directly), but I've still never used Helm.


Also, there is the middle way: kustomize. It's built into kubectl


I have yet to work in any environment where helm was used heavily and not just for a few one-offs set up long ago that now bring regrets.


That’s an organisational problem, not a Kubernetes problem.


Some parts of Kubernetes are perhaps unnecessarily complex and we should keep our eyes open for alternatives and learn from different approaches, but deploying to Kubernetes really does not have to be that difficult unless you make it so.

Plain manifests in a Git repository will do, and let something like Flux apply them for consistency. It really isn't harder than SSH and ad-hoc scripts or Ansible.


The worst part of k8s is definitely its configuration system. I don’t like helm as it’s combining packaging, templating and deployment lifecycle. I feel like each of these components should’ve been its own abstraction.


It should've been, and it can be. There's nothing official or special about Helm, just don't use it.


So just don't use Helm. It's not a core, crucial, or required part of k8s, it's just a reasonably popular related project... That I usually ban from environments before the rot can take root.


what are the alternatives?


>> jeesh. why?

If you look at Google, they built a whole PaaS (Cloud Run/App Engine) on top of k8s and I guess that's the way it's meant to be used.

IaaS -> K8s -> PaaS -> deploy.sh


App Engine does not run on K8s. Cloud Run _can_ run on GKE with Cloud Run for Anthos or OSS Knative.


Nothing about K8s forces you to use Helm or istio etc.


No, the developers around k8s force you to use Helm, istio, etc. And then tech debt makes the decision unchangeable.


YMMV, I guess.


Because you have different team members managing different aspects of your software's deployment (if you have complex systems that you have moved into kubernetes). You can have a team for: authnz/rbac, networking, containerization/virtualization, service load balancing, app development, and secrets, all while no one is stepping on each other's toes deploying changes, and while using a common language to talk about resources and automation.
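As a rough illustration of how that separation is expressed (the namespace, group and verbs are made up), each concern can get its own namespace-scoped Role and RoleBinding, so an app team can touch Deployments without being able to touch, say, NetworkPolicies or Secrets:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: app-deployer
      namespace: team-a
    rules:
      - apiGroups: ["apps"]
        resources: ["deployments"]
        verbs: ["get", "list", "watch", "create", "update", "patch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: app-deployer-binding
      namespace: team-a
    subjects:
      - kind: Group
        name: app-developers
        apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: Role
      name: app-deployer
      apiGroup: rbac.authorization.k8s.io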


> How do you do networking? Layer a service mesh with custom overlay networking that abstract upon k8s clusterip.

Overlay network is easy if you don't need your pod and service IP to be routable from outside the cluster. It took me less than 1 afternoon to learn to use Kubespray to deploy an on-premise cluster with Calico for overlay network.


Zero snark, but you confirm parent somehow IMHO. Two more tools.


Well, Kubespray is just an Ansible-based installer that almost walks you through the process of setting up a K8S cluster. And Calico is the default option in Kubespray, so unless you want more customization, you don't need to learn anything about it.
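For reference, the happy path is roughly this (inventory paths and host IPs are whatever your setup uses), following the Kubespray README:

    git clone https://github.com/kubernetes-sigs/kubespray && cd kubespray
    pip install -r requirements.txt
    cp -rfp inventory/sample inventory/mycluster
    # edit inventory/mycluster/hosts.yaml with your node IPs, then:
    ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml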


At some point you need to take a few minutes to learn new technologies in a tech career. Kubernetes is just not that hard if you spend a little time understanding it. You don't need a service mesh. Calico is very simple, or you can use cilium for something more advanced


> but every time I have to work on k8s it's .. well it's being promoted like a cult, has a cult following

I would call this grassroots marketing. Some companies are very good at it. Maybe those ones have more know-how in the ad space than others?


> jeesh. why?

Poor design. It's been cobbled together slowly over the past 7 years. They built it to work only a certain way at first. But then somebody said, "oh, actually we forgot we need this other component too". But it didn't make sense with the current design. So they'd strap the new feature to the side with duct tape. Then they'd need another feature, and strap something else to the first strapped-on thing. Teams would ask for a new feature, and it would be shimmed in to the old design. All the while the design was led by an initial set of "opinionated" decisions that were inherently limiting. The end result is a mess that needs more and more abstraction. But this isn't a problem when you are already a billion-dollar company that can pay for other teams to build more abstractions and internal tools to deal with it.

This is business as usual in any quasi-monolithic Enterprise project. Typically the way you escape that scenario is to have major refactors where you can overhaul the design and turn your 50 abstractions into 5. But instead, K8s decided to have very small release and support windows. This way they can refactor things and obsolete features within a short time frame. The end result is you either stick with one version for eternity (a security/operability problem) or you will be upgrading and dealing with breaking changes every year for eternity.


So much flamebait. These kinds of comments aren’t good for anyone.


Nomad is amazing. We've been using it alongside Consul for close to 2 years at this point at Monitoro.co[0].

We started by using it as a "systemd with a REST API" on a single server, and gradually evolved into multiple clusters with dozens of nodes each.

It has been mostly a smooth ride, even if bumpy at moments, but operationally speaking Nomad is closer to Docker Swarm in simplicity and Kubernetes in terms of the feature set.

We didn't find ourselves needing K8s as we're also leveraging components from the cloud provider to complete our infrastructure.

[0]: https://monitoro.co


> Nomad is closer to Docker Swarm in simplicity and Kubernetes in terms of the feature set.

This a question I still need to google but what features does Kubernetes have that Docker Swarm needs?

Because the perceived complexity of Kubernetes just blows my mind, where Docker Swarm seems a lot simpler for the same benefits, but maybe it's just abstracted away?

I will say upfront I'm naive when it comes to container tech.


Swarm has a lot of issues. Some are on the surface, like bad networking and the scaling issues stemming from it. Others are related directly to Mirantis, the company that owns Docker Inc. now. It neglects the Swarm part of Docker, and was even planning to straight up sunset Swarm and move everyone to k8s. They do maintenance, and add a feature or two a year, which is not enough. Swarm is great for small deployments, as it only requires a docker daemon present. Otherwise you should look towards Nomad/k8s/whatever the cloud solution is.


I have never deployed or built a K8s cluster, but recently I moved about a dozen AWS ECS Fargate workloads into a K8s cluster that a colleague of mine has set up. I was surprised. I really like it (from the perspective of a user/developer). I deploy my apps by updating a simple yaml file in a git repo. All my other practices (vim, go code with standard Makefiles, docker container image registry) are unchanged.

I also think K8s is a reasonable cloud independent abstraction when you need to move your workloads to another provider. It prevents cloud lock-in. And I suppose Nomad would do that too.

So far my K8s experience has been very good. However, if I ever have to do more with it (on the administrative side) I may have a less positive experience. Not sure.


While interesting in itself, I'm not sure how this is relevant to this Nomad blogpost, unless you tried Nomad before starting to use Kubernetes and went with Kubernetes anyways.


What was wrong with ECS Fargate if you don't mind me asking? Too expensive? Too vendor locked-in?


We want to apply consistent policies and technical controls across multiple dev groups that have multiple AWS accounts. K8s seems to be a good solution for that.


Was it not possible in ECS? Or did your teams just want to use K8s, and decided they would use the opportunity of wanting to organize things better as a reason to switch? It seems to me that with SSO and IAM you could create virtually any kind of access controls around ECS. K8s doesn't solve the multi-account problem, and federation in K8s can be quite difficult.


I'm not a fan of AWS but to be fair it has all those features OOTB.

You can have AWS "organizations" managing multiple accounts. IAM will allow to set up any policies you want in your organization.

On the other hand, now you have a multitude of complexity added to your setup…

If I were being snarky I would see a case of "resume driven development". But let's just assume it was a lack of time to find the simplest solution, and that it's then often preferred to just go with the flock, as so many others can't possibly be wrong.


All right, thanks! Just wondered as I have a ECS setup and I really prefer not having to switch unless I have to.


Not OP, but I can't imagine EKS being cheaper than ECS, since ECS itself is free (you pay for Fargate either way).

ECS is pretty simple but does not have the large mindshare that Kubernetes or Nomad have.


To most people, it's not Nomad vs Kubernetes - it's a choice between Nomad vs Managed Kubernetes.

All major cloud providers offer a managed kubernetes service at minimal added cost to running the worker VMs yourself.

With managed Kubernetes, the simplicity question is no longer obviously in Nomad's favour. As other comments allude to, Kubernetes as a user is pretty easy to master once you get used to its mental model.


I'm currently trying to convince people that a managed k8s service is not that "simple", and that we can't "just spin up another cluster" without a great deal of operational overhead.

Some of the things that might still be needed in managed k8s instances: better ingress with ingress-nginx, cert-manager, monitoring/logging/alerting, tuning the alerts, integration with company SSO, security hardening.

If it's a multi-tenant cluster: LimitRanges/ResourceQuotas, NetworkPolicies, scripts to manage namespaces and roles, PodSecurityPolicies (or equivalent), onboarding/offboarding procedures.
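For a concrete idea of the per-tenant plumbing (the namespace and numbers are made up), each tenant namespace typically gets at least a quota plus default limits:

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: team-a-quota
      namespace: team-a
    spec:
      hard:
        requests.cpu: "10"
        requests.memory: 20Gi
        limits.cpu: "20"
        limits.memory: 40Gi
        pods: "50"
    ---
    apiVersion: v1
    kind: LimitRange
    metadata:
      name: team-a-defaults
      namespace: team-a
    spec:
      limits:
        - type: Container
          defaultRequest:
            cpu: 100m
            memory: 128Mi
          default:
            cpu: 500m
            memory: 512Mi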

I'm sure you'd need similar things to have a proper production Nomad cluster too, so your point still stands. But at least for EKS/GKE clusters, they're pretty bare-bones.


As someone with 7-digit spend in GKE/EKS, I will agree with you that it is _anything but simple_.

Your developers aren't going to say that it's simple when Google force upgrades their cluster to a version that deprecates APIs in their yamls for a job they worked on 2 years ago and swiftly forgot about.

Then when you explain to them that Google insists on putting everyone on a force-upgrade treadmill, you can literally watch as the panic sets in on the faces of your engineering team managers/leads.

Nomad is a breeze in comparison to managed K8s.

Everyone that I've talked to that thinks Kubernetes is simple is barely using the thing and could likely save a lot of money and development effort using something like Nomad instead.


So instead of using higher level primitives that are widely used and tested and have many simple high quality integrations (external-dns, cert-manager, et al) you would recommend reinventing all of that on Nomad and then calling that "saving a lot of money and development effort"?

Yeah no thanks.

At this point k8s has "won" for all intents and purposes. It has gained critical mass, succeeding where other infrastructure management tools both open and closed source have failed.

Also API upgrades shouldn't be a problem, even if you were using beta APIs. They are only really troublesome if you decided to indulge yourself with some alpha APIs before they were fully baked. If you don't do that then you won't run into any problems.


Thanks, Mr. Well Ackshually,

First of all, those things you're talking about are installables in K8s. They don't come by default. Many people with (managed) kubernetes installations aren't even using them and get by just fine. Having them certainly isn't free, work certainly had to be done to build those for Kubernetes, and they are likely (or will be) trivial to implement elsewhere. I certainly was able to automate my certificate management infrastructure before I had Kubernetes.

The reality is that there are many companies out there that want like 10% of the features of an orchestrator like Kubernetes and don't need all of those features that make you think that it's a zero-sum game that Kubernetes has won.

The reality is that there are competing offers like Nomad that more easily accomplish our technology goals and are thus more attractive to enterprises with big budgets like mine.

Kubernetes having the "critical-mass" that you speak of isn't to the exclusion of other competing tools having "critical-mass". Just like Oracle and DB2 aren't the dominant RDBMS of today.

As for calling people indulgent for using beta APIs, let's not forget that Deployment, StatefulSet, DaemonSet, ReplicaSet, NetworkPolicy and PodSecurityPolicy all started as beta APIs, and their beta versions were only removed in 1.16. And I imagine you wanted to use ingresses, but the old ingress.class annotation was deprecated and replaced with IngressClass in 1.18 and that was a firm cutover for _everyone_. I can't imagine Kubernetes being very useful to many people without any of these...
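For example, a manifest like the sketch below originally targeted extensions/v1beta1; once 1.16 stopped serving the beta groups it had to move to apps/v1, which among other things makes spec.selector mandatory (names and image are made up):

    # was: apiVersion: extensions/v1beta1 (no longer served as of 1.16)
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 2
      selector:            # required in apps/v1
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
            - name: web
              image: nginx:1.21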


There was no "Well Ackshually" in my reply.

I just pointed out that for everything you would need to build custom on Nomad there exist off-the-shelf components that will plug right into k8s. There is no way this would save money or development time, which was my main contention with your assertions.

You basically mischaracterized everything I said and then missed the entire point.

I don't disagree that there are some cases where Nomad would be superior. Off the top of my head if I wanted to build a modern rendering farm and I knew I wouldn't want to use the cluster for anything else then I would consider it over the HPC toolkits I have previously used for that (RIP Grid Engine) or Mesos which used to be king of that space.

The problem is these are incredibly niche cases and for 99.99% of companies they are better off swimming with the flow.

This is what I mean by "k8s has won". It's not that other things won't continue to exist and be created but most of them won't survive unless they commercialise within a niche or find a sufficiently large user that is willing to do the vast majority of the development (eg. Netflix Titan).

It's not a zero sum game but it's damn close. See here the corpses of Docker Swarm, Convox, Flynn ( :( ), original Deis, arguably ECS (it's a zombie at this point let's be honest).

The tide really turned when Azure and AWS were basically forced into offering managed k8s. In AWS's case they also had to make deep modifications to VPC and IAM to properly support it. These investments wouldn't have been made unless forced, which lends credence to the weight k8s has in the ecosystem. (Worth mentioning there is no hosted Nomad I'm aware of; their enterprise offering is install + support.)

k8s is becoming the POSIX of distributed scheduling. The API you use to run diverse workloads over large numbers of machines that is relatively portable between companies. Right now there is still some vendor nonsense going on but over time it will be smoothed out.

The vast majority of distributed applications are going to be built targeting k8s as their runtime API. We can already see this happening but it will only increase over time.

To summarize I think it's fair to say things like "Nomad can sometimes be good if conditions x/y/z are met" but I think it's very dishonest to consider it as a viable competitor to k8s in the general case for most users because of the reasons outlined above.


Thanks for this, I was really starting to think I was the madman.


Agreed. Managing a Nomad cluster alongside a Consul cluster does not require a PhD, but it's also not a walk in the park.

Hopefully Hashicorp will have managed Nomad soon.


> managed kubernetes service at minimal added cost

I dare to disagree. The costs are in fact horrific!

There are orders of magnitude in costs between running your stuff yourself vs the highest level of managed services on the cloud which is usually managed k8s.

(That also explains why there is so much marketing fuss, and push of management, to use k8s: It's by far the most profitable offering for the cloud providers).


It's not. You generally only pay for the masters, with no per-node costs beyond whatever instance type and scheduling class (on-demand, reserved, spot, etc.) you are using.

So not only are the masters relatively cheap, their cost is amortized over the cluster size.

AWS is probably very unhappy about this but they were forced into this by GKE pricing model. Google pushed people to adopt k8s not because they can make a lot of money from people using it but because they can take away a lot of money from AWS and increase workload portability, which benefits Google far more than AWS as they are in second place.

The most expensive managed services are things like RDS, MSK, Elasticache, etc. MSK in particular is pretty egregious with a 80% premium considering they haven't done any additional engineering like RDS.


As a Xoogler it's always seemed weird to me how Kubernetes was compared to Borg. Kubernetes covers a much larger set of things than Borg alone, and I don't necessarily think that's for the better. Being written in a language that isn't well-suited to large projects and refactoring efforts doesn't help either.

Nowadays I don't have use-cases for either, but from playing around with Nomad it felt a lot more "Borg-y" than Kubernetes ever did to me. I rarely hear anything about the Hashicorp alternative for service discovery and such though, so it would be interesting how that compares.


> I rarely hear anything about the Hashicorp alternative for service discovery and such though, so it would be interesting how that compares.

Consul is used for service discovery. Fast, reliable and easily extensible. Yet to have a serious issue with it.


Consul is usually very reliable; when it breaks it can be very painful and mystifying. I've worked on removing it both from internal systems due to outages, and products, based on feedback.


Could you expand a little on your problems with Consul? (I have no experience with it myself).


It's been a while, so I'm fuzzy on some details:

* in a legacy system, a server, perhaps with the leader node, filled up disk space and became unhealthy, the consul agent kept reporting it and itself as healthy, and failover and gossip generally wedged

* in a dev environment, after we replaced some servers in the cluster, the other nodes noted cert changes and refused to work with the new servers

* second-hand, in self-hosted installations, it caused a number of hard-to-troubleshoot outages

* something about circular dependencies and going by "wait _n_ seconds" rather than by healthiness

It was reliable enough that it could gather a really significant blast radius, and it had different gnarly failure modes, so documentation could be irrelevant from case to case.


I don't want to sound bad, but to me all of these sound like misconfiguration and a lack of understanding of how consul works, tbh. But I don't know the full context, so eh, these things happen.


And what is the replacement for it?


We rearchitected. At one workplace, we built and distributed our own service. At another, we shifted to semi-automated more static lists of servers for roles; those servers were much less dynamic.


There are exceptions, but most of the time replacing an off-the-shelf standard solution with something self-made looks like NIH syndrome to me.

The exceptions are the few cases where you know you will forever only need some strict set of features and your own solution can provide them by way more simple means than the in comparison "fat" off the shelf solution.

I'm not sure service discovery in a cluster is one of those cases.


etcd


Never heard of someone moving from Consul to etcd. It's always the other way around.


Not OP, but go look at the consul documentation. In fact, just look at "Configuration" page alone: https://www.consul.io/docs/agent/options - it goes FOREVER. Maybe you don't need most of that, but whatever you do need, you're going to have to find it somewhere in there.

And yes, the error messages can be utterly cryptic.

One thing: Don't ever let junior devs try to create a cluster out of their own laptops (yes they will think it sounds like a good idea) as this will be a never-ending nightmare.


Borg is what k8s could be if there were any kind of ground rules and people were willing to call out some use cases as being legitimately stupid. Compared to k8s, Borg is a model of usability and simplicity.


Ground rules as in “you have to compile with 100s of these google3 libraries to run a simple app”?


I don't think that's really true at all. You can pretty easily run any damned thing in Borg. If you want it to do stuff with other Google services then you need a lot of Google code, but if you just want to run 100000 replicas of `/bin/sleep 100000`, Borg makes it easy.


Sure, but none of the stuff like authentication and access to any google systems will work. Kubernetes is complex because it can take damn near any docker container and run it seamlessly.


Because from what I heard, all the other benefits of k8s that are crucial to me are provided through linking lots and lots of Google-internal libraries?


> Being written in a language that isn't well-suited to large projects and refactoring efforts doesn't help either.

How is Go not suited to those? I'm not seeing it - and are you comparing Go to Java or C++?


Go is a very hard language to work on as a team. It does not push a lot of conventions on you, and when it does, they just seem weird and anachronistic.

Even the original k8s codebase was notoriously horrible in Go, and that was "from the source".

Ironically, HashiCorp has a better Go codebase, and that's where I picked best practices from, not from Google.

The problem with Go is that the Googlers fail to challenge some of the design decisions. The giants of computer science working on it at the company cannot be questioned, and that just leads to a broken feedback process.

As someone said, Go is the new C, skipping decades of new language design concepts.


> The problem with Go is that the Googlers fail to challenge some of the design decisions. The giants of computer science working on it at the company cannot be questioned, and that just leads to a broken feedback process.

Well, they can be questioned but they're not very receptive to it, so most people don't bother. During my time at Alphabet I didn't see much use of Go anyways, other than in glue tooling (including a horrific large-scale incident due to non-obvious behaviour of Go code in such a glue tool).


The original k8s codebase was Java, then rewritten in Go by Java developers. K8s is not a good example of Go, and it has nothing to do with the size of the project.


> As someone said, Go is the new C, skipping decades of new language design concepts.

Well, some are doomed to repeat their errors.

In fact C was even crappy for its own time.


What are you even talking about? Plenty of languages were around then and C dominated all of them because it let people write great software.


I am sure a few old HNer will tell you, C dominated because of UNIX, and the rest is history.


I am old, and that is not the case (although Unix was one of the reasons).


> Being written in a language that isn't well-suited to large projects and refactoring efforts doesn't help either.

I know that Borg was written in Java and Kubernetes in Go. Though the latter had a reputation in the beginning as a systems programming language, its purpose was actually to build large-scale cloud infrastructure projects, and it proved formidably well suited for the task. It compiles fast, anyone can read it, it has good tooling, and it is efficient for the layer on which it is meant to be deployed. Go, like Java, is one of the most productive languages in use today, judging by the ecosystems they have spawned.


I just wish klog would go die in a fire, I'm so confused at why logging is so terrible in Go.


Can you expand on this?

My experience with Go has just been making small changes to small programs. So, I don't know what the normal experience is.

My experience with logging varies from:

print (works fine I guess)

import logging (this is pretty good - means I don't have to parse my logs before deciding where to send them)

import slf4j (6 logging frameworks and a logging framework disintermediation framework)


I elaborated a bit in my reply above, but basically, multiple logging frameworks with incompatible APIs, few of which offer the fine-grained control the person running the app might need.

But I'm used to the JVM world. And when I first met slf4j, I was like, wtf is this crap, but I appreciate it now as the embodiment of the desire to standardise logging across the Java ecosystem.

While using slf4j does make it trivial to swap out logging frameworks in an app, in my 13 years at my last job, we only did that once, from log4j to Logback, so that's not so important.

But yeah, what I miss from Java land in Go logging is the common approach - loggers and appenders are (usually) configured outside of code, the user can provide their own configuration at runtime to override the config shipped in the jar to troubleshoot issues - especially when you can configure the logging lib to check the conf file every X seconds for changes - allows you to change logger levels on the fly without restarting the app (ditto Logback's JMX configurator).

And lastly, no matter the logging library, configuring them is near identical.


klog is the descendant of glog which is a go implementation of Google's logging. Go has nothing to do with it except it's the logging that k8s took on (originally it used glog).

-- edit typo


Sorry was referring more to the many different incompatible logging libraries in play.

Also the inability in the ones I've used for the person running the app to set a particular logger to a desired level easily.

I think klog can do this with the vmodules flag, if, and only if, the devs used klog.V() for their logging statements.

Logrus requires the dev to allow the user to configure the log level, common idiom I've encountered is doing so via an env var, but IIRC, that applies to all logging in the app, no way to limit it to particular files/modules.

It's been a real pain at times for me. Feels like the Go philosophy on logging focuses more on the dev controlling it, than the person running it.


Borg is C++.


There you Go (pun intended). Go even replaced a low-level language like C++ and achieved the same result in the end. I don't know why I thought it was Java; probably the first Kube was initially in Java. It's even better that they managed to pull that off.


But it did not replace it. Google runs on Borg. Go lets you build things fast, but the lack of strong typing and the vast amount of language pitfalls make maintenance hard in the long run.

The community also has the attitude to pretend that these pitfalls don't actually exist, which is very different from C++ where most peculiar behaviours are well-understood and controlled.


Sorry, I don't think I have been explicit in my comment. The language of K8s, a system which is the so-called direct descendant of Borg, replaced C++ at that layer. Also, Go is strongly typed.


I'd argue that go is statically typed rather than strongly typed due to `interface{}`.

> There is no real agreement on what "strongly typed" means, although the most widely used definition in the professional literature is that in a "strongly typed" language, it is not possible for the programmer to work around the restrictions imposed by the type system. This term is almost always used to describe statically typed languages.

(random definition found googling).


Actually Google knew at the time they "designed" k8s that Borg doesn't scale due to fundamental design flaws in its basic architecture. Still they reused the exact same architecture for k8s.

Whoever wants to know the details of those scaling issues can google the Omega paper.

The "proper"™ solution to those design flaws was implemented in Mesos (and to my knowledge nowhere else until now).


>The "proper"™ solution to those design flaws was implemented in Mesos

And yet somehow Mesos failed... sigh.


Because the market never chooses the most advanced technology.

It chooses the (perceived) cheapest thing with the best marketing. Always.


I've had just as many problems maintaining mature C++ projects as I have maintaining mature Go projects.

Ultimately it all boils down to two things:

- The projects being well written and well maintained from the outset

- Personal preference

I cannot overstate that second point.

It really is about time developers stopped pushing their own personal preferences as if it's some kind of fact.


> It really is about time developers stopped pushing their own personal preferences as if it's some kind of fact.

Agreed, so let's all go to Rust or OCaml because they have very strong static typing systems. ;)

Both C++ and Go have plenty of warts, with C++ having way too many footguns and Go allowing you to use it as a dynamic language whenever you figure it's too much work to go through your problem with static types.


> Agreed, so let's all go to Rust or OCaml because they have very strong static typing systems. ;)

Seems very reasonable. Especially OCaml should get more of the praise it deserves.

If you need to stay on the JVM there's Scala, which allows you (with some discipline) to write "when it compiles it works" code.


You've very much missed the point of my post. But from the tone of your reply, I sense we'd never see eye to eye anyway.


Which human do you think understands C++?


> Go even replaced a low-level language like C++ and achieved the same result in the end

Did it, though? Honest question. I get the feeling that it ended up competing with Java (and Python) more than it ended up replacing C++. The C++ folks seem to be way more into Rust than Go.


I'm a Java fan and an ex-Scala aficionado. I would have hoped that the JVM would eat the pie when it comes to cloud deployments, but it didn't happen. Like Scala.js never happened and TypeScript became the type system of the web. JVM languages will remain at the application layer, concerned with microservices, data processing, streaming, databases, etc. It's not about what folks seem to be way more into, it's all about tradeoffs. I am talking here about layers and the suitability of a language at a specific layer. I don't know about Rust, but Go proved that it can handle it. If Rust proves to be better, that would be great, but only time will tell. Until now it has not happened; instead people are trying to fit Rust into scenarios where a higher-level language would go round in circles (GUI programming, microservices, game development etc.). For Java it is too late: if GraalVM with native images and value types had been released earlier, maybe we could say that Java could compete with Go at that layer, but it wasn't, and the train left the station a long time ago. That could change only if Oracle comes out of the cave and throws a lot of money at a JVM alternative to Kubernetes, which is likely to happen in the foreseeable future, given the investments and the attention the Java platform has received recently.


Plenty of K8s operators are written in Java. Sure, it's not underpinning K8s itself, but tbh, what language K8s is written in doesn't really matter, so long as it works.


Yes, exactly my thoughts. The problem is that devops folks are generally used to Python/Go, and in my company, though they don't make it mandatory, they recommend Go. Also, they have a repulsive reaction to everything .NET and JVM :).


> Also, they have a repulsive reaction to everything .NET and JVM

It's an interesting phenomenon I observe quite commonly.

I think in the devops space they see these VM based languages as basically introducing ten redundant layers of unnecessary assumptions and complexity on top of what is already a good foundation - the Unix OS layer. They know the unix os layer well but every time they deploy one of these VM based languages it creates headaches through unique and strange behaviour that violates their assumptions and knowledge and they have no way to learn it since you need years of experience as a developer to become comfortable with it all. From a developer perspective, we see the OS as this annoying ball of complexity and want VM languages to manage that and make it go away.

So it's all about what you know better and where your comfort zone is in the end. For a long time the developers had the upper hand because the OS story was a train wreck, but containers have turned that around in recent years, so now it is not such a crazy thing to pin down the exact OS and version and entire dependency chain you are going to deploy on.


Fun fact: Python and Go are both managed languages and need some kind of VM.

I guess those people are just not educated enough. Those Ops people usually don't know much about programming in the large. A lot of them, for example, think it's OK to write serious programs in Bash. This says it all, imho.

It's OK when someone looking after admin stuff isn't a full blown programmer. It's a different kind of job after all. But this needs to be taken into account when looking at that mentioned phenomenon.


Fun fact: There are former devs among Ops, and sometimes vice versa.

Go binaries are usually statically compiled, with no VM or managed runtime. It's a simple language for simple solutions, which is often underrated.

Not that versed in Python, but there is CPython.

Bash is totally ok in the hands of someone who uses it for good ;) Agreed, it has too quirky a syntax when you need complexity, so it's not good for large stuff.

Footguns are everywhere. You usually trade one in for another.


Go binaries still include the Go runtime which handles GCing etc., much like Python.

They're not VMs, but it's really a semantic difference, if you're comparing it to Java, especially now that GraalVM native compilation is gaining popularity.


Sure. Runtime is maybe a better word, though gc and such is not "free".


Yes. The old-school PHP programmers seem to eventually move to Go. Going back to Go from C++ is almost unheard of.


I moved from C++ to Go. It made programming enjoyable to me again. With a couple of lines of code, huge and well-thought-out system libraries, very accessible external modules and gomod, I can create working software fast. No complicated configurations, no cmake, no autoconf, no cryptic compiler errors, no operator overload surprises, no slow template mess, no missing debugging symbols. Of course if you work on a C++ rendering engine, or you have a huge project already written with all the build and dependency workflow done and it is working fine, then Go won't help you or solve anything for you. But I am not going to write anything in C++ ever again (except firmware for microcontrollers / Arduino). It was just too frustrating and too complicated. I can imagine most C++ developers still don't know all the language features C++ offers because it is simply over-complicated!


Nomad was implemented based on the Borg paper.


Looking forward to Nomad Packs, they were announced with their 1.2 beta[0]. I've really been missing something like k8s-at-home[1] for homelab Nomad setups. Don't know if they will become as versatile as Helm charts since Nomad is less of an abstraction than Kubernetes.

[0] https://www.hashicorp.com/blog/announcing-hashicorp-nomad-1-...

[1] https://github.com/k8s-at-home/charts


We're using Nomad for our backend at Singularity 6 (developing the mmo Palia).

In my experience (having worked with Nomad, k8s, mesos, and homegrown schedulers), k8s is fantastic from an early developer experience, but rough from an operations perspective. It's definitely gotten better, but there's a reason lots of folks want to have someone else host and operate their cluster.

Nomad is the opposite. Less tooling and a bit more complexity for the app developer, but significantly easier to operate at the cluster level. Nomad is also less "all-in" in its architecture, which gives operators more flexibility in how they design and support the cluster.

The fact that we can run on anything and don't _have_ to use containers is a serious advantage, particularly for game development.


Nomad gets the job done. What you end up missing is the huge ecosystem of tooling and knowledge that comes "for free" with Kubernetes.


There has been a great announcement today with the beta of nomad 1.2 - nomad-pack: https://github.com/hashicorp/nomad-pack . Which, in my opinion, is aiming to be helm for nomad.

Which is really great, because we've basically built nomad-pack in terraform and we would really like nomad-pack to replace that, because .. it works, but it could be better than terraform wrangling for templating :)


Been running Nomad for a while now at work and home, and it is such a fun project to work and tinker with. Great community, lots of plugins and other great stuff. After running k8s in prod and home Nomad felt like a breath of fresh air in all aspects.


What do you do with Nomad at home? What comprises your hardware? Just curious...


Hardware: A few droplets in DO, a bunch of Orangepi's/Raspberrypi's and 3 HP Elitedesk G1 with some i7 and 32 gigs of ram each.

Software: bunch of solaris/omnios vm's, torrent client, owncloud, mtproto proxies for family and friends, my site, minecraft, quake, openttd, trss, gitea, drone.io, jenkins.

There is other stuff, but it is usually short-lived; I check it out, test it and most often discard it.


I'm also using Nomad at home. It's a single node "cluster" in my desktop Ubuntu-based machine, really using it to deploy some self hosted services more easily (the usual suspects for home automation and the likes).


Nomad seems much simpler to use and manage if you need to do simpler things, but Kubernetes allows you to do more.

We use Kubernetes instead of Nomad at work but we are also using Consul in the Kubernetes cluster.


> Nomad seems much simpler to use and manage

Agree, Nomad is so easy to get started and because of the simplicity of the architecture, very easy to maintain as well.

> but Kubernetes allows you to do more ... We use Kubernetes instead of Nomad at work

Same here, Kubernetes at work, Nomad for personal projects. But I have yet to find anything I cannot do in Nomad that you normally do in Kubernetes. Could you provide some specific things you've tried to get working in Nomad but couldn't (or simply doesn't exists in Nomad), while it worked in Kubernetes? Would be useful to know before I get too deep into Nomad.


> But I have yet to find anything I cannot do in Nomad that you normally do in Kubernetes.

VM live migration. I was surprised that people use kubevirt for that, but apparently this is a valid usecase. Otherwise nomad can do relatively complex vm configurations.


The entire operator paradigm is kubernetes centric. You're missing out on all of the innovations around operations automation if you use nomad. Same with GitOps to an extent, HC pushes people to use terraform or waypoint for deployments while everyone else uses argo and tekton (or some other k8s runner).


Nomad will still get you where you need to be. Aside from VM live migration, I have yet to find a good example of a workload where Nomad is straight up unable to do what you need.


You mean you run VMs inside... Kubernetes? Am I misunderstanding something here?


Not only k8s, but I ran VMs with both Nomad and K8s. Nomad supports VM workloads out of the box [0]. K8s requires KubeVirt [1].

[0] https://www.nomadproject.io/docs/drivers/qemu [1] https://kubevirt.io/


I knew about nomad (and it makes sense given it has from the start a more corporate, on-prem audience), didn't know/remember about KubeVirt. Hopefully I won't have any use for it in my scenario but thanks for the link!


What do you even still use VMs for?


Anything that needs a kernel module (OpenVPN with tap for example) is better off as a separate VM. Or anything that is not linux, so BSD, Illumos etc.

Also I am using Nomad over baremetal to create VM's with Nomad clients onboard. Kind of a messy setup, but works really well for me.


Also worth noting when running your own k8s(or k3s) cluster;

> Running Kubernetes is not free. It is expected that it will take about 0.5 to 0.75 of a single core on an embedded system like a Raspberry Pi just as a baseline. [1]

[1] https://github.com/k3s-io/k3s/issues/2278


Ran into this recently with an RPi 3B K3s cluster. Before, I could easily allow running workloads on the master node, but since a recent update that node just gets overloaded until it's unresponsive. Yay for automated hands-off updates I guess :).


There seem to be plenty of reasons to run Nomad compared to Kubernetes, but in what scenarios does Nomad lose out to Kubernetes?

Is it simply a matter of Kubernetes being an open source project and Nomad being owned by HashiCorp?


Kubernetes is much more than container scheduling. Custom resources, identity and a powerful RBAC system allow you to use it as a general configuration/operational data store in your own code, from implementing operators acting upon kubernetes and the outside world to even moving most of high-level configuration glue to be natively based on Kubernetes.

For example, with cert-manager running on Kubernetes you can request a TLS certificate by creating a Certificate resource (like you would any other Kubernetes resource). This is the same regardless of whether you want a self-signed certificate, an ACME-issued certificate (and whether that gets performed via HTTP01 or DNS01 or something else). Oh, and this fully ties into Kubernetes' RBAC system.
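For a sense of scale, requesting a certificate is just another namespaced object (the names and issuer below are made up), which is exactly why normal Kubernetes RBAC can gate it:

    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: example-tls
      namespace: myapp
    spec:
      secretName: example-tls     # cert-manager writes the key pair into this Secret
      dnsNames:
        - example.com
      issuerRef:
        kind: ClusterIssuer
        name: letsencrypt-prod    # could equally be a self-signed or internal CA issuer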

In Nomad the closest thing is annotating jobs with traefik-specific tags (and allowing Traefik to do cluster-wide discovery of all tags), but that only works for serving certificates that are managed by traefik, not if your application wants eg. to terminate the TLS connection itself, or if it wants some other PKI hierarchy (eg. a self-signed CA which then issues some other certificates for mutual TLS auth between application services).

Kubernetes also has better support for organization-wide multi-tenant clusters than Nomad seems to have (eg. nomad's policy engine, audit logging and resource quota system are gated behind their “enterprise” offering).


> Kubernetes also has better support for organization-wide multi-tenant clusters than Nomad seems to have

That one's a little weird. I suppose you're right, but the clients our Kubernetes team works with all want separate clusters for testing, staging and preproduction. They certainly don't want a multi-tenant cluster and to share resources with other clients.


> but the clients our Kubernetes team works with all want separate clusters for testing, staging and preproduction.

And I think that's one of the biggest issues with how people use Kubernetes these days (another candidate being insisting on drive-by deploying a cluster from scratch instead of deferring to cloud providers or a dedicated platform team that can plan for long-term maintenance).

Kubernetes thrives in multi-tenant environments: you get huge resource savings and vastly simplified operations. Everyone in your organization gets access to all clusters, and they can just as easily deploy experimental best effort jobs or their development environment as they can deploy and/or inspect production jobs. Well set up quotas and priority classes mean that production jobs never run out of resources, while less important stuff (batch jobs, CI, someone's pet experiment) can continue to run on a best effort basis, just keeps getting preempted when production wants more resources.

You can even continue to have hardware separation between highly sensitive and fully untrusted jobs by using taints and tolerations, if you feel that's necessary. You still get one control plane instead of five different ones.
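Roughly like this (node, key and label names are made up): taint and label the dedicated nodes, then only workloads that both tolerate the taint and select the label land on them:

    kubectl taint nodes secure-node-1 workload=sensitive:NoSchedule
    kubectl label nodes secure-node-1 workload=sensitive

    # in the sensitive pod spec:
    spec:
      nodeSelector:
        workload: sensitive
      tolerations:
        - key: workload
          operator: Equal
          value: sensitive
          effect: NoSchedule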


> And I think that's one of the biggest issues with how people use Kubernetes these days (another candidate being insisting on drive-by deploying a cluster from scratch instead of deferring to cloud providers or a dedicated platform team that can plan for long-term maintenance).

I don't really understand how you can say this and then...

> Kubernetes thrives in multi-tenant environments: you get huge resource savings and vastly simplified operations. Everyone in your organization gets access to all clusters, and they can just as easily deploy experimental best effort jobs or their development environment as they can deploy and/or inspect production jobs. Well set up quotas and priority classes mean that production jobs never run out of resources, while less important stuff (batch jobs, CI, someone's pet experiment) can continue to run on a best effort basis, just keeps getting preempted when production wants more resources.

... advocate for this. Everything you are describing, which is basically what every hardcore k8s user/evangelist will tell you to do, is reimplementing many, if not all, of the features a cloud provider is already giving you in its own resources. But you are taking the ownership and responsibility for this onto your local platform/infra team. What if you screw something up with CoreDNS? What if you break some RBAC roles used cluster-wide while trying a change in the beta environment? I'm pretty sure there are (or will be) specific k8s tools to manage this, but still, you are adding complexity and basically running another cloud provider inside a cloud provider for the sake of binpacking. For certain sizes of companies it might be worth the effort, but it is for sure not a silver bullet, and it probably applies to far fewer companies than many evangelists try to sell it to.


> ... advocate for this. Everything you are describing, which is basically what every hardcore k8s user/evangelist will tell you to do, is reimplementing many, if not all, of the features a cloud provider already gives you in its own resources.

A well-designed KaaS offering from a cloud provider will do that by itself. GKE exposes GCP load balancers as an Ingress controller, IAM identities as Kubernetes RBAC identities, persistent disks as PVs, ... You just get them under a single declarative API.
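
For example, on GKE an Ingress along these lines (hostname and service names made up) is enough to get a GCP HTTP(S) load balancer provisioned and wired up for you:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  annotations:
    kubernetes.io/ingress.class: "gce"
spec:
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web
            port:
              number: 80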

> But you are taking the ownership and responsibility for this on your local platform/infra team.

With a platform team you're concentrating already existing responsibility into a team that can specialize in operational excellence - vs. that same responsibility being spread out across product teams that have to individually manage their own cloud resources, reinventing the wheel by writing the same terraform/{ansible,puppet,chef,...} boilerplate poorly. My experience is that these per-team bespoke AWS deployments are much more brittle than whatever a dedicated team can provide if given the responsibility and means to do things well.

> What if you screw something up with CoreDNS? What if you break some RBAC roles used cluster-wide while trying a change in the beta environment?

An outage is an outage, you roll back to stop the bleeding, investigate what happened and try to prevent whatever caused it from happening in the future. Neither of these examples are unsolvable in a multi-tenant environment, nor especially more likely to happen than similar screwups when using cloud provider resources.


> A well-designed KaaS offering from a cloud provider will do that by itself. GKE exposes GCP load balancers as an Ingress controller, IAM identities as Kubernetes RBAC identities, persistent disks as PVs, ... You just get them under a single declarative API.

My experience with EKS on AWS tells me that it's not that simple; there are still many things to be glued together. I understand AWS's historical position on K8s (they probably want to keep the k8s experience on AWS good but not awesome), but I'm pretty sure that even in GCP there are still serious gaps between "native" GCP features and k8s ones, where you end up reimplementing them on both sides. But I'm no GCP expert, so I might be totally wrong.

> With a platform team you're concentrating already existing responsibility into a team that can specialize in operational excellence - vs. that same responsibility being spread out across product teams that have to individually manage their own cloud resources, reinventing the wheel by writing the same terraform/{ansible,puppet,chef,...} boilerplate poorly. My experience is that these per-team bespoke AWS deployments are much more brittle than whatever a dedicated team can provide if given the responsibility and means to do things well.

I'm totally fine with this approach, and we are actually trying to implement it at $DAYJOB, but I don't really get why you see the AWS API as a different monster from the K8s API. With a complex enough system you will need many lines of YAML/charts/Terraform/whatever on the k8s side just like CF/Terraform/Pulumi/whatever on AWS. And you can totally have a team that takes care of the quirks and details of AWS while exposing a usable and unified interface for service deployments to the rest of the engineering organization. I understand that if we were talking about bare metal vs Kubernetes (even on-prem), k8s would win hands-down. But in the cloud-native world, I don't really see that night-and-day difference. Everything has its tradeoffs and its quirks and bugs and corner cases.


With things like custom operators, especially Crossplane (but also anything custom you cook up quickly), or even a custom operator wrapping AWS or GCP templates, it's easy for me to offer curated, verified solutions across all teams, instead of everyone hacking up their own AWS/GCP/WTFcloud scripts to handle things. It's even better than directly using the cloud provider integration with ingress/service controllers, because I can provide specific, limited variants of those APIs. And even without that, I can just use the hooks system to blunt the corners for the teams.


> You can even continue to have hardware separation between highly sensitive and fully untrusted jobs by using taints and tolerations, if you feel that's necessary. You still get one control plane instead of five different ones.

How much have you had that setup audited? It seems like a lot of people aren’t comfortable saying that the internal boundaries are strong enough, which leads to the proliferation of separate clusters.


We tried for multiple years to make Nomad work because it's simple. We're already enterprise customers of Hashicorp, too. We love Hashicorp! Nomad is a great scheduler and the UI is wonderful, but there is no story around network security. Example: You have postgres running on dedicated metal, and you want to restrict which nomad services have access to postgres. Consul Connect allows you to enforce access policies, but these work more like haproxy with automated service discovery. There is no batteries-included way to prevent traffic from leaving your container (or identify which container the traffic is originating from). You can use a custom CNI plugin to give each container an IP address from a unique subnet per service (and restrict traffic at your firewall/ selectively add subnets to iptabels on your postgres server), but now we're adding bespoke complexity. We brought this problem up to the Hashicorp internal teams over many calls, but ultimately they said we'd need to pay for a 3rd party overseas consultant to advise us on implementation. They weren't sure what to do. K8s is certainly more complex out the gate, but you don't feel like you're the first person in the world to implement.
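
For context, the Connect side is genuinely nice: intentions like these (service names hypothetical) are all it takes to say who may talk to postgres. The catch is that they only apply to traffic that actually flows through the sidecar proxies, which is exactly the gap we kept hitting:

$ consul intention create -deny '*' postgres

$ consul intention create billing-api postgres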

That said, I think Nomad is a few years away from being something truly amazing. If they can solve the network security problem (beyond optimistic documentation), I think it'll be amazing. For now, it's just a good scheduler.


I've been using Nomad for about 3 years and now I've setup a K8s cluster as well.

I love Nomad and so do our devs, but K8s has a much larger ecosystem and more tooling options.

A few examples:

- If you want to run some third party software, there's a Helm chart for that (see the sketch after this list). In Nomad, you're likely going to figure things out for yourself.

- K8s has a lot of cool stuff like Knative, KEDA and others for more sophisticated deployments.

- K8s has tons of operators to integrate with public clouds and external services. Nomad really lacks something similar.

- There's a ton of knowledge and guides related to K8s online, not so much for Nomad.

- There are many managed K8s solutions, which makes things easier. To this date, HashiCorp still does not offer a managed Nomad.
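
To illustrate the first point: installing third-party software on K8s is often just a couple of commands (repo/chart names used as an example):

$ helm repo add bitnami https://charts.bitnami.com/bitnami

$ helm install my-redis bitnami/redis

On Nomad you'd typically end up writing and maintaining the equivalent job spec yourself.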


Nomad is an open source project as well though.


I don't think being OSS has much to do with it; it's more that the dominance of k8s leads to more people learning it, which leads to more tooling being written, which makes k8s better, which leads to more companies adopting it and more people learning it. It's a virtuous cycle much like programming languages experience.


I can't run nomad on aws without managing a nomad cluster myself. I _can_ do that with k8s.


How have you found the managed k8s on AWS? I tend to shy away from their managed services until it has gone through a 4-5 year grace period, interested to know what your experience has been.


The short answer is it's less hassle than the cluster I run locally for development purposes, to the point that we're considering spinning up EKS clusters for development use rather than running kind or k3s locally.


Nomad vs Kubernetes is like comparing apples with pears. They actually say it in their docs: Kubernetes is a complex solution whereas Nomad isn't. For example, when building a frontend, you don't go with plain React or build everything yourself; you use a framework that brings plenty of things at once and plays well within an ecosystem. What's the point of bringing a ton of HashiCorp tools together manually? You can just use Kubernetes. I'd say setting up Kubernetes is just as complex as putting together a bunch of HashiCorp solutions. And ultimately people (the majority) simply love all-in-one solutions.


> There seems to be plenty of reasons to run Nomad, compared to Kubernetes, but in what scenarios do Nomad lose out to Kubernetes?

It would also be interesting if Docker's Swarm were also featured in this comparison, as it just works right out of the box and doesn't come with any bells and whistles.


As far as I remember from trying it last time, a few years ago, Docker Swarm does not support rebalancing the load of containers: it either requires manually issuing a command to do that or merely balances where containers are initially started. Is this still true? And what about when you have state that you would need to move between nodes to make a container work elsewhere?


Swarm is great for small workloads (<= 10 hosts). Networking is slow, but if you need something to just run containers over a set of hosts, you won't find anything simpler than that. Sadly Mirantis (the company that now owns Docker, Inc.) is intentionally neglecting it.


Does it support multiple nodes?


> Does it support multiple nodes?

Yes it does. Docker Swarm essentially provides an easy way to get a cluster of Docker instances running in multiple nodes that works right out of the box.

https://docs.docker.com/engine/swarm/


I've run into so many issues with Nomad that it really doesn't make sense to compare the two. Many are well documented in GitHub issues and occur when utilizing other HashiCorp products.


I'm curious how responsive you have found them to issues you've identified.

Any particular problems you've experienced that have been long-standing and unresolved?

I occasionally like to kick the tires on Nomad, but so far haven't found it compelling enough to switch away from my current solution, GKE (mostly because of how integrated it is with everything else I am doing and, at this point, familiarity).

So, curious to hear from others where the rough edges are.


Not the person you’re replying to, but I’m a couple of years in on Nomad, coming from kubernetes.

I’ve reported and participated in a few issues. Most got fixed within the next couple of minor versions, some went neglected.

I think the somewhat recent 1.0 release was pretty well-timed. There are still a few rough edges (though it’s gotten a lot better continuously over the last year). Specifically, Consul Connect and federation (especially together with ACLs and mTLS) are just about coming together now.

If you don’t rely on Connect and federation across regions, I’d say it’s reliable for production use. Other than that maybe wait 6-12 months and reassess.

I’d also advise to put ACLs and TLS in place before you start putting production loads there; it’s not great to introduce in a running cluster.


As much as I think k8s is the most complex option that exists, which is bad, it's being pushed by the big boys, which means tools exist for it.

I can get so much value by plugging in other tools to this generic platform. Feature flags? just plug it in. On cluster builds? Just plug it in. Monitoring? Plug it in.


For my solo-developer self-hosted PaaS I really like CapRover [1]. Nice GUI, scalable Docker Swarm, with integrated route management, load balancing and one-click SSL. With the available CLI you can use GitLab CI to directly build docker images and leverage the CapRover API to deploy directly to a dev stage.

Interesting discussion about CapRover vs. Flynn, Dokku, Nomad on HN one year ago. [2]

[1] https://caprover.com/ [2] https://news.ycombinator.com/item?id=23465087


Is the fragmentation (the long list of distributions), or the variety of configuration options, really a bad quality? This seems to be used as a counterargument to Kubernetes.

Probably, it could be the end result of inconsistent design or bad technical choices. However, most likely it just means that there are multiple organizations and interest groups pushing changes and ideas to the project. This should be seen as a good thing. The downside is that there is no single source of best practices, and this is confusing to newcomers. You just need to pick one distribution and trust its choices, or develop the competence and understanding yourself.

And we could imagine that the userbase or the number of developers in a single distribution (take OpenShift or Rancher) could be bigger than in Nomad itself.

Having said that, I would still like to see a more stable Kubernetes landscape, and that has to happen eventually. The light distributions k3s and k0s are pushing things in a nice direction.

OpenStack had a similar, or maybe even worse, fragmentation and complexity issue when the hype was high. There were probably technically better alternatives (Eucalyptus?), but people (and companies) gathered around OpenStack and it won the round. However, comparing OpenStack to Kubernetes feels bad, as Kubernetes is technically far superior.


If you have a mixed Windows/Linux environment I cannot suggest Nomad strongly enough. It truly shines in this environment. Especially if you're dealing with legacy stuff.

Even better if you're running Mac + BSD as well, it just runs everywhere.


I would say the "simplest" setup for us has been ECS Fargate (even though there's some major lock-in). It's very easy to spin up a cluster and you don't have to manage the underlying infrastructure. If you use docker-compose or equivalent for local dev, you just define your tasks/services and launch. Even pulling in ENVs is easy with SSM.


What kind of lock-in? One of the reasons I started using it was due to the minimal lock-in, so we've arrived at different conclusions.

A task definition is required, which is sort of a docker-compose.yml analogue, but it's also possible to use docker compose directly with ECS Fargate. So my 'get out of AWS' plan is pretty simple: I already have the docker-compose.yml ready.
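
For anyone curious, the compose-to-Fargate flow (as of when I last used it; the integration has been evolving, so check the current docs) is roughly:

$ docker context create ecs myecscontext

$ docker context use myecscontext

$ docker compose up

The same docker-compose.yml then comes up as Fargate tasks behind the scenes.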


I suppose what I mean is you'd never be able to take an ECS task definition/service and move it over to some other provider, whereas if you did some Kubernetes setup you could conceivably do that, moving from AWS to GCP.


I assume that Nomad is simpler since the other features are provided by other HashiCorp tools (e.g. secrets, auth, etc.).


Indeed, the focus is different for the two tools, something that is outlined with more words in the opening paragraph of this very submission:

> Kubernetes aims to provide all the features needed to run Linux container-based applications including cluster management, scheduling, service discovery, monitoring, secrets management and more.

> Nomad only aims to focus on cluster management and scheduling and is designed with the Unix philosophy of having a small scope while composing with tools like Consul for service discovery/service mesh and Vault for secret management.


In my experience you end up needing additional tools for everything in Kubernetes, such as secrets, auth, etc.; that’s the pluggable API selling point.


Might. But most of them are open source with healthy competition. With Nomad, the ecosystem is much smaller (basically one company), and it can always be closed off behind a paywall (e.g. Terraform Enterprise).


Do you mean Nomad Enterprise? It actually exists: https://www.nomadproject.io/docs/enterprise :)


> In contrast to Kubernetes' fragmented distributions

This article just seems like "All of Nomad's best features phrased kindly vs all of K8s' worst features phrased critically".


> Kubernetes is an orchestration system for containers

Yes, but no. Kubernetes is a portability platform that happens to -also- orchestrate containers.

Using Kubernetes means you have complete reproducibility of the network setup, the deployment and the operation of any workload -no matter how complex- on any Kubernetes cluster and cloud provider.

Nomad is -well- just a glorified Airflow.


Nomad, just like Unix, prefers composition of simple tools over the one-big-tool-for-everything approach that Kubernetes is going for. So for achieving those things, you'd use Terraform or something similar, and then you have a reproducible environment for the hardware/software setup outside of Nomad.

> Yes, but No. Kubernetes is a portability platform, that happen to -also- orchestrate containers.

The homepage of Kubernetes seems to disagree with you. Their headline there is "Kubernetes, also known as K8s, is an open-source system for automating deployment, scaling, and management of containerized applications." and also "Production-Grade Container Orchestration" so it feels safe to assume that Kubernetes is for orchestrating containers.

> Nomad, is -well- just a glorified Airflow.

I never used Airflow, but looking at the website it seems to be geared towards automating workflows, something like Zapier but self-hosted and open source? That's very different from what Nomad is.


> The homepage of Kubernetes seems to disagree with you.

People have trouble understanding what k8s is, and what to use it for. That's fine, it'll take a while, but they will eventually understand what "Production-Grade Container Orchestration" really means when they start working with it.


Can you explain this? Having read it I have no idea what you think the differences are — it just sounds smug.


The first thing you start with on k8s is a Deployment. This will cover Container Scheduling, Replication, Orchestration on available Nodes, Secrets and volume bindings.

By just following this one tutorial (https://kubernetes.io/docs/tutorials/kubernetes-basics/deplo...), you already cover everything you used to do using docker-compose.

Now what ? You are going to learn about Services (https://kubernetes.io/docs/tutorials/kubernetes-basics/expos...), Ingresses (https://kubernetes.io/docs/concepts/services-networking/ingr...), Ingress-Controllers (https://kubernetes.io/docs/concepts/services-networking/ingr...), Persistent Volumes (https://kubernetes.io/docs/concepts/storage/persistent-volum...), Configuration, Pod Lifecycle, RBAC, Rolling Upgrades, Operator Pattern, ...

This is not about orchestrating containers anymore; it's a mix of network, configuration and storage APIs that unify everything you used to do with shell scripts under a fully declarative format. Then you realize the _ACTUAL_ value of kubernetes isn't about the containers it can start up, it's about being able to _MOVE_ those containers, their HTTP routing rules, their database, their backup schedule, their secrets, their configuration and everything else onto totally different machines with a different OS and a different topology, just by running a kubectl apply.
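
In day-to-day terms (file names made up) that portability boils down to:

$ kubectl apply -f deployment.yaml -f service.yaml -f ingress.yaml

$ kubectl config use-context some-other-cluster

$ kubectl apply -f deployment.yaml -f service.yaml -f ingress.yaml

Same manifests, completely different cluster, same result.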


> you'd use Terraform or something similar

Wouldn't you also use terraform for your kubernetes cluster?


Kubernetes is the new POSIX. It is complex for sure. But yes, portability is what matters. No vendor lock-in as long as you've abstracted your workloads to Kubernetes.


That sounds like saying you've avoided vendor lock-in by using Linux — not entirely wrong but definitely leaving out a lot of the trade-offs. Since your application does real work, you'll be locking in to different components to varying degrees and you really should be reasoning about that in terms of the benefits you see from using something versus the costs (either direct or in ops / support).

For example, if your application uses a database the major lock-in concern is the flavor. If you're using Kubernetes to deploy MySQL or Postgres or paying e.g. Amazon/Google to provide it for you, there's relatively low cost of switching because it's a very tested, standard interface with well-defined semantics. On the other hand, if you're using something like AWS DynamoDB or GCP Datastore you probably do want to think carefully about an exit strategy because you'd be baking in assumptions about how those services work in a way which is non-trivial to replicate outside.

The important thing is remembering that this is a business decision, not a holy cause, and the right answers vary from project to project. For example, in a larger organization you might find that it's worth using the platform services not because they're cheaper or better tested than what you can build for yourself but simply because it's easier to be able to check various auditing boxes using the standard auditing tools than having to laboriously demonstrate that your implementation meets the same requirements.


> The important thing is remembering that this is a business decision, not a holy cause, and the right answers vary from project to project.

I must say, that's exactly it. No holy cause; technical decisions should not be made in a vacuum, and there's always a lot more to it (and if not, there should be).

Having said that, mine is a qualified statement. If you have abstracted your workloads to the Kubernetes abstractions alone, you're good to go anywhere from Linode to GCP and anywhere in between.


> Nomad, is -well- just a glorified Airflow.

This doesn't make sense. While in the end these tools might both run code in containers, they serve different purposes. Airflow is far more aware of ETL concepts and comes with a lot of batteries included for those use cases, whereas Nomad is a more generic solution with more emphasis on infrastructure.


Nomad could use an official or semi-official distribution. Something that you could throw onto any VM with minimal configuration, and it would create a new Nomad cluster or join an existing one.

I've been thinking about building such a thing on Arch (btw) but haven't acquired enough time-energy to do it.


Nomad + Consul does auto-clustering. All you have to do is specify in which mode your Nomad binary is working and where your Consul client is listening. For local development, -dev mode exists for both Consul and Nomad.
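
A minimal server-side sketch of what that looks like (paths and addresses are just examples):

data_dir = "/opt/nomad/data"

server {
  enabled          = true
  bootstrap_expect = 3
}

consul {
  address = "127.0.0.1:8500"
}

$ nomad agent -config=server.hcl

Clients are the same idea with a client block instead, and they find the servers through Consul.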


Yeah, I figure this shouldn't be too difficult to implement, but will be at least laborious to consistently keep up-to-date and stable.


What you are seeking is NixOS with Consul configured.


I don't agree with the "Kubernetes is an orchestration system for containers originally designed by Google" statement. While it is not false, it creates a wrong impression of what K8s is.

Kubernetes is a cloud operating system which lets you run modern apps in any environment. One important component is of course container orchestration, but it went far beyond just an orchestrator. Kubernetes has a very powerful ecosystem, and it managed to unite almost all infrastructure vendors and projects. It's amazing to see how so many competing companies could agree on something. And that became K8s.

Nomad is great when you're working with VMs, but I don't see it as very relevant in the modern era of K8s and cloud-native.


What is nice about using the same tech as the 'big players' is you get the benefits of massive armies of engineers building integrated products and features to sell to other big users of the product. This means there are options to add on distributed databases, storage, monitoring, tracing, CI/CD, etc. So it can be worth swallowing 'big player' complexity if it means you can play in the same ecosystem. If you are already on a cloud then your infrastructure provider will handle a lot of that complexity for you anyway, and it's often better integrated into their other products (i.e. networking, access control, etc.).


So far I've kept things simple, avoided k8s/HashiStack/etc by using docker compose with a simple docker-compose.yml for each server. This has been working well, but I'm starting to feel the pain points - HA requires copy-pasting yaml, I need to specify which services are on each server, and so on.

What's the simplest next step I can take? I'd like something with (close to) the simplicity of docker compose, but ideally being able to specify something like `instances: 2` and being able to route/load-balance to those instances from another service.


Assuming serverless is out of the question for your use case, have you tried spending a couple of days investigating a managed Kubernetes cluster with node autoscaling enabled? EKS, AKS, GKE...

Honestly it sounds like you could be at the point where K8s is worthwhile.


I'm considering k8s, but that also means moving services from on-prem to AKS, getting INF to open up the necessary firewall rules to make the services reachable from on-prem, and so on. And as you said, it's definitely days of investigation. I'm not closed to the option.


You might also want to consider one of the simpler pre-packaged solutions for small k8s on premise clusters (like k3s)


You should be ok with consul, ansible and templating your docker-compose files. Might take some time to set it all up, but should be ok.


Thanks for the suggestion.

Any suggested starting points for my research? DDG search for "ansible docker-compose" brings up some suggestions like [1] and [2] but I'm curious if you have other suggestions.

And just so I understand how these work together - I'd use Jenkins+ansible to push containers to my servers, I'd run consul within docker as well, and ... would Ansible register services with consul as it pushes them? Do the services need to be modified to speak with consul directly?

[1] https://docs.ansible.com/ansible/latest/collections/communit... [2] https://www.ansible.com/blog/six-ways-ansible-makes-docker-c...


Missed the whole consul part of the question. To register services you can: 1. Use the Ansible module [0]. 2. Use a template to create JSON files, or use the HTTP API [1] (sketch below). 3. Use Registrator; the project is old but still works well. [2]

> Do the services need to be modified to speak with consul directly?

I am not sure that I get you, but I had no such need when migrating legacy stuff.

[0] https://docs.ansible.com/ansible/latest/collections/communit... [1] https://learn.hashicorp.com/tutorials/consul/get-started-ser... [2] https://github.com/gliderlabs/registrator
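
For option 2, the file you'd template per service is tiny (name/port/check made up). Drop it into the agent's config dir (e.g. /etc/consul.d/) and run `consul reload`:

{
  "service": {
    "name": "web",
    "port": 8080,
    "check": {
      "http": "http://localhost:8080/health",
      "interval": "10s"
    }
  }
}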


You can check out the templating engine that Ansible uses, Jinja2 [1], and the templating module itself [2]. If you are not well versed in Ansible, check out Jeff Geerling's "Ansible 101" [3].

[1] https://jinja.palletsprojects.com/en/3.0.x/

[2] https://docs.ansible.com/ansible/2.9/modules/template_module...

[3] https://youtu.be/goclfp6a2IQ


My personal win for Nomad is its ability to run non-containerized apps.


Does anyone know if Nomad can run on ChromeOS? I’ve had issues open with the Chrome and K8s teams for months to fix an error preventing it from working, but I don’t think a fix will ever see the light of day, as it just gets passed around various repos.


Title should be "Nomad vs Kubernetes from Nomad's point of view".


The domain you see here on HN is nomadproject.io and the blog doesn't hide who the blogpost is from, it's very apparent already it's from the perspective of the people that maintain Nomad.


wonder what the 'orchestration feature set' of 2 years in the future will be

speaking just for myself, feels like I want a system that runs a mix of serverless (like knative), normal containers, raw linux binaries (easy w/ fat binary systems like golang), language-specific packages like heroku does, maybe firecracker VMs.

hard to get knative right if you don't have a reverse proxy tightly integrated with your tool


Wild guess, probably more than 2 years ahead (also kinda what I'm hoping for)

A natively distributed runtime providing similar capabilities to Docker + Nomad but running on "bare" WASM, providing a compilation target, distributed and clustering primitives right in your application code.

Imagine the nodejs cluster module, targeting nomad directly, or talking to an instance of a service running on another node, the communication and addressing would be handled by the runtime.

Similarly, cluster or instance level ports could be opened directly from your code and the distributed runtime will do the external/internal load balancing. Maybe taking it one step further and advertising services directly abstracting ports away.

Service mesh + Scheduling + Capability-based Runtime in one binary you could run on Linux. One can dream.


Any nice step-by-step tutorials on how to get a Nomad, Vault and Consul architecture running quickly? I’m especially interested in non-container jobs.


I was into it until I saw there is an enterprise version link. If it's not FLOSS I'm out.


It's mostly OSS. There are pay-to-play enterprise features, but you can most definitely run high-end production clusters without needing Enterprise.


Do any of these systems come with a container image registry setup?


Why has this post dropped 20-odd ranks in roughly an hour?


Oh God another one


Another one?

Nomad and kubernetes were released within months of each other in 2015. This isn't new.


Wow, there's so much technical detail here! Like the part about how it's "simpler" and "flexible" and "consistent"

I'm totally convinced by this product's marketing page!!


I know you're being sarcastic. But the marketing page is not wrong. Try nomad yourself.

Install:

$ brew tap hashicorp/tap

$ brew install hashicorp/tap/nomad

Run:

$ nomad agent -dev

Start a job:

$ nomad job run example.nomad
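
(If you don't have a job file handy, `nomad job init` should write out the example.nomad used above.)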


These kinds of posts, by the vendor comparing to a competitor, always leave such a bad taste in my mouth. They decrease my confidence in both the product and the vendor. Stand on your own merits.

There's a saying in Dutch: "Wij van Wc-eend adviseren Wc-eend"[1]. It basically boils down to pretending to give advice or information but you're just promoting your own interests.

[1]: https://untranslatable.co/p/amarens/wij-van-wc-eend


The Hashicorp ones tend to be well written and informative. I have found them useful in the past, even though I usually loathe sales BS. In this case the bias is a bit more obvious but we know it's written by Hashicorp and can make allowances.


I know what you mean, but it's not like they're pretending that much if it's an article hosted under nomadproject.io. The first question everyone is going to ask is "why should I use this instead of K8s?", so you might as well have a good answer.

On the other hand, the Rust project has purposefully avoided "Rust vs X" comparisons on its website. I can't find the HN comments to back this up, but people like steveklabnik have indicated that they don't find that sort of adversarial comparison to be useful. Rust has done an excellent job at community building, so I give a lot of credence to their approach.


Yes. This topic is long and complicated; maybe I'll write it up or give a talk or something someday.

There is a lot of nuance.


I'd love to read a blog post on that. I completely see how it's a difficult line to walk - you don't want to be "Rust is better than all these rubbish languages" but you still want to provide people with information so that they can make informed choice.


Yup exactly. Comparisons are important, but you also really want to avoid building an “us vs them” culture.


Well, Rust is currently the best programming language so it's the other languages that need to come up with a "X vs Rust".


The standard refrain in companies is "don't talk about your competitors, talk about yourself". And that's fine, unless nobody wants to buy your product because your competitor is all anyone talks about. At a certain point you do need to tell people why you're better than the competition.

But actually, they're not competitors. Hashicorp supports K8s clusters for their customers. Nomad is just a product they built that is an alternative to K8s, and plugs into their whole product suite. Someone has probably asked them many times, "Why should I use Nomad instead of K8s?" So they have a page about it.



