Nomad vs. Kubernetes (nomadproject.io)
547 points by capableweb on Oct 15, 2021 | 361 comments



Despite its reputation, Kubernetes is actually quite easy to master for simple use cases. And affordable enough for more complex ones.

The fundamental abstractions are as simple as they can be, representing concepts that you'd already be familiar with in a datacenter environment. A cluster has nodes (machines), and you can run multiple pods (which is the smallest deployable unit on the cluster) on each node. A pod runs various types of workloads such as web services, daemons, jobs and recurring jobs which are made available (to the cluster) as docker/container images. You can attach various types of storage to pods, front your services with load-balancers etc.

All of the nouns in the previous paragraph are available as building blocks in Kubernetes. You build your complex system declaratively from these simpler parts.
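To make that concrete, the smallest unit, a Pod, is just a short manifest; a rough sketch (name and image are placeholders):

    apiVersion: v1
    kind: Pod
    metadata:
      name: hello-web            # placeholder name
    spec:
      containers:
        - name: web
          image: nginx:1.21      # any container image
          ports:
            - containerPort: 80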

When I look at Nomad's documentation, I see complexity. It mixes these basic blocks with those from HashiCorp's product suite.


I couldn't disagree with this more. I manage and run Kubernetes clusters as part of my job, and I can tell you that installing, configuring and running these clusters is no small feat. I don't know much about Nomad, but I would highly encourage most users not to assume K8s is simple by any standard. Multiple cloud providers now offer ways to run your code, or your containers, directly. Unless you have a reason to manage that complexity, save yourself some heartburn and make your developers happy: they only need to worry about running their code, not about what happens when a seemingly innocuous change causes a cascading failure.

This has a great collection of war stories - https://k8s.af/


Honestly, if you use a managed Kubernetes provider it's pretty simple once you nail the concepts. You'll get hit with cloud-provider issues every now and then, but it's really not terrible.

I'd manage services in Kubernetes before I start deploying VMs again, that's for sure.


Sure, if you pay someone else to keep k8s alive it's not so bad, but it's expensive to do that. You generally need a full-time team of people to keep a k8s deployment alive if you are running it yourself.

I keep Nomad alive part-time. It's operationally simple and easy to wrap one's head around and understand how it works.


A lot of the cloud providers don't bill directly for Kubernetes management; you just pay for the node resources.

Either way, as another comment points out, Rancher and many other solutions make the orchestration of creating your own Kubernetes cluster really boring.

We run a few Kubernetes clusters on premise, and for the longest time it was just one person running them. We even have other teams in QA/Engineering running their own with it.


Which cloud providers don't? I'm only familiar with AWS and GCP, and they both have a base hourly charge per-cluster.


Azure, as far as I know, doesn't have base charges unless you want a paid SLA. It's Azure, though.


DigitalOcean and Linode, that I'm aware of; there may be others.


DigitalOcean charges extra for k8s nodes (compared to plain VPSes), and so does Linode, last I bothered to check.


Scaleway


You can also use something like Rancher or k3s to keep it alive part-time.


RKE2 is the closest thing we've found to a turnkey, on prem, production ready Kubernetes distribution. Though we were last looking about a year ago so I concede there might be better or comparable options now.


nope. have a GKE cluster running "unattended" for months now. looks fine. ;)


GKE is the industry's first fully managed Kubernetes service...

I.e. you don't run k8s, Google does it for you. I'm talking about where YOU run k8s, not run ON k8s. I agree running ON k8s is pretty easy.


Almost everything is pretty simple "once you nail the concepts". It's getting to the point where you have the concepts nailed that is the measuring stick for complexity.


> Despite its reputation, Kubernetes is actually quite easy to master for simple use cases.

This has so many assumptions rolled up in it. When we move from niche assumptions to the masses of app operators and cluster operators (two different roles) it tends to fall apart.

Which ultimately leads to things like...

> This has some great account of war stories - https://k8s.af/

We shouldn't assume other people are like us. It's worth understanding who folks are. Who is that ops person who wants to run Kubernetes at that insurance company or bank? Who is the person setting up apps to run for that auto company? What do they know, need, and expect?

When I answer these questions I find someone far different from myself and often someone who finds Kubernetes to be complicated.


I also maintained several kubernetes clusters and never found it very difficult. We never had any major outages, and the only downtime was trying new features.


What's the scale of your clusters? If you're running anywhere between 5 and 20 nodes with 10 pods on each node, I think that's a very small deployment and you'll be able to breeze by with most of the defaults. You still need to configure the following though - logs, certificates, authentication, authorization, OS upgrades, Kubernetes upgrades, etc.

I'm not sure all of this is worth it if you're running a small footprint. You're better off using the more managed solutions available today.


5 clusters of around 100 nodes each on average


I dunno man, k8s is pretty simple, but only if you make it so. Early on I built complicated setups with all kinds of functionality, but the complexity was rarely used. I now manage clusters and deployments on those clusters with code, Terraform to be exact (Tcl/Expect for the instances when Terraform can't hang yet). I built a GUI interface for Terraform to manage deployments of Kubernetes on various clouds and bare metal, and to manage the deployments to Kubernetes.

Maybe one day this will get too complex and we'll do something else, but so far so good.


From a system architecture perspective, kubernetes is very complex since it handles a multitude of complexities, but that's why it _can_ be very simple from a user's perspective. Most of the complexity can be ignored (until you want it).

It's the same reason I still like to use postgres when I can versus NoSQL until I know I need the one feature I may not be able to achieve with postgres: automatic sharding for massive global scale. The rest of the features postgres (and friends) give easily (ACID etc) are very tricky to get right in a distributed system.

It's also the same reason bash is great for tasks and small automated programs, but kind of terrible for more complex needs. The primitives are super simple, easy to grok, and crazy productive at that scale, but other languages give tools to more easily handle the complexities that start to emerge when things get more complicated.


> From a system architecture perspective, kubernetes is very complex since it handles a multitude of complexities, but that's why it _can_ be very simple from a user's perspective.

I’ve been doing software for 20 years in one form or another. One of the things I’ve learned is that the simpler and more polished something seems to the user, the more likely it is that there's a hell of a lot of complexity under the covers making it that way.

Making something that handles the 80% is easy. Every step closer to 100% becomes non-linear in a hurry. All that polish and ease of use took months of “fit & finish” timeboxes. It took tons of input from UX and product. It involved making very hard decisions so you, the user, don’t have to.

A good example is TurboTax online. People love to hate on their business practices (for good cause) but their UX handles like 98% of your common tax scenarios in an incredibly easy to use, highly polished way. Robinhood does a pretty good job too, in my opinion—there is a lot of polish in that app that abstracts away some pretty complex stuff.


It doesn't take long working with Nomad to hit the cases where you need to augment it. Now I know some people enjoy being able to plug and play the various layers that you get in the complicated kitchen-sink Kubernetes.

We already had something that just ran containers, and that was Mesos. They had the opinion that all the stuff like service discovery could be implemented and handled by other services like Marathon. But it did not work well. Most people that want to deploy containers in an orchestrated manner want service discovery.

At least these parts are being provided by the same company (Hashicorp) so it probably won't suffer the same lack of coordination between separate projects that Mesos did.

The benefit of the opinionated, kitchen-sink approach that K8s takes is that your deployments and descriptors are not very difficult to understand and can be shared widely. I do not think the comparison of massively sharded NoSQL to Postgres is the same, because most people will not need massive sharding, but almost everyone is going to need the service discovery and other things, like secrets management, that K8s provides.


One of the things that I don't like about Nomad is HCL. It is a language that is mainly limited to HashiCorp tools, and there's no wider adoption outside of that, at least not to my knowledge.

From the documentation:

> Nomad HCL is parsed in the command line and sent to Nomad in JSON format via the HTTP API.

So why not just JSON? Or why JSON at all and not MsgPack, or straight-up HCL, given that HCL is introduced over and over as being machine-readable and human-friendly at the same time?


I've only used Terraform, but I absolutely love HCL as a configuration language. I know I'm in the minority about this, but it's so much less fiddly and easier to read than json or yaml. I do wish there were more things that used it.

JSON is fine for serialization, but I hate typing it out. There are too many quotes and commas - all keys and values have to be quoted. The last item in a list or object can't have a trailing comma, which makes re-ordering items a pain. Comments aren't supported (IMO the biggest issue).

YAML is too whitespace dependent. This is fiddly and makes copy-pasting a pain. I'm also not sure how it affects serialization, like if you want to send yaml over the network, do you also have to send all of the whitespace? That sounds like a pain. OTOH I like that quotes are usually optional and that all valid json is also valid YAML. Comments are a godsend.

HCL has the same basic structure of objects and lists, but it's much more user friendly. Keys don't need to be quoted. Commas aren't usually needed unless you compress items onto the same line. It supports functions, list/object comprehensions, interpolations, and variable references which all lead to more powerful and DRY configuration. Granted I'm not sure if these are specific to TF's HCL implementation, but I hope not.
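As a rough illustration (Terraform-flavored HCL; the values are placeholders):

    # comments are allowed, keys are unquoted
    variable "region" {
      default = "us-east-1"
    }

    resource "aws_instance" "web" {
      ami           = "ami-123456"      # placeholder value
      instance_type = "t3.micro"
      tags = {
        Name = "web-${var.region}"      # interpolation
      }
    }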

For serialization, HCL doesn't have any advantage over JSON. Sure it's machine-readable but probably much harder to write code that works on HCL specifically than to convert to JSON and use one of the zillions of JSON libraries out there.


JSON was designed for machine readability, HCL was designed for human readability.

HCL requires a lot more code to parse and many more resources to keep in memory vs JSON. I think it completely makes sense to do it this way. K8s is the same. On the server it does everything in JSON. Your YAML gets converted prior to sending to K8s.


Parsing JSON Is a Minefield (2016):

https://news.ycombinator.com/item?id=28826600


I don’t think JSON was designed; it is just JavaScript objects plus Douglas Crockford's spec. Having said that, HCL really doesn’t click with me.


Crockford himself says JSON was not invented but rather discovered.


Didn’t know he did but it is somewhat obvious. Lispish roots of js do shine through sometimes


Syntax and data structures are very similar to LPC serialization. Obvious but very useful.

https://en.m.wikipedia.org/wiki/LPMud


To my understanding, you can write most (all?) of the job/config files in JSON if you wish. At my company, we have a ton of HCL files because in the beginning it was easier to hand-write them that way, but we're now getting to the point where we're going to be templating them and going to switch to JSON. In other words, I believe HCL is optional.
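If memory serves, the CLI can also print the JSON it would submit to the API, which helps when you start templating (sketch; the file name is made up):

    nomad job run -output example.nomad > example.json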


OctopusDeploy selected HCL for its pipeline config as code.

https://octopus.com/blog/state-of-config-file-formats


The important difference with k8s, from my experience, is that from the very early days it modeled a common IPC for any future interfaces, even if TPR/CRD took some time to hash out. This means that any extra service can simply be added and then used with the same general approach as all other resources.

This means you get to build upon what you already have, instead of doing everything from scratch again because your new infrastructure-layer service needs to integrate with 5 different APIs that have slightly different structure.


> The rest of the features postgres (and friends) give easily (ACID etc) are very tricky to get right in a distributed system.

But that's just basically a calculated tradeoff of Postgres (and several CP databases) trading Availability for Consistency.


Probably less calculated and more "that's what's available to offer stably right now that we can feasibly deliver, so that's what we'll do." Distributed RDBMS were not exactly cheap or common in open source a couple decades back. I don't think there was much of a choice to make.


I mean it is a trade off though. You cannot beat the speed of light. The further apart your database servers are, the more lag you get between them.

If you want a transactional, consistent datastore you are gonna have to put a lock on something while writes happen. And if you want consistency it means those locks need to be on all systems in the cluster. And the entire cluster needs to hold that lock until the transaction ends. If your DB’s are 100ms apart… that is a pretty large, non negotiable overhead on all transactions.

If you toss out being fully consistent as a requirement, things get much easier in replication-land. In that case you just fucking write locally and let that change propagate out. The complexity then becomes sorting out what happens when writes to the same record on different nodes conflict… but that is a solvable problem. There will be trade offs in the solution, but it isn’t going against the laws of physics.


> If you want a transactional, consistent datastore you are gonna have to put a lock on something while writes happen. And if you want consistency it means those locks need to be on all systems in the cluster.

FWIW, it's not as bad as that sounds. There are traditional locks, and there is optimistic locking. If there are two conflicting transactions, a traditional lock detects this before it happens (by insisting a lock is obtained before any updates are done), and if there is any chance of conflict the updates are run serially (meaning one is stopped while the other runs).

Optimistic locks let updates run without locking or blocking at all, but then at the end they check whether the data they depended on (i.e., data that would have been locked by the traditional mechanism) has changed. If it has, they throw it all away. (Well, perhaps not quite - they may apply one of the conflicting updates to ensure forward progress is made.) The upside is that if there are no conflicting updates everything runs at full speed, because there is no expensive communication about who holds which lock. The downside is that a lot of work may be thrown away by what amounts to speculative execution.

Most monolithic databases use traditional locking. Two CPUs in the same data centre (or more likely on the same board) can rapidly decide who owns what lock, but cycles and I/O on a high-end server are precious. Distributed ACID databases like Spanner, CockroachDB and YugabyteDB favour optimistic locking, because sending messages halfway across the planet to decide who owns what lock before allowing things to proceed takes a lot of time, whereas the CPU cycles and I/Os on the low-end replicated hardware are cheap.

While optimistic locks allow an almost unlimited number of non-conflicting updates to happen concurrently, their clients still have to pay a time penalty. The decision about whether there was a conflicting update still has to be made, and it still requires packets to cross the planet, and while all this happens the client can't be sure whether their data has been committed. But unlike the traditional model, they are never blocked by what any other client is doing - provided it doesn't conflict.


Yes, but my point was there wasn't really a choice to make at that time, therefore no trade off.

Even if I won $100 in the lotto today and had the money in hand, I wouldn't describe my choice of which house I bought years ago as a calculated trade off between what I bought and some $10 million mansion. That wasn't a feasible choice at that time. Neither was making a distributed RDBMS as an open source project decades ago, IMO.


Wasn’t MySQL (pre-Oracle) an open source distributed RDBMS decades ago? At least I remember running it using replication in early 2000’s


MySQL replication isn't really what I would consider a distributed RDBMS in the sense we're talking about, but it is in some senses. The main distinction being that you can't actually use it as a regular SQL interface. You have to have a primary/secondary and a secondary can't accept writes (if you did dual primary you had to be very careful about updates), etc. Mainly that you had to put rules and procedures in place for how it was used in your particular environment to allow for sharding or multiple masters, etc, because the underlying SQL system wasn't deterministic otherwise (also, the only replication available then was statement based replication, IIRC).

More closely matching would be MySQL's NDB clustered storage engine, which was released in late 2004.[1] Given that Postgres and MySQL both started about 1996, that's quite a while after their initial releases.

I spent a while in the early to mid 2000's researching and implementing dual master/slave or DRBD-backed MySQL HA systems as a consultant, and the options available were very limited from what I remember. There's also probably a lot better tooling these days for developers to make use of separated read/write environments, whereas it seemed fairly limited back then.

1: https://en.wikipedia.org/wiki/MySQL_Cluster


By the time you need massive sharding with Postgres, you've got a lot of options. I'd assume that by the time you get there, you can probably budget for them pretty easily as well.

Citus is specifically designed for it.

Timescale appears to have the use case covered too.

Ultimately though, sharding is one of the easier ways to scale a database out. NoSQL just does it by eliminating joins. You can do that pretty easily with almost any solution, PG or otherwise.


The comparison is a bit misleading. As someone that has used both nomad and k8s at scale --

- Nomad is a scheduler. Clean and focused. It is very fast. I was an early user and encountered a number of bumps, but that's software. The people at Hashicorp are super sharp and lovely.

- K8s is a lot more. It includes a scheduler, but in the simplest sense, it is a complete control-plane based on the control-loop pattern. You have an API, a scheduler, a db, various controllers, etc. Forget for a moment that most people use it to orchestrate containers -- it's really designed to orchestrate anything. Its API is extensible, you can add and compose controllers -- there are many possibilities once you wrap your head around it.

This is all opinionated and includes a lot of capability. It's just very different.

You can stitch together nomad, consul, vault, and various glue to create a container orchestration system... but when you start wanting to manage the control-planes as though they are the "kind" (the container for example) with meta-control-planes, and you start wanting to orchestrate network, storage, and other dependencies... all while doing this in a multi-tenant environment, then things get interesting.

-charles.


I'm not sure it's a good idea to use Google's "monolithic cloud operation system".

Such a monoculture has the same issues as MS Windows had, even for the same reasons.

The "Unix Way" of simple tools interacting seems more reasonable. Especially when it comes to lock-in effects.


K8s itself is divided in multiple parts, where you can customize to your own liking, and you can swap parts if you'd like as long as the APIs are similar.

It's very much built the UNIX way.


Where can I find those alternative elements that can be swapped? If this is true there should be a lot of them, right?


If your complaint about Kubernetes is that it doesn’t provide you enough choice/extensibility, you’re probably not looking hard enough.

Runtimes (CRI) - Docker shim (deprecated), containerd, CRI-O, kata

Networking (CNI) - Flannel, Calico, Cilium, cloud specific ones, many more

Storage (CSI) - way too many

Device plugins - GPU, TPU, RDMA/SRIOV NICs

Data store - etcd, dqlite in microk8s, SQLite/postgresql/mysql in k3s

DNS - kube-dns (somewhat deprecated), CoreDNS

Ingress - NGINX (multiple), HAProxy, Envoy (many), etc

Kubelet - https://github.com/virtual-kubelet/virtual-kubelet

Kube-proxy - Cilium can act as a replacement

Cloud-controller-manager for each cloud provider

kube-scheduler - https://kubernetes.io/docs/tasks/extend-kubernetes/configure...

kube-apiserver and kube-controller-manager are two parts where I’m not aware of any other implementations but a) they are kind of the heart of k8s and b) can be easily extended with CRDs/operators.


This is so far from the truth, I'm having a hard time imagining why you even think this. Kubernetes is practically the distributed embodiment of the Unix philosophy. You have a core set of interfaces and components that need to offer a particular API, and other than that, whether it is one program or many, written by one developer or hundreds, by a private company or via volunteers contributing to open source projects, is totally up to how you want to do it. You're free to use the original reference implementation that used to be owned by Google a decade ago before they open sourced it and donated it to the CNCF, but you certainly don't have to. Others have mentioned k3s, which is the busybox to the reference kubernetes GNU coreutils, all Kubernetes, plus ingress and network overlay, in a single binary, with an embedded sqlite db as the backing store instead of etcd. But k3s is still "Kubernetes." Kubernetes is a standard, much like POSIX. It's maybe unfortunate that the original reference implementation is also named "kubernetes" because a lot of people seem to think that one is the only one you can use, and it has historically been complex to set up, but the reason for the complexity is it doesn't make any choices for you.

Imagine if you wanted to use a Unix operating system, but instead of choosing a Linux distro, you just read the POSIX standard and went out and found every required utility, plus a kernel, and had to figure out on your own how to get those to work together and create a system that can run application-level software. If you just go to kubernetes.io and follow the instructions on how to get up and running with the reference implementation, that is what you're doing. It makes no decisions at all for you. You can run external etcd, or use kubeadm to set it up for you. You can run it HA or on a single node. You can add whatever overlay network you want. You can use whatever container runtime engine you want. You can use whatever ingress controller you want, or none at all, and not have any external networking, just as you can install Linux From Scratch and not even bother to include networking if you want a disconnected system for some reason.

You have pretty much complete user freedom, and that is, in fact, the source and reason for a whole lot of complaints. Application developers and even most system administrators don't want to have to make that many decisions before they can even get to hello world. I believe Kelsey Hightower commented on this a while back, saying something to the effect that Kubernetes is not meant to be a developer platform. It's a framework for creating platforms.

Application developers, startups, and small business should almost never be using Kubernetes directly unless they're actually developing a platform product. Whether you use a "distro" like RKE2 or k3s or a managed service from a cloud provider, building out your own cluster using the reference kubernetes is the modern day equivalent of deploying a LAMP stack but doing it on top of Linux From Scratch.


> Despite its reputation, Kubernetes is actually quite easy to master for simple use cases. And affordable enough for more complex ones.

Are you referring to actually spinning up and operating your own clusters here or utilizing managed k8s (e.g. GKE/EKS)?

In my understanding, sure - using the managed services and deploying simple use cases on it might not be that big of a deal, but running and maintaining your own k8s cluster is likely far more of a challenge [1] than Nomad as I understand it.

[1] https://github.com/kelseyhightower/kubernetes-the-hard-way


Kubernetes The Hard Way is an educational resource, meant to serve those interested in deep-diving into the platform, similar to Linux From Scratch. As the statement on the label says:

> Kubernetes The Hard Way is optimized for learning, which means taking the long route to ensure you understand each task required to bootstrap a Kubernetes cluster.

> The results of this tutorial should not be viewed as production ready, and may receive limited support from the community, but don't let that stop you from learning!

If you want to spin up a cluster that you actually want to use, you'll pick one of the many available free or paid distros, and spinning up something like k3s with Rancher, microk8s or even the pretty vanilla option of kubeadm is pretty simple.
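For instance, the k3s quick start is basically the one-liner from their docs (try it on a throwaway box first):

    curl -sfL https://get.k3s.io | sh -
    sudo k3s kubectl get nodes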


Agreed. Compared with k3s and friends, Nomad is more complicated and requires other components (Consul, some kind of ingress) to match what you get out of the box.


Running a production ready, high availability Kubernetes cluster with proper authn, authz, and resource controls is about the farthest thing from "simple" that I can imagine.


Profoundly disagree. In fact, your statements are not true, since before you can run a pod you will most probably need to create a deployment. And then you'll need a service to expose your workload.

Anything but the most trivial workloads will also lead you into questions of how to mount volumes, configure and use ConfigMaps and Secrets, etc.

And that's not even touching the cluster configuration, which you can skip over if you are using a cloud provider that can provision it for you.
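Even a minimal 'hello world' ends up as a Deployment plus a Service, something roughly like this (sketch; names and image are placeholders):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: hello
    spec:
      replicas: 2
      selector:
        matchLabels: { app: hello }
      template:
        metadata:
          labels: { app: hello }
        spec:
          containers:
            - name: hello
              image: nginx:1.21        # placeholder image
              ports:
                - containerPort: 80
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: hello
    spec:
      selector: { app: hello }
      ports:
        - port: 80
          targetPort: 80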


It wouldn't have that reputation if it were easy. There are too many things that can go wrong for it to be easy.


Definitely the contrary in my experience


They're both complex. But one of them has 10 times the components of the other, and requires you to use them. One of them is very difficult to install - so much so that there are a dozen different projects intended just to get it running - while the other is a single binary. And while one of them is built around containers (and all of the complexity that comes with interacting with them / between them), the other one doesn't have to use containers at all.


> But one of them has 10 times the components than the other

I've said this before. Kubernetes gives you a lot more too. For example in Nomad you don't have secrets management, so you need to set up Vault. Both Nomad and Vault need Consul for Enterprise set ups, of which Vault needs 2 Consul clusters for Enterprise setups. So now you have 3 separate Consul clusters, a Vault cluster, and a Nomad cluster. So what did you gain really?


Kubernetes' secrets management is nominal at best. It's basically just another data type that has K8S' standard ACL management around it. With K8S, the cluster admin has access to everything, including secrets objects. It's not encrypted at rest by default, and putting all the eggs in one basket (namely, etcd) means they're mixed in with all other control plane data. Most security practitioners believe secrets should be stored in a separate system, encrypted at rest, with strong auditing, authorization, and authentication mechanisms.
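For anyone who hasn't looked, a Secret is literally just base64-encoded data in an ordinary object, roughly:

    apiVersion: v1
    kind: Secret
    metadata:
      name: db-credentials         # placeholder name
    type: Opaque
    data:
      password: aHVudGVyMg==       # base64 of "hunter2" - encoding, not encryption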


It's "good enough" for most and extension points allow for filling the gaps.

This also dodges the crux of GP's argument -- instead of running 1 cluster with 10 components, you now need a half dozen clusters with 1 component each, but oops they all need to talk to each other with all the same fun TLS/authn/authz setup as k8s components.


I'm a little confused. Why does the problem with K8S secrets necessitate having multiple clusters? One could take advantage of a more secure secrets system instead, such as Hashicorp Vault or AWS Secrets Manager.


The point is that once you're talking about comparable setups, you need all of Vault/Nomad/Consul and the complexity of the setup is much more than just "one binary" as hashi likes to put it.

> So now you have 3 separate Consul clusters, a Vault cluster, and a Nomad cluster. So what did you gain really?

GP's point was already talking about running Vault clusters, not sure you realized we aren't only talking about nomad.


The only thing I was trying to say is that although K8S offers secrets "for free," it's not best practice to consider the control plane to be a secure secrets store.


That's false. Vault has integrated storage and no longer needs Consul.

If you want to have the Enterprise versions (which aren't required), you just need 1 each of Nomad, Consul, Vault. Considering many people use Vault with Kubernetes anyway (due to the joke that is Kubernetes "secrets"), and Consul provides some very nice features and is quite popular itself, that's okay IMHO. Unix philosophy and all.


This is just false. I've run Vault in an Enterprise and unless something has changed in the last 12 months, Hashicorp's recommendation for Vault has been 1 Consul cluster for Vault's data store, and 1 for its (and other applications') service discovery.

Sure, Kubernetes's secrets are a joke by default, but they're easily substituted by something that one actually considers a secret store.


https://www.vaultproject.io/docs/configuration/storage/raft

It's new, but I think it's quickly becoming the preferred option. I found that trying to set up nomad/consul/vault as described in the hashi docs creates some circular dependencies tbh (e.g. the steps to set up Nomad reference a Consul setup, the steps for Vault mention Nomad integration, but there's no clear path outside the dev-server examples of getting there without reading ALL the docs/references). There's little good documentation on bootstrapping everything in one shot from scratch the way most Kubernetes bootstrapping tools do.

Setting up an HA Vault/Consul/Nomad setup from scratch isn't crazy, but I'd say it's comparable level to bootstrapping k8s in many ways.
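For reference, the integrated storage side of a Vault config is just a stanza along these lines (minimal sketch; paths and addresses are placeholders, not a hardened config):

    storage "raft" {
      path    = "/opt/vault/data"
      node_id = "vault-1"
    }

    listener "tcp" {
      address       = "0.0.0.0:8200"
      tls_cert_file = "/etc/vault/tls/vault.crt"
      tls_key_file  = "/etc/vault/tls/vault.key"
    }

    api_addr     = "https://vault-1.example.internal:8200"
    cluster_addr = "https://vault-1.example.internal:8201"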


Cool, so that's certainly new. But even then, you're dealing with the Raft protocol. The difference is it's built into Nomad, compared to Kubernetes where it's a separate service. I just don't see Nomad and co. being that much easier to run, if at all.

I think Nomad's biggest selling point is that it can run more than just containers. I'm still not convinced that it's much better. At best it's equal.


> you're dealing with the Raft protocol. The difference is it's built into Nomad, compared to Kubernetes where it's a separate service

I don't really follow this. etcd uses raft for consensus, yes, and it's built in. Kubernetes components don't use raft across independent services. Etcd is the only component that requires consensus through raft. In hashi stack, vault and nomad (at least) both require consensus through raft. So the effect is much bigger in that sense.

> I think Nomad's biggest selling point is that it can run more than just containers. I'm still not convinced that it's much better. At best it's equal.

Totally agree. The driver model was very forward looking compared to k8s. CRDs help, but it's putting a square peg in a round hole when you want to swap out Pods/containers.


It's not that circular - you start with Consul, add Vault and then Nomad, clustering them through Consul and configuring Nomad to use Vault and Consul for secrets and KV/SD respectively. And of course it can be done incrementally ( you can deploy Nomad without pointing it to Consul or Vault, and just adding that configuration later).


I don't mean a literal circular dependency. I mean the documentation doesn't clearly articulate how to get to having all 3x in a production ready configuration without bouncing around and piecing it together yourself.

For example, you mention starting with consul. But here's a doc on using Vault to bootstrap the Consul CA and server certificates: https://learn.hashicorp.com/tutorials/consul/vault-pki-consu...

So I need vault first. Which, oops, the recommended storage until recently for that was Consul. So you need to decide how you're going to bootstrap.

Vault's integrated Raft storage makes this a lot nicer, because you can start there and bootstrap Consul and Nomad after, and rely on Vault for production secret management, if you desire.


> This is just false.

No it isn’t.

> I've run Vault in an Enterprise

At this point I am starting to doubt that claim.


It has been longer than 12 months that Vault has had integrated storage.


Kubernetes native secrets management is not very good, so you're going to end up using Vault either way.


Also, Kubernetes can be just a single binary if you use k0s or k3s. And if you don't want to run it yourself you can use a managed k8s from AWS, Google, Digital Ocean, Oracle...


> Both Nomad and Vault need Consul for Enterprise set ups, of which Vault needs 2 Consul clusters for Enterprise setups. So now you have 3 separate Consul clusters, a Vault cluster, and a Nomad cluster.

This is incorrect. You don’t need consul for enterprise. Vault doesn’t need two consul clusters (it doesn’t need consul at all, if you don’t want it)


That surprises me. Does Google have a more complete secrets-management system for its in-house services?


IIUC, despite K8s having been started at Google by Go enthusiasts who had good knowledge of Borg, the goal has never been to write a Borg clone, even less a replacement for Borg.

And after so many years of independent development, I see no reason to believe that K8s resembles Borg any more than superficially.

This seems to be very much assumed by kubernetes authors. Current borg users please correct me if I'm wrong.


Thanks.


You gained the suffering of dealing with split-brains in Consul and Vault ;-)


Kubernetes has been a single binary with hyperkube for over 5 years. This argument is really tiring.


Which is which?


I believe that the one that requires containers is Kubernetes. Nomad doesn't require containers, it has a number of execution backends, some of which are container engines, some of which aren't.

Nomad is the single binary one, however this is a little disingenuous as Nomad alone has far fewer features than Kubernetes. You would need to install Nomad+Consul+Vault to match the featureset of Kubernetes, at which point there is less of a difference. Notwithstanding that, Kubernetes is very much harder to install on bare metal than Nomad, and realistically almost everyone without a dedicated operations team using Kubernetes does so via a managed Kubernetes service from a cloud provider.


From parent's comment:

k8s = 10x the components & difficult to install.

Nomad = single binary, works with but doesn't require containers.


k0s is a single binary.


I've been running a production-grade Nomad cluster on Hetzner for the past 1 1/2 years and it's fantastic. It was amazingly easy to set up compared to Kubernetes (which I also did), the UI is awesome, updates haven't broken anything yet (as long as you follow the changelog) and it's stable. I really like the separation of concerns the HashiStack offers. You can start out just using Consul for your service meshing, and then add Nomad + Vault to get a similar experience to Kubernetes.

Yes, it doesn't cover as many features as Kubernetes, but it should be good enough for most software and you can still make the switch later. I would never go back.

You can read more on our blog if you're interested: https://pirsch.io/blog/techstack/


Hi, I see you mention the tiniest nodes in Hetzner there, whereas the Nomad documentation [0] talks about 3-5 server nodes in the 2-digit GiB memory range, which is what has kept me from trying Nomad as I find it insane. How much truth is there in the docs?

[0] https://www.nomadproject.io/docs/install/production/requirem...


This very much depends on your workload, number of jobs and complexity of scheduling. Our Nomad servers have 4GB of memory in the VM and are using about 0.5 - 1G at a low three-digit number of jobs.

Hashicorp is doing a smart but normal thing for on-prem sizing there - they are recommending specs which ensure you have no problems for a lot of workload sizes. And "workload" can grow very large there, since a single DC can handle up to 5k clients and such.


I think the minimum number of nodes is high because they are recommending the minimum requirements for a fault tolerant setup. It is entirely possible to install either k8s or nomad on a single node, but clearly that is not a fault tolerant situation.

IIRC both k8s and nomad rely on the Raft algorithm for consensus, and so both of them inherit the requirement of a minimum of 3 nodes from that algorithm.

If you want something to run some containers in the cloud, and you aren't concerned by the occasional failure, then there is no issue running a single Nomad (or k8s) on a single machine and later adding more if and when you require it.


Yes, I did not even consider that angle.

And yes, nomad recommends 3 - 9 nodes in prod, maybe 11 or 13. The 3 are necessary to be able to tolerate one broken/maintained node and maintain the ability to schedule workloads.

You can increase the number of tolerated failures by increasing the number of nodes - 5 tolerate 2, 7 tolerate 3, 9 tolerate 4, 11 tolerate 5, 13 tolerate 6 - but Raft becomes slower with more nodes, because Raft is a synchronous protocol.

However, it is entirely possible to run a single node nomad system, if the downtime in scheduling is fine.


PSA: In practice, choose 3 or 5 nodes. Going higher exposes the system to increased communications overhead. See also: all of the literature on this.


I am currently running over 13000 nodes with a 5 server cluster. These are large boxes (90gb memory, 64 cores) but cpu usage is under 20% and memory usage is under 30%.


I'm sorry, what is the exact benefit of running multiple nodes on one piece of hardware? Just software failure resilience?


I assume they meant a 5 server control plane supporting 13k worker nodes, not that they partitioned 5 large hosts into 13k smaller ones. It's a counterpoint to GP's "raft gets slow with more servers", I think.


This is correct.

We run a Raft cluster of 5 voting members with 13,000 physical Nomad nodes mixed across OS's with a bunch of random workloads using docker, exec, raw_exec, java, and some in-house drivers. I'll clarify that (and I can't... because I can't edit my post anymore :( ).


I don't see where you see multiple nodes. I'd read the parent post as: there are 5 identically sized, beefy VMs. Each of these 5 beefy VMs runs 1 instance of the Nomad server. And then there are 13k other VMs or physical systems, which run Nomad in client mode. These clients connect to the servers to register and get allocated allocations / task groups / practically containers or VMs or binaries to run.

We're nowhere near that size, but the architecture is probably identical.


Sorry for the delayed response.

We started out with a three-node cluster using the smallest (1 vCPU/2GB RAM) VM you can get. Our initial requirement was to be able to do zero-downtime deployments and have a nice web UI to perform them (alongside the other advantages you get from a distributed system). We have now rescaled them to the next tier (2 vCPU/4GB RAM).

The hardware requirements depend on your workload. We process about 200 requests/sec right now and 600-700 per second on Saturdays (due to a larger client) and the nodes handle this load perfectly fine. All our services are written in Go and there is a single central Redis instance caching sessions.

Our database server is not part of the cluster and has 32 physical cores (64 threads) and 256GB RAM.

I say start out small and scale as you go and try to estimate the RAM usage of your workloads. The HashiStack itself basically runs on a calculator.


For ~20 jobs on my local network, for the longest time I was running a single VM with 2 CPUs and 1GB RAM. Opening htop was taking more resources than rescheduling a job. For local development just use `nomad agent -dev`.


More anecdata:

It, of course, depends entirely on your use case and workload.

I have a few dozen jobs running with what I think is a pretty decent mix/coverage. I've got a bunch of services, some system jobs, half dozen scheduled jobs, etc, etc.

I'm running my cluster with 3x "master" nodes which act as the control plane for nomad and consul as well as run vault and fabio (for ingress/load balancing). I have three worker nodes running against this that are running consul and nomad agents as well as my workloads.

The control plane nodes are currently sitting on AWS's t3.nano instances. `free` shows 462MB total RAM with 100MB available. Load average and AWS monitoring show CPU usage within a rounding error of 0%.

If this were in a professional capacity I probably wouldn't run it this close to the wire, but for personal use this is fine--it will be quite easy to upgrade the control plane when or if the time comes.


Their (stated) minimum requirements are what stopped me from using it for projects too. My projects were pretty simple, so it was total overkill, but for any bigger projects, it was too risky to try something new. So I ended up never trying Nomad out, even though it always looked really nice.


I'm really curious about the local development story for Nomad.

I'm using Docker Swarm in production, which lets me easily spin up the same services locally as I do for production - it's really nice to have one set of YAML files that can be used across all environments, and of course it gives more confidence before shipping to prod.

How do you handle local development? Do you have Docker Compose files specifically for that, or something else?


Sorry, didn't see your reply because... well HN has no notification thing I think? Anyways, we don't run Nomad locally. All of our services can be started locally without additional tools other than Go.


I've been getting into Ansible and Terraform lately. It seems like Nomad would fit better in an environment where you weren't going to be putting everything in K8s, so a mixture where you have Consul and Vault handling standard VMs alongside your Nomad cluster would make a lot of sense.


Great read, thank you.


I'm glad you like it :)


If someone's interested, I wrote a deeper dive into the matter a few months back, with concrete examples:

https://atodorov.me/2021/02/27/why-you-should-take-a-look-at...


Thank you for that post. I found it a couple of months ago and it helped a lot in making my decision to go with Nomad over Kubernetes!


nicely covered.


> Flexible Workload Support

This is Nomad's most underrated feature, IMHO. You don't have to use containers for everything if you don't want to. For example, if you're a golang shop you can run everything as native binaries and cut out docker completely.
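A bare-bones job for a native binary looks something like this (sketch; the binary path, ports and sizes are made up):

    job "api" {
      datacenters = ["dc1"]

      group "api" {
        count = 2

        network {
          port "http" {
            static = 8080
          }
        }

        task "api" {
          driver = "exec"    # or "raw_exec" if you don't want isolation

          config {
            command = "/usr/local/bin/my-api"   # plain Go binary, no image
          }

          resources {
            cpu    = 200
            memory = 128
          }
        }
      }
    }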

Nomad has much simpler networking, i.e. no web of iptables rules to figure out. You can add Consul connect as a service mesh if you need it, but if you don't, you can keep things very simple. Simple = easy to understand, run, and debug.

The main downside for me is a lack of plug and play pieces, e.g. a helm chart for Grafana or Prometheus. You'll have to write your own job for those, though it's very easy to learn. I'd love to see a series of Nomad recipes that people could use.

I think it's the ideal choice for on-prem, bare-metal, or 'weird' deployments where you need a bit more control. You can build the exact stack you need with the different HashiCorp projects with minimal overhead.

I can't recommend it enough! I help people move to Nomad, my email is in my profile if you want to chat :)


> For example, if you're a golang shop you can run everything as native binaries and cut out docker completely.

Ironically you can also just deploy a go executable into an empty Docker container and it's basically the same as the raw executable, but with all the config abstracted to be the same as other containers'.
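e.g. something like this (sketch, assuming a statically linked binary):

    FROM scratch
    COPY my-api /my-api          # statically linked Go binary
    ENTRYPOINT ["/my-api"]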


Good point! Although Docker does add a networking layer on top. I'd prefer to run something like HAProxy without Docker if possible.


You can run it on the host network as well.


Adding the networking layer is a good thing since it's centrally managed


Flexible workload support doesn't sound great to me.

Good tools have very specific use-cases so as soon as you start talking about IIS on windows, VMs, containers etc. it just sounds like lots of the dependencies you are trying to remove in a scalable system.

Containers are relatively restrictive but they also enforce good discipline and isolation of problems which is a good idea imho. I would not want to continue to support our Windows servers running IIS, just too much to sort out.


I understand what you're saying. Fewer runtimes = fewer problems.

It's not going to force your hand. You can disable all the task drivers except the docker driver if you want a container-only cluster. The drivers themselves are lightweight.

In an ideal world, every company is two years old and only produces software in perfect docker containers, but in reality there's always some service that doesn't work in a container but could benefit from job scheduling.

I think it's great that we can add scheduling to different runtimes. Some folks want or need to use those different runtimes, and I like that Nomad lets you do that.


The question often isn't what we as individuals want, though. It's what's good for the business given operational, budgetary, and personnel constraints.

Many people still have mission-critical Windows applications. Windows has nominal container support, but the footprint of a basic Windows container image is extremely high and the limitations of how you can use them are pretty onerous.


> The main downside for me is a lack of plug and play pieces,

Hashicorp just announced Nomad Packs filling that precise niche. Still in beta, and it was a long time coming but IMHO it was the main thing missing and is honestly awesome.


Here's the repo link in case you need it for Nomad Pack. https://github.com/hashicorp/nomad-pack


I looked into Nomad a while back, and finding good examples was a problem - Packs looks fantastic!


This has been my biggest pain point - world+dog shares their helm charts, but blog posts or git*.com repos with non-trivial cluster recipes are rare.

For example, we're exploring cortexmetrics currently, and spinning the full-stack version up on k8s (OpenShift) was straightforward. Porting all that spaghetti onto Nomad would be a huge job, though part of the frustration is knowing someone, tucked away on a private network, has already done this.


> the frustration is knowing someone, tucked away on a private network, has already done this.

Hard agree. I know this person and have been this person before.

I've toyed with the idea of writing a book of Nomad recipes and tips, I wonder if anyone would read it?

Also, watch this space, helm for Nomad may be coming soon: https://github.com/hashicorp/nomad-pack


A book - probably not so much.

The big value of sites (mostly GitHub master repos) that offer recipes for saltstack/chef/ansible/puppet, or the Helm collective, etc., is that they're continually being tweaked as new versions of (upstream) software are released.

They usually all require a fair bit of localisation before they 'just work', at least in my experience, but the template taken from a proven functioning system, and then abstracted & shared, is worth its weight in gold.

I've set up a few nomad jobs, but nothing anywhere as complex as, say, this cortexmetrics monstrosity. Even our k8s & nomad guru baulks at such an undertaking.


I liked Mesos when I worked on it, and it's been replaced by more modern tools like Nomad, but every time I have to work on k8s it's... well, it's being promoted like a cult, it has a cult following, and I think the whole thing is set up to suck up complexity and layering.

How do you deploy a thing to run on k8s?

One would think you deploy a manifest to it and that's it. Like yaml or json or hcl or whatever.

No. The built in thing is not good so someone wrote Helm.

So you deploy helm manifests on there?

No. Helm is not good either, you need helmsman or helmfile to further abstract things.

How do you do networking?

Layer a service mesh with custom overlay networking that abstract upon k8s clusterip.

jeesh. why?


> How do you deploy a thing to run on k8s?

kubectl apply -f ~/git/infra/secretproject/prod.{json,yaml}

One JSON/YAML file too unwieldy? Generate it using jsonnet/CUE/dhall/your favourite programming language. Or just talk directly to the Kubernetes API. You don't have to use Helm - in fact, you probably shouldn't be using Helm (as the whole idea of text templating YAML is... thoroughly ignorant in understanding what Kubernetes actually is).

> Layer a service mesh with custom overlay networking that abstract upon k8s clusterip.

You don't have to, and you probably don't need to. I'm happily running bare-metal k8s in production without anything more than Calico, MetalLB and nginx-ingress-controller. That effectively gives me 'standard Kubernetes networking', i.e. working Services and Ingresses, which is plenty. And if you're using a cluster deployed by a cloud provider, you don't have to worry about any of this; all the hard decisions are made and the components are deployed and working.
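For what it's worth, an Ingress in that setup is just the stock resource, nothing exotic; roughly (hostnames and names are placeholders):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: hello
    spec:
      ingressClassName: nginx
      rules:
        - host: hello.example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: hello
                    port:
                      number: 80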


> kubectl apply -f ~/git/infra/secretproject/prod.{json,yaml}

This is fine for the first deploy, but if you delete a resource from your manifests then kubectl doesn’t try to delete it from the cluster, so in practice you need something like Terraform (or perhaps ArgoCD) which actually tracks state. And of course as you mention, you probably want to generate these manifests to DRY up your YAML so you can actually maintain the thing.

I would love to hear from others how they solve these problems.

> as the whole idea of text templating YAML is... thoroughly ignorant

1000%


I've been using the following command to great effect:

kubectl apply --kustomize manifests/ --prune --selector app=myapp

It cleans up old stuff from the cluster and also allows you to split your manifests across multiple files.


Whoa. I never knew about --prune. <mind-blown emoji>


Yeah... I'm not really sure where this idea of not being able to use the tools available out of the box to deploy apps and do networking comes from. I deploy YAML. If I feel like my YAML is too big, I DRY it using kustomize. I can use --prune if I'm worried about stuff sticking around in the cluster. For networking, I... don't do anything? We get DNS built in. Just use the service name. What else is there to do?
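e.g. a base plus a tiny overlay, roughly (names are placeholders):

    # base/kustomization.yaml
    resources:
      - deployment.yaml
      - service.yaml

    # overlays/prod/kustomization.yaml
    resources:
      - ../../base
    namePrefix: prod-
    replicas:
      - name: hello
        count: 5

and then kubectl apply -k overlays/prod.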


External DNS, certificate management, and a whole bunch of other stuff if you're not using a cloud provider's managed Kubernetes (e.g., network attached storage, load balancers, ingress controller, etc).


Label selectors are hard. They might require knowledge as advanced as high-school geometry to understand. The ability to draw a Venn diagram isn't free, you know!


Before there was Helm, I wrote my own tool and framework in Ruby (called Matsuri) to generate and manage manifests. I still use it. However, the source of truth is still on the kube api server. I did write in functionality to diff what is in the git repo and what is in the source of truth. There is an additional abstraction that bundles together resources so that a single command can converge them all together. The underlying mechanism is still kubectl, and the code merely generates the manifests to send it to kubectl via stdin.

I did not think about a “delete”, and I usually don’t want it to be automatically deleted. But that’s a great idea for that convergence mechanism if I have something that explicitly adds a deleted resource line to make sure it stays deleted until otherwise specified.

The Ruby code can access anything Ruby to generate the manifests, so I have a Turing-complete language and I can use class inheritance, mixins, method overrides. I was also able to write a suite of helpers to take a raw yaml manifest from an upstream project, and use a functional programming style to transform the manifest to something else. If I ever have time, I’d write the ability to import templates from Helm.

This was written for a one-person devops role, intended to be able to manage almost-similar things across different environments (dev, staging, prod), which works great for small, early stage startups, for our level of complexity. At this level, I don’t need it to track state.

When our team size grows, I'll have to think up some other patterns to make it easier for multiple people to work on this.

The other thing is that for much more complex things, I would just write operator code using Elixir, and bootstrap that into k8s with the Ruby framework.


> Before there was Helm, I wrote my own tool and framework in Ruby (called Matsuri) to generate and manage manifests.

I've built these kinds of things too in Python (or Starlark). It's an improvement over YAML, but I often feel the pain of dynamic typing.

> The Ruby code can access anything Ruby to generate the manifests, so I have a Turing-complete language and I can use class inheritance, mixins, method overrides.

I actually don't want any of these things in a dynamic configuration language. I kind of just want something that can evaluate expressions (including functions, objects, arrays, scalars, and pattern matching). If it's Turing complete that's fine but unnecessary. I explicitly don't want I/O (not in your list) or inheritance, mixins, method overrides, etc--unnecessary complexity.

Dhall probably comes the closest, but it has a tragic syntax and I'm not a good enough salesman to persuade real software teams to learn it. I've also heard good things about Cue, but my experiences playing with it have only been painful (steep learning curve, don't really see the upsides).


I don't really see this tooling as a dynamic configuration language so much as a manifest generator. That means I also have tooling for diffing, debugging, and converging manifests. If you are just focused on configuration, I can see why Turing-completeness will seem like unnessary complexity.

I use classes and mixins to be able to generate similar manifests with slight differences across different clusters or environments. I sometimes use imports (I/O) for manifests provided by an upstream (such as from AWS docs), and do transforms to get the manifest I want.

I modelled the design off of Chef, which will also declaratively define and converge a set of systems towards a desired state. Well-known paths and conventions help keep things organized.

I haven't had a problem with dynamic typing, but I have also been using Ruby in application development for over 10 years. You might see it as unneeded complexity, but I have been able to use the flexibility for years now. This tooling was designed for a team that uses Ruby as the primary language, and uses designs and idioms that would be familiar to a Ruby dev team. It is definitely opinionated and I don't expect it to be universally useful for everyone.


You can use the --prune flag with apply to delete resources removed from your configs.


Which is still in alpha


> kubectl apply -f ~/git/infra/secretproject/prod.{json,yaml}

> One JSON/YAML file too unwieldy? Generate it using jsonnet/CUE/dhall/your favourite programming language. Or just talk directly to the Kubernetes API. You don't have to use Helm - in fact, you probably shouldn't be using Helm (as the whole idea of text templating YAML is... thoroughly ignorant in understanding what Kubernetes actually is).

Agreed 99%, but there is one useful thing that Helm provides over Kubectl+structured templating (Nix/dhall/whatever): pruning objects that are no longer defined in your manifest.

However, that's solvable without taking on all of Helm's complexity. For example, Thruster[0] (disclaimer: an old prototype of mine) provides a pruning variant of kubectl apply.

[0]: https://gitlab.com/teozkr/thruster


I really struggle with seeing the value add of helm. I use it, but mostly with a "less is more" approach. It is semi handy in a dev/staging/prod sort of environment, but really not all that much.

What I don't get is some people in my company thought it was a good idea to make a "docker helm chart"... That is, a helm chart capable of deploying arbitrary containers, ingresses, etc... Like, a true WTF, since the values file ends up looking awfully similar to a k8s manifest :D.


> the whole idea of text templating YAML is... thoroughly ignorant in understanding what Kubernetes actually is

Could you expand on that? It sounds like an interesting position (bordering on the philosophical), but I don't know enough about Kubernetes to gauge its accuracy for myself.


There's two things:

1) Text templating YAML is just bad. It's the serialized format of some structured data - instead of templating its text representation (and dealing with quoting, nindent, stringly-typed variables, and general YAML badness), just manipulate the structures directly before serializing them. For example, write some Python code that composes these structures in memory and then just serializes them to YAML before passing it over to kubectl. You can also use ready-made tooling that uses domain-specific languages well suited to manipulating and emitting such structured configuration, like jsonnet/kubecfg (or CUE/Dhall/Nix combined with kubectl apply -f). Eschewing text templating is not only more productive by letting you build abstractions (eg. to deal with Kubernetes' verbosity with labels/selectors, etc.), it also allows you to actually compose more complex deployments together, like 'this wordpress library uses this mariadb library' or 'this mariadb library can take a list of objects that will be upserted as users on startup'.
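As a minimal sketch of that first point (assumes PyYAML and kubectl on the PATH; the names and image are made up): build plain data structures, serialize them once, and pipe the result to kubectl - no string templating anywhere:

    import subprocess
    import yaml  # PyYAML

    def deployment(name, image, replicas=1):
        # compose the manifest as plain Python data, not text
        labels = {"app": name}
        return {
            "apiVersion": "apps/v1",
            "kind": "Deployment",
            "metadata": {"name": name, "labels": labels},
            "spec": {
                "replicas": replicas,
                "selector": {"matchLabels": labels},
                "template": {
                    "metadata": {"labels": labels},
                    "spec": {"containers": [{"name": name, "image": image}]},
                },
            },
        }

    manifests = yaml.safe_dump_all([
        deployment("web", "nginx:1.21", replicas=3),
        deployment("worker", "example/worker:latest"),  # hypothetical image
    ])
    subprocess.run(["kubectl", "apply", "-f", "-"], input=manifests.encode(), check=True)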

2) Those YAML/JSON manifests aren't even that native to Kubernetes. A lot of things go behind the scenes to actually upsert a manifest, as Kubernetes' resource/object model isn't nearly as straightforward as 'apply this YAML document full of stuff' would indicate (there's state to mix in, API/schema changes, optional/default fields, changes/annotations by other systems...). With k8s' Server-Side-Apply this can now be fairly transparent and you can pretend this is the case, but earlier tooling definitely had to be smarter in order to apply changes. Things like doing structure-level diffs between the previously applied intent and the current intent and the current state in order to build a set of mutations to apply. What this means is that Helm's entire 'serialized as YAML, manipulated at text level' stage is not only harmful and a pain in the neck to work with (see 1.) but also unnecessary (as 'flat YAML file' isn't any kind of canonical, ready-to-use representation that Helm had to use).


That is a very helpful summary. I generally favor declarative configuration (like CloudFormation) and I think it's mostly due to my work being infrastructure focused: "I need VPC, LB's, ASG with those immutable Images, a queue, buckets etc". But in my recent work with a customer where the infrastructure is "EKS with datastores, and Gitlab CI"... most of the complexity is people creating abstractions on top of abstractions of stuff in helm (and also .gitlab-ci.yaml with tons of severely nested includes). And in this case the text templated yaml is really painful. Something that would be like CDK for k8s could actually be amazingly useful. Lots to ponder, thank you.


> Something that would be like CDK for k8s could actually be amazingly useful.

Such a thing does indeed exist: https://cdk8s.io. And here's the relevant announcement blog post: https://aws.amazon.com/blogs/containers/introducing-cdk-for-....


The tool that you want is probably Jsonnet. Take a look at Tanka or one of the other Jsonnet options. There is also Dhall and a few others but Jsonnet has a lot more traction.


> Could you expand on that? It sounds like an interesting position (bordering on the philosophical)

I'll make the philosophical case: text-level templating of a computer-readable format is almost always the wrong approach. It becomes unnecessarily hard for someone who later needs to read the code to ensure that it generates valid (both syntax and semantics) markup. It's also harder for tooling to verify the validity, because the syntactic validity may depend on inputs to the program.

Compare the approaches of PHP and JSX. They both accomplish a similar goal (interleaved HTML and control flow), but JSX works at the element tree level and makes it impossible to generate poorly-formed HTML syntax -- it becomes a compile-time error (though it doesn't ensure semantics, e.g. a tbody outside of a table is legal). Compare with PHP, which very much allows you to generate invalid HTML, because it's just a text templating language.

(From what I can tell, Helm works more like PHP; if I'm wrong my philosophical position stands but might not apply here)


k8s is built for extension with validation of the extensions you add ( https://kubernetes.io/docs/concepts/extend-kubernetes/api-ex... )
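For instance, a CRD can carry an OpenAPI schema so the API server itself rejects invalid objects - something text templates can't give you. A rough sketch (the group and field names are made up):

    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: widgets.example.com
    spec:
      group: example.com
      scope: Namespaced
      names:
        kind: Widget
        plural: widgets
        singular: widget
      versions:
        - name: v1
          served: true
          storage: true
          schema:
            openAPIV3Schema:
              type: object
              properties:
                spec:
                  type: object
                  required: ["replicas"]
                  properties:
                    replicas:
                      type: integer
                      minimum: 1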

Helm is just sort of dumb text manipulation with a TINY bit of deployment management built on top of it. There isn't really a whole lot that helm buys over extending k8s.


I’m a big fan of kapp[1] for this sort of thing, it’s basically the same as kubectl apply, but with an app name and state tracking.

    kapp deploy -a my-app -f ./examples/simple-app-example/config-1.yml
It was created by folks at my employer but I use it because I just like its simplicity.

[1] https://carvel.dev/kapp/


> in fact, you probably shouldn't be using Helm

Funny that the thing you shouldn't use is what the entire Kubernetes ecosystem uses for deployment. It's almost like there's no good way to do it.


Except it's... Not what the entire ecosystem uses. Yes, it's popular because it made for "app store"/"package repository" style operation, but I have yet to actually use it for longer than a few weeks despite running k8s projects across multiple companies since 2016.


You don't need any of that stuff. When I started with k8s, all I used were Yaml manifest files and kubectl apply.

As I started using it more, I eventually moved up to using helm. I've been running production k8s for a few years now and haven't used helmsman or anything but helm yet.


> When I started with k8s

But that's the problem right there. Shop after shop has invested in kubernetes with a team that was just learning it. And then they layered tool after tool to help massage pain points -- of which kubernetes has plenty. That leaves us at today where every shop effectively has a custom stack of kubernetes infrastructure cobbled together of various tools and philosophies.

It's the same problem as JavaScript. There's nothing terrible about the technology -- on the contrary it's amazing stuff! But the context of how it gets used leads to a type of gridlock of opinionated tooling published just last week.


At its core, Kubernetes is a controller manager. It's true that most Kubernetes systems include a lot of the Hashicorp equivalents but you could in theory remove the controllers and associated CRDs. Kubernetes has gradually moved to what I think is a relatively sensible set of defaults which you can get in different flavors (k0s, k3s, microkube, kubespray, etc.)

The comment about development and packaging tools can indeed present a problem. I tend to favor Helm for applications that need to be deployed more than once and Kustomize to differentiate between environments but I've definitely seen what I would consider horror stories. Even if you add a bit of GitOps, it's not too hard to understand. The horror stories seem to occur when teams try to map their previous workflows directly to k8s. Many times the worst k8s development workflows are created to enable teams that had broken workflows without k8s.


> How do you deploy a thing to run on k8s?

    kubectl apply -f deploy.yaml
should work, no? What forces you to use the sugar-coating?


Maybe because no one uses that in reality and uses helm instead?


That's a very odd and factually wrong generalisation.

I don't use helm at all, and I manage a large scale platform built on Kubernetes. Everything is either declared directly in YAMLs and deployed with `kubectl apply -f <YAMLs or directory containing them>`, or rendered using Kustomize and, again, deployed using `kubectl apply -f -`.

Kustomize can be rough around the edges, but it's predictable and I can easily render the YAMLs locally for troubleshooting and investigation.
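For anyone unfamiliar, that workflow is roughly this (the overlay path is made up); the same rendered output can be inspected locally or applied directly:

    # render locally to see exactly what would be applied
    kubectl kustomize overlays/prod

    # apply via the rendered stream, or directly with -k
    kubectl kustomize overlays/prod | kubectl apply -f -
    kubectl apply -k overlays/prod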


Helm is useful if you need your software to run in many different places, and it is widely known. This is why you see so many projects offering Helm charts; the projects you see are exactly the ones that need to run in many environments.

There is no reason to use it for your own software if you just have a single cluster.


Helm's a pile of garbage, but this isn't really the fault of Helm. This is an issue with the chart or a failure to read the documentation of the chart.

People have got to stop just blindly running stuff off the internet.


At a previous employer, they were building k8s clusters not for developers but for their infrastructure teams. In the past, where a vendor might have supplied an OVF file as the distributable product, they now provide Helm charts.


The first time I used helm was to set up JenkinsCI on a k8s cluster on AWS, and in the default configuration, it set up a public-internet ELB listener (with type=LoadBalancer) for Jenkins' internal JNLP port. Which pretty much means the public internet has root access to your jenkins cluster, by default.

I had crypto miners using my k8s cluster within a couple of hours.

That was also the last time I used helm.


You mean the helm-package you installed without reading the documentation? Hate the player, not the game.


No. No, no no.

This was legitimately a bug that they immediately fixed when I reported it... There is no legitimate reason to expose the master JNLP port to the internet, ever. The chart did not have a configuration option for this, it just exposed the JNLP port as part of the same k8s Service that exposed the web port. (The JNLP port is for jenkins workers to phone back home to the main instance, it's not for external use.)

"Just read the docs" is not an answer to a chart setup which is just plain broken.


I'll take that "no one" badge. I've never used Helm, always used kubectl.

Well, until now, when I'm using Terraform (particularly helpful relative to kubectl when I need to provision cloud resources directly), but I've still never used Helm.


Also, there is the middle way: kustomize. It's built into kubectl


I have yet to work in any environment where helm was used heavily and not just for a few one-offs set up long ago that now bring regrets.


That’s an organisational problem, not a Kubernetes problem.


Some parts of Kubernetes are perhaps unnecessarily complex and we should keep our eyes open for alternatives and learn from different approaches, but deploying to Kubernetes really does not have to be that difficult unless you make it so.

Plain manifests in a Git repository will do, and let something like Flux apply them for consistency. It really isn't harder than SSH and ad-hoc scripts or Ansible.


The worst part of k8s is definitely its configuration system. I don’t like helm as it’s combining packaging, templating and deployment lifecycle. I feel like each of these components should’ve been its own abstraction.


It should've been, and it can be. There's nothing official or special about Helm, just don't use it.


So just don't use Helm. It's not a core, crucial, or required part of k8s, it's just a reasonably popular related project... That I usually ban from environments before the rot can take root.


what are the alternatives?


>> jeesh. why?

If you look at Google, they built a whole PaaS (Cloud Run/App Engine) on top of k8s and I guess that's the way it's meant to be used.

IaaS -> K8s -> PaaS -> deploy.sh


App Engine does not run on K8s. Cloud Run _can_ run on GKE with Cloud Run for Anthos or OSS Knative.


Nothing about K8s forces you to use Helm or istio etc.


No, the developers around k8s force you to use Helm, istio, etc. And then tech debt makes the decision unchangeable.


YMMV, I guess.


Because you have different team members managing different aspects of your software's deployment (if you have complex systems that you have moved into kubernetes). You can have a team for: authnz/rbac, networking, containerization/virtualization, service load balancing, app development, and secrets, all while no one is stepping on each other's toes deploying changes, and while using a common language to talk about resources and automation.
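As a rough illustration of how that separation is expressed (the namespace, group and verbs are made up), each concern can get its own namespace-scoped Role and RoleBinding, so an app team can touch Deployments without being able to touch, say, NetworkPolicies or Secrets:

    apiVersion: rbac.authorization.k8s.io/v1
    kind: Role
    metadata:
      name: app-deployer
      namespace: team-a
    rules:
      - apiGroups: ["apps"]
        resources: ["deployments"]
        verbs: ["get", "list", "watch", "create", "update", "patch"]
    ---
    apiVersion: rbac.authorization.k8s.io/v1
    kind: RoleBinding
    metadata:
      name: app-deployer-binding
      namespace: team-a
    subjects:
      - kind: Group
        name: app-developers
        apiGroup: rbac.authorization.k8s.io
    roleRef:
      kind: Role
      name: app-deployer
      apiGroup: rbac.authorization.k8s.io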


> How do you do networking? Layer a service mesh with custom overlay networking that abstract upon k8s clusterip.

Overlay network is easy if you don't need your pod and service IP to be routable from outside the cluster. It took me less than 1 afternoon to learn to use Kubespray to deploy an on-premise cluster with Calico for overlay network.


Zero snark, but you confirm parent somehow IMHO. Two more tools.


Well, Kubespray is just an Ansible-based installer that almost walks you through the process of setting up a K8S cluster. And Calico is the default option in Kubespray, so unless you want more customization, you don't need to learn anything about it.
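For reference, the happy path is roughly this (inventory paths and host IPs are whatever your setup uses), following the Kubespray README:

    git clone https://github.com/kubernetes-sigs/kubespray && cd kubespray
    pip install -r requirements.txt
    cp -rfp inventory/sample inventory/mycluster
    # edit inventory/mycluster/hosts.yaml with your node IPs, then:
    ansible-playbook -i inventory/mycluster/hosts.yaml --become --become-user=root cluster.yml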


At some point you need to take a few minutes to learn new technologies in a tech career. Kubernetes is just not that hard if you spend a little time understanding it. You don't need a service mesh. Calico is very simple, or you can use cilium for something more advanced


> but every time I have to work on k8s it's .. well it's being promoted like a cult, has a cult following

I would call this grassroots marketing. Some companies are very good at it. Maybe those ones have more know-how in the ad space than others?


> jeesh. why?

Poor design. It's been cobbled together slowly over the past 7 years. They built it to work only a certain way at first. But then somebody said, "oh, actually we forgot we need this other component too". But it didn't make sense with the current design. So they'd strap the new feature to the side with duct tape. Then they'd need another feature, and strap something else to the first strapped-on thing. Teams would ask for a new feature, and it would be shimmed in to the old design. All the while the design was led by an initial set of "opinionated" decisions that were inherently limiting. The end result is a mess that needs more and more abstraction. But this isn't a problem when you are already a billion-dollar company that can pay for other teams to build more abstractions and internal tools to deal with it.

This is business as usual in any quasi-monolithic Enterprise project. Typically the way you escape that scenario is to have major refactors where you can overhaul the design and turn your 50 abstractions into 5. But instead, K8s decided to have very small release and support windows. This way they can refactor things and obsolete features within a short time frame. The end result is you either stick with one version for eternity (a security/operability problem) or you will be upgrading and dealing with breaking changes every year for eternity.


So much flamebait. These kinds of comments aren’t good for anyone.


Nomad is amazing. We've been using it alongside Consul for close to 2 years at this point at Monitoro.co[0].

We started by using it as a "systemd with a REST API" on a single server, and gradually evolved into multiple clusters with dozens of nodes each.

It has been mostly a smooth ride, even if bumpy at moments, but operationally speaking Nomad is closer to Docker Swarm in simplicity and Kubernetes in terms of the feature set.

We didn't find ourselves needing K8s as we're also leveraging components from the cloud provider to complete our infrastructure.

[0]: https://monitoro.co


> Nomad is closer to Docker Swarm in simplicity and Kubernetes in terms of the feature set.

This a question I still need to google but what features does Kubernetes have that Docker Swarm needs?

Because the perceived complexity of Kubernetes just blows my mind, where Docker Swarm seems a lot simpler for the same benefits, but maybe it's just abstracted away?

I will say upfront I'm naive when it comes to container tech.


Swarm has a lot of issues. Some are on the surface, like bad networking and the scaling issues stemming from it. Others are related directly to Mirantis, the company that owns Docker Inc. now. It neglects the Swarm part of Docker, and was even planning to straight up sunset Swarm and move everyone to k8s. They do maintenance, and add a feature or two a year, which is not enough. Swarm is great for small deployments, as it only requires a docker daemon present. Otherwise you should look towards Nomad/k8s/whatever the cloud solution is.


I have never deployed or built a K8s cluster, but recently I moved about a dozen AWS ECS Fargate workloads into a K8s cluster that a colleague of mine has set up. I was surprised. I really like it (from the perspective of a user/developer). I deploy my apps by updating a simple yaml file in a git repo. All my other practices (vim, go code with standard Makefiles, docker container image registry) are unchanged.

I also think K8s is a reasonable cloud independent abstraction when you need to move your workloads to another provider. It prevents cloud lock-in. And I suppose Nomad would do that too.

So far my K8s experience has been very good. However, if I ever have to do more with it (on the administrative side) I may have a less positive experience. Not sure.


While interesting in itself, I'm not sure how this is relevant to this Nomad blogpost, unless you tried Nomad before starting to use Kubernetes and went with Kubernetes anyways.


What was wrong with ECS Fargate if you don't mind me asking? Too expensive? Too vendor locked-in?


We want to apply consistent policies and technical controls across multiple dev groups that have multiple AWS accounts. K8s seems to be a good solution for that.


Was it not possible in ECS? Or did your teams just want to use K8s, and decided they would use the opportunity of wanting to organize things better as a reason to switch? It seems to me that with SSO and IAM you could create virtually any kind of access controls around ECS. K8s doesn't solve the multi-account problem, and federation in K8s can be quite difficult.


I'm not a fan of AWS but to be fair it has all those features OOTB.

You can have AWS "organizations" managing multiple accounts. IAM will allow to set up any policies you want in your organization.

On the other hand, now you have a multitude of complexity added to your setup…

If I were being snarky I would see a case of "resume driven development". But let's just assume it was a lack of time to find the simplest solution, and that it's then often preferred to just go with the flock, as so many others can't possibly be wrong.


All right, thanks! Just wondered as I have a ECS setup and I really prefer not having to switch unless I have to.


Not OP, but I can't imagine EKS being cheaper than ECS, since ECS itself is free (you pay for Fargate either way).

ECS is pretty simple but does not have the large mindshare that Kubernetes or Nomad have.


To most people, it's not Nomad vs Kubernetes - it's a choice between Nomad vs Managed Kubernetes.

All major cloud providers offer a managed kubernetes service at minimal added cost to running the worker VMs yourself.

With managed Kubernetes, the simplicity question is no longer obviously in Nomad's favour. As other comments allude to, Kubernetes as a user is pretty easy to master once you get used to its mental model.


I'm currently trying to convince people that a managed k8s service is not that "simple", and that we can't "just spin up another cluster" without a great deal of operational overhead.

Some of the things that might still be needed in managed k8s instances: better ingress with ingress-nginx, cert-manager, monitoring/logging/alerting, tuning the alerts, integration with company SSO, security hardening.

If it's a multi-tenant cluster: LimitRanges/ResourceQuotas, NetworkPolicies, scripts to manage namespaces and roles, PodSecurityPolicies (or equivalent), onboarding/offboarding procedures.
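For a concrete idea of the per-tenant plumbing (the namespace and numbers are made up), each tenant namespace typically gets at least a quota plus default limits:

    apiVersion: v1
    kind: ResourceQuota
    metadata:
      name: team-a-quota
      namespace: team-a
    spec:
      hard:
        requests.cpu: "10"
        requests.memory: 20Gi
        limits.cpu: "20"
        limits.memory: 40Gi
        pods: "50"
    ---
    apiVersion: v1
    kind: LimitRange
    metadata:
      name: team-a-defaults
      namespace: team-a
    spec:
      limits:
        - type: Container
          defaultRequest:
            cpu: 100m
            memory: 128Mi
          default:
            cpu: 500m
            memory: 512Mi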

I'm sure you'd need similar things to have a proper production Nomad cluster too, so your point still stands. But at least for EKS/GKE clusters, they're pretty bare-bones.


As someone with 7-digit spend in GKE/EKS, I will agree with you that it is _anything but simple_.

Your developers aren't going to say that it's simple when Google force upgrades their cluster to a version that deprecates APIs in their yamls for a job they worked on 2 years ago and swiftly forgot about.

Then when you explain to them that Google insists on putting everyone on a force-upgrade treadmill, you can literally watch as the panic sets in on the faces of your engineering team managers/leads.

Nomad is a breeze in comparison to managed K8s.

Everyone that I've talked to that thinks Kubernetes is simple is barely using the thing and could likely save a lot of money and development effort using something like Nomad instead.


So instead of using higher level primitives that are widely used and tested and have many simple high quality integrations (external-dns, cert-manager, et al) you would recommend reinventing all of that on Nomad and then calling that "saving a lot of money and development effort"?

Yeah no thanks.

At this point k8s has "won" for all intents and purposes. It has gained critical mass, succeeding where other infrastructure management tools both open and closed source have failed.

Also API upgrades shouldn't be a problem, even if you were using beta APIs. They are only really troublesome if you decided to indulge yourself with some alpha APIs before they were fully baked. If you don't do that then you won't run into any problems.


Thanks, Mr. Well Ackshually,

First of all, those things you're talking about are installables in K8s. They don't come by default. Many people with (managed) kubernetes installations aren't even using them and get by just fine. Having them certainly isn't free, work certainly had to be done to build those for Kubernetes, and they are likely (or will be) trivial to implement elsewhere. I certainly was able to automate my certificate management infrastructure before I had Kubernetes.

The reality is that there are many companies out there that want like 10% of the features of an orchestrator like Kubernetes and don't need all of those features that make you think that it's a zero-sum game that Kubernetes has won.

The reality is that there are competing offers like Nomad that more easily accomplish our technology goals and are thus more attractive to enterprises with big budgets like mine.

Kubernetes having the "critical-mass" that you speak of isn't to the exclusion of other competing tools having "critical-mass". Just like Oracle and DB2 aren't the dominant RDBMS of today.

As for calling people indulgent for using beta APIs, let's not forget that Deployment, StatefulSet, DaemonSet, ReplicaSet, NetworkPolicy and PodSecurityPolicy all started as beta APIs, and their beta versions were only removed in 1.16. And I imagine you wanted to use ingresses, but the old ingress.class annotation was deprecated and replaced with IngressClass in 1.18 and that was a firm cutover for _everyone_. I can't imagine Kubernetes being very useful to many people without any of these...
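For example, a manifest like the sketch below originally targeted extensions/v1beta1; once 1.16 stopped serving the beta groups it had to move to apps/v1, which among other things makes spec.selector mandatory (names and image are made up):

    # was: apiVersion: extensions/v1beta1 (no longer served as of 1.16)
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: web
    spec:
      replicas: 2
      selector:            # required in apps/v1
        matchLabels:
          app: web
      template:
        metadata:
          labels:
            app: web
        spec:
          containers:
            - name: web
              image: nginx:1.21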


There was no "Well Ackshually" in my reply.

I just pointed out that for everything you would need to build custom on Nomad there exist off-the-shelf components that will plug right into k8s. There is no way this would save money or development time, which was my main contention with your assertions.

You basically mischaracterized everything I said and then missed the entire point.

I don't disagree that there are some cases where Nomad would be superior. Off the top of my head if I wanted to build a modern rendering farm and I knew I wouldn't want to use the cluster for anything else then I would consider it over the HPC toolkits I have previously used for that (RIP Grid Engine) or Mesos which used to be king of that space.

The problem is these are incredibly niche cases and for 99.99% of companies they are better off swimming with the flow.

This is what I mean by "k8s has won". It's not that other things won't continue to exist and be created but most of them won't survive unless they commercialise within a niche or find a sufficiently large user that is willing to do the vast majority of the development (eg. Netflix Titan).

It's not a zero sum game but it's damn close. See here the corpses of Docker Swarm, Convox, Flynn ( :( ), original Deis, arguably ECS (it's a zombie at this point let's be honest).

The tide really turned when Azure and AWS were basically forced into offering managed k8s. In AWS's case they also had to make deep modifications to VPC and IAM to properly support it. These investments wouldn't have been made unless forced, which lends credence to the weight k8s has in the ecosystem. (Worth mentioning there is no hosted Nomad I'm aware of; their enterprise offering is install + support.)

k8s is becoming the POSIX of distributed scheduling. The API you use to run diverse workloads over large numbers of machines that is relatively portable between companies. Right now there is still some vendor nonsense going on but over time it will be smoothed out.

The vast majority of distributed applications are going to be built targeting k8s as their runtime API. We can already see this happening but it will only increase over time.

To summarize I think it's fair to say things like "Nomad can sometimes be good if conditions x/y/z are met" but I think it's very dishonest to consider it as a viable competitor to k8s in the general case for most users because of the reasons outlined above.


Thanks for this, I was really starting to think I was the madman.


Agreed. Managing a Nomad cluster alongside a Consul cluster does not require a PhD, but it's also not a walk in the park.

Hopefully Hashicorp will have managed Nomad soon.


> managed kubernetes service at minimal added cost

I dare to disagree. The costs are in fact horrific!

There are orders of magnitude in costs between running your stuff yourself vs the highest level of managed services on the cloud which is usually managed k8s.

(That also explains why there is so much marketing fuss, and push of management, to use k8s: It's by far the most profitable offering for the cloud providers).


It's not. You generally only pay for the masters, with no per-node costs beyond whatever instance type and scheduling class (on-demand, reserved, spot, etc.) you are using.

So not only are the masters relatively cheap, their cost is amortized over the cluster size.

AWS is probably very unhappy about this but they were forced into this by GKE pricing model. Google pushed people to adopt k8s not because they can make a lot of money from people using it but because they can take away a lot of money from AWS and increase workload portability, which benefits Google far more than AWS as they are in second place.

The most expensive managed services are things like RDS, MSK, Elasticache, etc. MSK in particular is pretty egregious with a 80% premium considering they haven't done any additional engineering like RDS.


As a Xoogler it's always seemed weird to me how Kubernetes was compared to Borg. Kubernetes covers a much larger set of things than Borg alone, and I don't necessarily think that's for the better. Being written in a language that isn't well-suited to large projects and refactoring efforts doesn't help either.

Nowadays I don't have use-cases for either, but from playing around with Nomad it felt a lot more "Borg-y" than Kubernetes ever did to me. I rarely hear anything about the Hashicorp alternative for service discovery and such though, so it would be interesting how that compares.


> I rarely hear anything about the Hashicorp alternative for service discovery and such though, so it would be interesting how that compares.

Consul is used for service discovery. Fast, reliable and easily extensible. Yet to have a serious issue with it.


Consul is usually very reliable; when it breaks it can be very painful and mystifying. I've worked on removing it both from internal systems due to outages, and products, based on feedback.


Could you expand a little on your problems with Consul? (I have no experience with it myself).


It's been a while, so I'm fuzzy on some details:

* in a legacy system, a server, perhaps with the leader node, filled up disk space and became unhealthy, the consul agent kept reporting it and itself as healthy, and failover and gossip generally wedged

* in a dev environment, after we replaced some servers in the cluster, the other nodes noted cert changes and refused to work with the new servers

* second-hand, in self-hosted installations, it caused a number of hard-to-troubleshoot outages

* something about circular dependencies and going by "wait _n_ seconds" rather than by healthiness

It was reliable enough that it could gather a really significant blast radius, and it had different gnarly failure modes, so documentation could be irrelevant from case to case.


I don't want to sound bad, but to me all of these sound like misconfiguration and a lack of understanding of how consul works, tbh. But I don't know the full context, so eh, these things happen.


And what is the replacement for it?


We rearchitected. At one workplace, we built and distributed our own service. At another, we shifted to semi-automated more static lists of servers for roles; those servers were much less dynamic.


There are exceptions, but most of the time replacing an off-the-shelf standard solution with something self-made looks like NIH syndrome to me.

The exceptions are the few cases where you know you will forever only need some strict set of features and your own solution can provide them by way more simple means than the in comparison "fat" off the shelf solution.

I'm not sure service discovery in a cluster is one of those cases.


etcd


Never heard of someone moving from Consul to etcd. It's always the other way around.


Not OP, but go look at the consul documentation. In fact, just look at "Configuration" page alone: https://www.consul.io/docs/agent/options - it goes FOREVER. Maybe you don't need most of that, but whatever you do need, you're going to have to find it somewhere in there.

And yes, the error messages can be utterly cryptic.

One thing: Don't ever let junior devs try to create a cluster out of their own laptops (yes they will think it sounds like a good idea) as this will be a never-ending nightmare.


Borg is what k8s could be if there were any kind of ground rules and people were willing to call out some use cases as being legitimately stupid. Compared to k8s, Borg is a model of usability and simplicity.


Ground rules as in “you have to compile with 100s of these google3 libraries to run a simple app”?


I don't think that's really true at all. You can pretty easily run any damned thing in Borg. If you want it to do stuff with other Google services then you need a lot of Google code, but if you just want to run 100000 replicas of `/bin/sleep 100000`, Borg makes it easy.


Sure, but none of the stuff like authentication and access to any google systems will work. Kubernetes is complex because it can take damn near any docker container and run it seamlessly.


Because from what I heard, all the other benefits of k8s that are crucial to me are provided through linking lots and lots of Google-internal libraries?


> Being written in a language that isn't well-suited to large projects and refactoring efforts doesn't help either.

How is Go not suited to those? I'm not seeing it - and are you comparing Go to Java or C++?


Go is a very hard language to work on as a team. It does not push a lot of conventions on you, and when it does, they just seem weird and anachronistic.

Even the original k8s codebase was notoriously horrible in Go, and that was "from the source".

Ironically, HashiCorp has a better Go codebase, and that's where I picked best practices from, not from Google.

The problem with Go is that the Googlers fail to challenge some of the design decisions. The giants of computer science working on it at the company cannot be questioned, and that just leads to a broken feedback process.

As someone said, Go is the new C, skipping decades of new language design concepts.


> The problem with Go is that the Googlers fail to challenge some of the design decisions. The giants of computer science working on it at the company cannot be questioned, and that just leads to a broken feedback process.

Well, they can be questioned but they're not very receptive to it, so most people don't bother. During my time at Alphabet I didn't see much use of Go anyways, other than in glue tooling (including a horrific large-scale incident due to non-obvious behaviour of Go code in such a glue tool).


The original k8s codebase was Java, then rewritten in Go by Java developers. K8s is not a good example of Go, and it has nothing to do with the size of the project.


> As someone said, Go is the new C, skipping decades of new language design concepts.

Well, some are doomed to repeat their errors.

In fact C was even crappy for its own time.


What are you even talking about? Plenty of languages were around then and C dominated all of them because it let people write great software.


I am sure a few old HNer will tell you, C dominated because of UNIX, and the rest is history.


I am old, and that is not the case (although Unix was one of the reasons).


> Being written in a language that isn't well-suited to large projects and refactoring efforts doesn't help either.

I know that Borg was written in Java and Kubernetes in Go. Though the latter had a reputation in the beginning as a systems programming language, its purpose was actually to build large-scale cloud infrastructure projects, and it proved formidably well suited for the task. It compiles fast, anyone can read it, it has good tooling, and it is efficient for the layer on which it is meant to be deployed. Go, like Java, is one of the most productive languages in use today, judging by the ecosystems they have spawned.


I just wish klog would go die in a fire, I'm so confused at why logging is so terrible in Go.


Can you expand on this?

My experience with Go has just been making small changes to small programs. So, I don't know what the normal experience is.

My experience with logging varies from:

print (works fine I guess)

import logging (this is pretty good - means I don't have to parse my logs before deciding where to send them)

import slf4j (6 logging frameworks and a logging framework disintermediation framework)


I elaborated a bit in my reply above, but basically, multiple logging frameworks with incompatible APIs, few of which offer the fine-grained control the person running the app might need.

But I'm used to the JVM world. And when I first met slf4j, I was like, wtf is this crap, but I appreciate it now as the embodiment of the desire to standardise logging across the Java ecosystem.

While using slf4j does make it trivial to swap out logging frameworks in an app, in my 13 years at my last job, we only did that once, from log4j to Logback, so that's not so important.

But yeah, what I miss from Java land in Go logging is the common approach - loggers and appenders are (usually) configured outside of code, the user can provide their own configuration at runtime to override the config shipped in the jar to troubleshoot issues - especially when you can configure the logging lib to check the conf file every X seconds for changes - allows you to change logger levels on the fly without restarting the app (ditto Logback's JMX configurator).

And lastly, no matter the logging library, configuring them is near identical.


klog is the descendant of glog which is a go implementation of Google's logging. Go has nothing to do with it except it's the logging that k8s took on (originally it used glog).

-- edit typo


Sorry was referring more to the many different incompatible logging libraries in play.

Also the inability in the ones I've used for the person running the app to set a particular logger to a desired level easily.

I think klog can do this with the vmodules flag, if, and only if, the devs used klog.V() for their logging statements.

Logrus requires the dev to allow the user to configure the log level, common idiom I've encountered is doing so via an env var, but IIRC, that applies to all logging in the app, no way to limit it to particular files/modules.

It's been a real pain at times for me. Feels like the Go philosophy on logging focuses more on the dev controlling it, than the person running it.


Borg is C++.


There you Go (pun intended). Go even replaced a low-level language like C++ and achieved the same result in the end. I don't know why I thought it was Java; probably the first Kube was initially in Java. It's even better that they managed to pull that off.


But it did not replace it. Google runs on Borg. Go lets you build things fast, but the lack of strong typing and the vast amount of language pitfalls make maintenance hard in the long run.

The community also has the attitude to pretend that these pitfalls don't actually exist, which is very different from C++ where most peculiar behaviours are well-understood and controlled.


Sorry, I don't think I have been explicit in my comment. The language of K8s, a system which is the so-called direct descendant of Borg, replaced C++ at that layer. Also, Go is strongly typed.


I'd argue that go is statically typed rather than strongly typed due to `interface{}`.

> There is no real agreement on what "strongly typed" means, although the most widely used definition in the professional literature is that in a "strongly typed" language, it is not possible for the programmer to work around the restrictions imposed by the type system. This term is almost always used to describe statically typed languages.

(random definition found googling).


Actually Google knew at the time they "designed" k8s that Borg doesn't scale due to fundamental design flaws in its basic architecture. Still they reused the exact same architecture for k8s.

Whoever wants to know the details of those scaling issues can google the Omega paper.

The "proper"™ solution to those design flaws was implemented in Mesos (and to my knowledge nowhere else until now).


>The "proper"™ solution to those design flaws was implemented in Mesos

And yet somehow Mesos failed... sigh.


Because the market never chooses the most advanced technology.

It chooses the (perceived) cheapest thing with the best marketing. Always.


I've had just as many problems maintaining mature C++ projects as I have maintaining mature Go projects.

Ultimately it all boils down to two things:

- The projects being well written and well maintained from the outset

- Personal preference

I cannot overstate that second point.

It really is about time developers stopped pushing their own personal preferences as if it's some kind of fact.


> It really is about time developers stopped pushing their own personal preferences as if it's some kind of fact.

Agreed, so let's all go to Rust or OCaml because they have very strong static typing systems. ;)

Both C++ and Go have plenty of warts, with C++ having way too many footguns and Go allowing you to use it as a dynamic language whenever you figure it's too much work to go through your problem with static types.


> Agreed, so let's all go to Rust or OCaml because they have very strong static typing systems. ;)

Seems very reasonable. Especially OCaml should get more of the praise it deserves.

If you need to stay on the JVM there's Scala, which allows you (with some discipline) to write "when it compiles it works" code.


You've very much missed the point of my post. But from the tone of your reply, I sense we'd never see eye to eye anyway.


Which human do you think understands C++?


> Go even replaced a low-level language like C++ and achieved the same result in the end

Did it, though? Honest question. I get the feeling that it ended up competing with Java (and Python) more than it ended up replacing C++. The C++ folks seem to be way more into Rust than Go.


I'm a Java fan and an ex-Scala aficionado. I would have hoped that the JVM would eat the pie when it comes to cloud deployments, but it didn't happen. Like Scala.js never happened and TypeScript became the type system of the web. JVM languages will remain at the application layer, concerned with microservices, data processing, streaming, databases, etc. It's not about what folks seem to be way more into, it's all about tradeoffs. I am talking here about layers and the suitability of a language at a specific layer. I don't know about Rust, but Go proved that it can handle it. If Rust proves to be better, that would be great, but only time will tell. Until now it has not happened; instead people are trying to fit Rust into scenarios where a higher-level language would go round in circles (GUI programming, microservices, game development etc.). For Java it is too late: if GraalVM with native images and value types had been released earlier, maybe we could say that Java could compete with Go at that layer, but it wasn't, and the train left the station a long time ago. That could change only if Oracle comes out of the cave and throws a lot of money at a JVM alternative to Kubernetes, which is likely to happen in the foreseeable future, given the investments and the attention the Java platform has received recently.


Plenty of K8s operators are written in Java. Sure, it's not underpinning K8s itself, but tbh, what language K8s is written in doesn't really matter, so long as it works.


Yes, exactly my thoughts. The problem is that devops folks are generally used to Python/Go, and in my company, though they don't make it mandatory, they recommend Go. Also, they have a repulsive reaction to everything .NET and JVM :).


> Also, they have a repulsive reaction to everything .NET and JVM

It's an interesting phenomenon I observe quite commonly.

I think in the devops space they see these VM based languages as basically introducing ten redundant layers of unnecessary assumptions and complexity on top of what is already a good foundation - the Unix OS layer. They know the unix os layer well but every time they deploy one of these VM based languages it creates headaches through unique and strange behaviour that violates their assumptions and knowledge and they have no way to learn it since you need years of experience as a developer to become comfortable with it all. From a developer perspective, we see the OS as this annoying ball of complexity and want VM languages to manage that and make it go away.

So it's all about what you know better and where your comfort zone is in the end. For a long time the developers had the upper hand because the OS story was a train wreck, but containers have turned that around in recent years, so now it is not such a crazy thing to pin down the exact OS and version and entire dependency chain you are going to deploy on.


Fun fact: Python and Go are both managed languages and need some kind of VM.

I guess those people are just not educated enough. Those Ops people usually don't know much about programming in the large. A lot of them, for example, think it's OK to write serious programs in Bash. This says it all, imho.

It's OK when someone looking after admin stuff isn't a full blown programmer. It's a different kind of job after all. But this needs to be taken into account when looking at that mentioned phenomenon.


Fun fact: There are former devs among Ops, and sometimes vice versa.

Go binaries are usually statically compiled, with no VM or managed runtime. It's a simple language for simple solutions, which is often underrated.

Not that versed in Python, but there is CPython.

Bash is totally ok in the hands of someone who uses it for good ;) Agreed, it has too quirky a syntax when you need complexity, so it's not good for large stuff.

Footguns are everywhere. You usually trade one in for another.


Go binaries still include the Go runtime which handles GCing etc., much like Python.

They're not VMs, but it's really a semantic difference, if you're comparing it to Java, especially now that GraalVM native compilation is gaining popularity.


Sure. Runtime is maybe a better word, though gc and such is not "free".


Yes. The old-school PHP programmers seem to eventually move to Go. Going back to Go from C++ is almost unheard of.


I moved from C++ to Go. It made programming enjoyable to me again. With a couple of lines of code, huge and well-thought-out system libraries, very accessible external modules and gomod, I can create working software fast. No complicated configurations, no cmake, no autoconf, no cryptic compiler errors, no operator overload surprises, no slow template mess, no missing debugging symbols. Of course if you work on a C++ rendering engine, or you have a huge project already written with all the build and dependency workflow done and it is working fine, then Go won't help you or solve anything for you. But I am not going to write anything in C++ ever again (except firmware for microcontrollers / Arduino). It was just too frustrating and too complicated. I can imagine most C++ developers still don't know all the language features C++ offers because it is simply over-complicated!


Nomad was implemented based on the Borg paper.


Looking forward to Nomad Packs, they were announced with their 1.2 beta[0]. I've really been missing something like k8s-at-home[1] for homelab Nomad setups. Don't know if they will become as versatile as Helm charts since Nomad is less of an abstraction than Kubernetes.

[0] https://www.hashicorp.com/blog/announcing-hashicorp-nomad-1-...

[1] https://github.com/k8s-at-home/charts


We're using Nomad for our backend at Singularity 6 (developing the mmo Palia).

In my experience (having worked with Nomad, k8s, mesos, and homegrown schedulers), k8s is fantastic from an early developer experience, but rough from an operations perspective. It's definitely gotten better, but there's a reason lots of folks want to have someone else host and operate their cluster.

Nomad is the opposite. Less tooling and a bit more complexity for the app developer, but significantly easier to operate at the cluster level. Nomad is also less "all-in" in its architecture, which gives operators more flexibility in how they design and support the cluster.

The fact that we can run on anything and don't _have_ to use containers is a serious advantage, particularly for game development.


Nomad gets the job done. What you end up missing is the huge ecosystem of tooling and knowledge that comes "for free" with Kubernetes.


There has been a great announcement today with the beta of nomad 1.2 - nomad-pack: https://github.com/hashicorp/nomad-pack . Which, in my opinion, is aiming to be helm for nomad.

Which is really great, because we've basically built nomad-pack in terraform and we would really like nomad-pack to replace that, because .. it works, but it could be better than terraform wrangling for templating :)


Been running Nomad for a while now at work and home, and it is such a fun project to work and tinker with. Great community, lots of plugins and other great stuff. After running k8s in prod and home Nomad felt like a breath of fresh air in all aspects.


What do you do with Nomad at home? What comprises your hardware? Just curious...


Hardware: A few droplets in DO, a bunch of Orangepi's/Raspberrypi's and 3 HP Elitedesk G1 with some i7 and 32 gigs of ram each.

Software: bunch of solaris/omnios vm's, torrent client, owncloud, mtproto proxies for family and friends, my site, minecraft, quake, openttd, trss, gitea, drone.io, jenkins.

There is other stuff, but it is usually short-lived; I check it out, test it and most often discard it.


I'm also using Nomad at home. It's a single node "cluster" in my desktop Ubuntu-based machine, really using it to deploy some self hosted services more easily (the usual suspects for home automation and the likes).


Nomad seems much simpler to use and manage if you need to do simpler things, but Kubernetes allows you to do more.

We use Kubernetes instead of Nomad at work but we are also using Consul in the Kubernetes cluster.


> Nomad seems much simpler to use and manage

Agree, Nomad is so easy to get started and because of the simplicity of the architecture, very easy to maintain as well.

> but Kubernetes allows you to do more ... We use Kubernetes instead of Nomad at work

Same here, Kubernetes at work, Nomad for personal projects. But I have yet to find anything I cannot do in Nomad that you normally do in Kubernetes. Could you provide some specific things you've tried to get working in Nomad but couldn't (or simply doesn't exists in Nomad), while it worked in Kubernetes? Would be useful to know before I get too deep into Nomad.


> But I have yet to find anything I cannot do in Nomad that you normally do in Kubernetes.

VM live migration. I was surprised that people use kubevirt for that, but apparently this is a valid usecase. Otherwise nomad can do relatively complex vm configurations.


The entire operator paradigm is kubernetes centric. You're missing out on all of the innovations around operations automation if you use nomad. Same with GitOps to an extent, HC pushes people to use terraform or waypoint for deployments while everyone else uses argo and tekton (or some other k8s runner).


Nomad will still get you where you need to be. Aside from VM live migration, I have yet to find a good example of a workload where Nomad is straight up unable to do what you need.


You mean you run VMs inside... Kubernetes? Am I misunderstanding something here?


Not only k8s, but I ran VMs with both Nomad and K8s. Nomad supports VM workloads out of the box [0]. K8s requires KubeVirt [1].

[0] https://www.nomadproject.io/docs/drivers/qemu [1] https://kubevirt.io/


I knew about nomad (and it makes sense given it has from the start a more corporate, on-prem audience), didn't know/remember about KubeVirt. Hopefully I won't have any use for it in my scenario but thanks for the link!


What do you even still use VMs for?


Anything that needs a kernel module (OpenVPN with tap for example) is better off as a separate VM. Or anything that is not linux, so BSD, Illumos etc.

Also I am using Nomad over baremetal to create VM's with Nomad clients onboard. Kind of a messy setup, but works really well for me.


Also worth noting when running your own k8s(or k3s) cluster;

> Running Kubernetes is not free. It is expected that it will take about 0.5 to 0.75 of a single core on an embedded system like a Raspberry Pi just as a baseline. [1]

[1] https://github.com/k3s-io/k3s/issues/2278


Ran into this recently with an RPi 3B K3s cluster. Before, I could easily allow running workloads on the master node, but since a recent update that node just gets overloaded until it's unresponsive. Yay for automated hands-off updates I guess :).


There seem to be plenty of reasons to run Nomad compared to Kubernetes, but in what scenarios does Nomad lose out to Kubernetes?

Is it simply a matter of Kubernetes being an open source project and Nomad being owned by HashiCorp?


Kubernetes is much more than container scheduling. Custom resources, identity and a powerful RBAC system allow you to use it as a general configuration/operational data store in your own code, from implementing operators acting upon kubernetes and the outside world to even moving most of high-level configuration glue to be natively based on Kubernetes.

For example, with cert-manager running on Kubernetes you can request a TLS certificate by creating a Certificate resource (like you would any other Kubernetes resource). This is the same regardless of whether you want a self-signed certificate, an ACME-issued certificate (and whether that gets performed via HTTP01 or DNS01 or something else). Oh, and this fully ties into Kubernetes' RBAC system.
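For a sense of scale, requesting a certificate is just another namespaced object (the names and issuer below are made up), which is exactly why normal Kubernetes RBAC can gate it:

    apiVersion: cert-manager.io/v1
    kind: Certificate
    metadata:
      name: example-tls
      namespace: myapp
    spec:
      secretName: example-tls     # cert-manager writes the key pair into this Secret
      dnsNames:
        - example.com
      issuerRef:
        kind: ClusterIssuer
        name: letsencrypt-prod    # could equally be a self-signed or internal CA issuer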

In Nomad the closest thing is annotating jobs with traefik-specific tags (and allowing Traefik to do cluster-wide discovery of all tags), but that only works for serving certificates that are managed by traefik, not if your application wants eg. to terminate the TLS connection itself, or if it wants some other PKI hierarchy (eg. a self-signed CA which then issues some other certificates for mutual TLS auth between application services).

Kubernetes also has better support for organization-wide multi-tenant clusters than Nomad seems to have (eg. nomad's policy engine, audit logging and resource quota system are gated behind their “enterprise” offering).


> Kubernetes also has better support for organization-wide multi-tenant clusters than Nomad seems to have

That one's a little weird. I suppose you're right, but the clients our Kubernetes team works with all want separate clusters for testing, staging and preproduction. They certainly don't want a multi-tenant cluster and to share resources with other clients.


> but the clients our Kubernetes team works with all want separate clusters for testing, staging and preproduction.

And I think that's one of the biggest issues with how people use Kubernetes these days (another candidate being insisting on drive-by deploying a cluster from scratch instead of deferring to cloud providers or a dedicated platform team that can plan for long-term maintenance).

Kubernetes thrives in multi-tenant environments: you get huge resource savings and vastly simplified operations. Everyone in your organization gets access to all clusters, and they can just as easily deploy experimental best effort jobs or their development environment as they can deploy and/or inspect production jobs. Well set up quotas and priority classes mean that production jobs never run out of resources, while less important stuff (batch jobs, CI, someone's pet experiment) can continue to run on a best effort basis, just keeps getting preempted when production wants more resources.

You can even continue to have hardware separation between highly sensitive and fully untrusted jobs by using taints and tolerations, if you feel that's necessary. You still get one control plane instead of five different ones.
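Roughly like this (node, key and label names are made up): taint and label the dedicated nodes, then only workloads that both tolerate the taint and select the label land on them:

    kubectl taint nodes secure-node-1 workload=sensitive:NoSchedule
    kubectl label nodes secure-node-1 workload=sensitive

    # in the sensitive pod spec:
    spec:
      nodeSelector:
        workload: sensitive
      tolerations:
        - key: workload
          operator: Equal
          value: sensitive
          effect: NoSchedule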


> And I think that's one of the biggest issues with how people use Kubernetes these days (another candidate being insisting on drive-by deploying a cluster from scratch instead of deferring to cloud providers or a dedicated platform team that can plan for long-term maintenance).

I don't really understand how you can say this and then...

> Kubernetes thrives in multi-tenant environments: you get huge resource savings and vastly simplified operations. Everyone in your organization gets access to all clusters, and they can just as easily deploy experimental best effort jobs or their development environment as they can deploy and/or inspect production jobs. Well set up quotas and priority classes mean that production jobs never run out of resources, while less important stuff (batch jobs, CI, someone's pet experiment) can continue to run on a best effort basis, just keeps getting preempted when production wants more resources.

... advocate for this. Everything you are describing, which is basically what every hardcore k8s user/evangelist will tell you to do, is reimplementing many, if not all, of the features a cloud provider is already giving you in its own resources. But you are taking the ownership and responsibility for this onto your local platform/infra team. What if you screw something up with CoreDNS? What if you break some RBAC roles used cluster-wide while trying a change in the beta environment? I'm pretty sure there are (or will be) specific k8s tools to manage this, but still, you are adding complexity and basically running another cloud provider inside a cloud provider for the sake of binpacking. For certain sizes of companies it might be worth the effort, but it is for sure not a silver bullet, and it probably applies to far fewer companies than many evangelists try to sell it to.


> ... advocate for this. Everything you are describing, which is basically what every hardcore k8s user/evangelist will tell you to do, is reimplementing many, if not all, of the features a cloud provider already gives you in its own resources.

A well-designed KaaS offering from a cloud provider will do that by itself. GKE exposes GCP load balancers as an Ingress controller, IAM identities as Kubernetes RBAC identities, persistent disks as PVs, ... You just get them under a single declarative API.
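
For example, on GKE an Ingress along these lines (hostname and service names made up) is enough to get a GCP HTTP(S) load balancer provisioned and wired up for you:

apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: web-ingress
  annotations:
    kubernetes.io/ingress.class: "gce"
spec:
  rules:
  - host: app.example.com
    http:
      paths:
      - path: /
        pathType: Prefix
        backend:
          service:
            name: web
            port:
              number: 80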

> But you are taking the ownership and responsibility for this on your local platform/infra team.

With a platform team you're concentrating already existing responsibility into a team that can specialize in operational excellence - vs. that same responsibility being spread out across product teams that have to individually manage their own cloud resources, reinventing the wheel by writing the same terraform/{ansible,puppet,chef,...} boilerplate poorly. My experience is that these per-team bespoke AWS deployments are much more brittle than whatever a dedicated team can provide if given the responsibility and means to do things well.

> What if you screw something up with CoreDNS? What if you break some RBAC roles used cluster-wide while trying a change in the beta environment?

An outage is an outage, you roll back to stop the bleeding, investigate what happened and try to prevent whatever caused it from happening in the future. Neither of these examples are unsolvable in a multi-tenant environment, nor especially more likely to happen than similar screwups when using cloud provider resources.


> A well-designed KaaS offering from a cloud provider will do that by itself. GKE exposes GCP load balancers as an Ingress controller, IAM identities as Kubernetes RBAC identities, persistent disks as PVs, ... You just get them under a single declarative API.

My experience with EKS on AWS tells me that it's not that simple; there are still many things to be glued together. I understand AWS's historical position on K8s (they probably want to keep the k8s experience on AWS good but not awesome), but I'm pretty sure that even in GCP there are still serious gaps between "native" GCP features and k8s ones, where you end up reimplementing them on both sides. But I'm no GCP expert, so I might be totally wrong.

> With a platform team you're concentrating already existing responsibility into a team that can specialize in operational excellence - vs. that same responsibility being spread out across product teams that have to individually manage their own cloud resources, reinventing the wheel by writing the same terraform/{ansible,puppet,chef,...} boilerplate poorly. My experience is that these per-team bespoke AWS deployments are much more brittle than whatever a dedicated team can provide if given the responsibility and means to do things well.

I'm totally fine with this approach, and we are actually trying to implement it at $DAYJOB, but I don't really get why you see the AWS API as a different monster from the K8s API. With a complex enough system you will need many lines of YAML/charts/Terraform/whatever on the k8s side just like CF/Terraform/Pulumi/whatever on AWS. And you can totally have a team that takes care of the quirks and details of AWS while exposing a usable and unified interface for service deployments to the rest of the engineering organization. I understand that if we were talking about bare metal vs Kubernetes (even on-prem), k8s would win hands-down. But in the cloud-native world, I don't really see that night-and-day difference. Everything has its tradeoffs and its quirks and bugs and corner cases.


With things like custom operators, especially Crossplane (but also anything custom you cook up quickly), or even a custom operator wrapping AWS or GCP templates, it's easy for me to offer curated, verified solutions across all teams, instead of everyone hacking up their own AWS/GCP/WTFcloud scripts to handle things. It's even better than directly using the cloud provider integration with ingress/service controllers, because I can provide specific, limited variants of those APIs. And even without that, I can just use the hooks system to blunt the corners for the teams.


> You can even continue to have hardware separation between highly sensitive and fully untrusted jobs by using taints and tolerations, if you feel that's necessary. You still get one control plane instead of five different ones.

How much have you had that setup audited? It seems like a lot of people aren’t comfortable saying that the internal boundaries are strong enough, which leads to the proliferation of separate clusters.


We tried for multiple years to make Nomad work because it's simple. We're already enterprise customers of Hashicorp, too. We love Hashicorp! Nomad is a great scheduler and the UI is wonderful, but there is no story around network security. Example: You have postgres running on dedicated metal, and you want to restrict which nomad services have access to postgres. Consul Connect allows you to enforce access policies, but these work more like haproxy with automated service discovery. There is no batteries-included way to prevent traffic from leaving your container (or identify which container the traffic is originating from). You can use a custom CNI plugin to give each container an IP address from a unique subnet per service (and restrict traffic at your firewall/ selectively add subnets to iptabels on your postgres server), but now we're adding bespoke complexity. We brought this problem up to the Hashicorp internal teams over many calls, but ultimately they said we'd need to pay for a 3rd party overseas consultant to advise us on implementation. They weren't sure what to do. K8s is certainly more complex out the gate, but you don't feel like you're the first person in the world to implement.
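
For context, the Connect side is genuinely nice: intentions like these (service names hypothetical) are all it takes to say who may talk to postgres. The catch is that they only apply to traffic that actually flows through the sidecar proxies, which is exactly the gap we kept hitting:

$ consul intention create -deny '*' postgres

$ consul intention create billing-api postgres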

That said, I think Nomad is a few years away from being something truly amazing. If they can solve the network security problem (beyond optimistic documentation), I think it'll be amazing. For now, it's just a good scheduler.


I've been using Nomad for about 3 years and now I've setup a K8s cluster as well.

I love Nomad and so do our devs, but K8s has a much larger ecosystem and more tooling options.

A few examples:

- If you want to run some third party software, there's a Helm chart for that (see the sketch after this list). In Nomad, you're likely going to figure things out for yourself.

- K8s has a lot of cool stuff like Knative, KEDA and others for more sophisticated deployments.

- K8s has tons of operators to integrate with public clouds and external services. Nomad really lacks something similar.

- There's a ton of knowledge and guides related to K8s online, not so much for Nomad.

- There are many managed K8s solutions, which makes things easier. To this date, HashiCorp still does not offer a managed Nomad.
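
To illustrate the first point: installing third-party software on K8s is often just a couple of commands (repo/chart names used as an example):

$ helm repo add bitnami https://charts.bitnami.com/bitnami

$ helm install my-redis bitnami/redis

On Nomad you'd typically end up writing and maintaining the equivalent job spec yourself.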


Nomad is an open source project as well though.


I don't think being OSS has much to do with it; it's more that the dominance of k8s leads to more people learning it, which leads to more tooling being written, which makes k8s better, which leads to more companies adopting it and more people learning it. It's a virtuous cycle much like programming languages experience.


I can't run nomad on aws without managing a nomad cluster myself. I _can_ do that with k8s.


How have you found the managed k8s on AWS? I tend to shy away from their managed services until it has gone through a 4-5 year grace period, interested to know what your experience has been.


The short answer is it's less hassle than the cluster I run locally for development purposes, to the point that we're considering spinning up EKS clusters for development use rather than running kind or k3s locally.


Nomad vs Kubernetes is like comparing apples with pears. They actually say it in their docs: Kubernetes is a complex solution whereas Nomad isn't. For example, when building a frontend, you don't go with plain React or build everything yourself; you use a framework that brings plenty of things at once and plays well within an ecosystem. What's the point of bringing a ton of HashiCorp tools together manually? You can just use Kubernetes. I'd say setting up Kubernetes is just as complex as putting together a bunch of HashiCorp solutions. And ultimately people (the majority) simply love all-in-one solutions.


> There seems to be plenty of reasons to run Nomad, compared to Kubernetes, but in what scenarios do Nomad lose out to Kubernetes?

It would also be interesting if Docker's Swarm were also featured in this comparison, as it just works right out of the box and doesn't come with any bells and whistles.


As far as I remember from trying it last time, a few years ago, Docker Swarm does not support rebalancing the load of containers: it either requires manually issuing a command to do that or merely balances where containers are initially started. Is this still true? And what about when you have state that you would need to move between nodes to make a container work elsewhere?


Swarm is great for small workloads (<= 10 hosts). Networking is slow, but if you need something to just run containers over a set of hosts, you won't find anything simpler than that. Sadly Mirantis (the company that now owns Docker, Inc.) is intentionally neglecting it.


Does it support multiple nodes?


> Does it support multiple nodes?

Yes it does. Docker Swarm essentially provides an easy way to get a cluster of Docker instances running in multiple nodes that works right out of the box.

https://docs.docker.com/engine/swarm/


I've run into so many issues with Nomad that it really doesn't make sense to compare the two. Many are well documented in GitHub issues and occur when utilizing other HashiCorp products.


I'm curious how responsive you have found them to issues you've identified.

Any particular problems you've experienced that have been long-standing and unresolved?

I occasionally like to kick the tires on Nomad, but so far haven't found it compelling enough to switch away from my current solution, GKE (mostly because of how integrated it is with everything else I am doing and, at this point, familiarity).

So, curious to hear from others where the rough edges are.


Not the person you’re replying to, but I’m a couple of years in on Nomad, coming from kubernetes.

I’ve reported and participated in a few issues. Most got fixed within the next couple of minor versions, some went neglected.

I think the somewhat recent 1.0 release was pretty well-timed. There are still a few rough edges (though it’s gotten a lot better continuously over the last year). Specifically, Consul Connect and federation (especially together with ACLs and mTLS) are just about coming together now.

If you don’t rely on Connect and federation across regions, I’d say it’s reliable for production use. Other than that maybe wait 6-12 months and reassess.

I’d also advise to put ACLs and TLS in place before you start putting production loads there; it’s not great to introduce in a running cluster.


As much as I think k8s is the most complex option that exists, which is bad, it's being pushed by the big boys, which means tools exist for it.

I can get so much value by plugging in other tools to this generic platform. Feature flags? just plug it in. On cluster builds? Just plug it in. Monitoring? Plug it in.


For my solo-developer self-hosted PaaS I really like CapRover [1]. Nice GUI, scalable Docker Swarm, with integrated route management, load balancing and one-click SSL. With the available CLI you can use GitLab CI to directly build docker images and leverage the CapRover API to deploy directly to a dev stage.

Interesting discussion about CapRover vs. Flynn, Dokku, Nomad on HN one year ago. [2]

[1] https://caprover.com/ [2] https://news.ycombinator.com/item?id=23465087


Is the fragmentation (the long list of distributions), or the variety of configuration options, really a bad quality? This seems to be used as a counterargument to Kubernetes.

Probably, it could be the end result of inconsistent design or bad technical choices. However, most likely it just means that there are multiple organizations and interest groups pushing changes and ideas to the project. This should be seen as a good thing. The downside is that there is no single source of best practices, and this is confusing to newcomers. You just need to pick one distribution and trust its choices, or develop the competence and understanding yourself.

And we could imagine that the userbase or the number of developers in a single distribution (take OpenShift or Rancher) could be bigger than in Nomad itself.

Having said that, I would still like to see a more stable Kubernetes landscape, and that has to happen eventually. The light distributions k3s and k0s are pushing things in a nice direction.

OpenStack had a similar, or maybe even worse, fragmentation and complexity issue when the hype was high. There were probably technically better alternatives (Eucalyptus?), but people (and companies) gathered around OpenStack and it won the round. However, comparing OpenStack to Kubernetes feels bad, as Kubernetes is technically far superior.


If you have a mixed Windows/Linux environment I cannot suggest Nomad strongly enough. It truly shines in this environment. Especially if you're dealing with legacy stuff.

Even better if you're running Mac + BSD as well, it just runs everywhere.


I would say the "simplest" setup for us has been ECS Fargate (even though there's some major lock-in). It's very easy to spin up a cluster and you don't have to manage the underlying infrastructure. If you use docker-compose or equivalent for local dev, you just define your tasks/services and launch. Even pulling in ENVs is easy with SSM.


What kind of lock-in? One of the reasons I started using it was due to the minimal lock-in, so we've arrived at different conclusions.

A task definition is required, which is sort of a docker-compose.yml analogue, but it's also possible to use docker compose directly with ECS Fargate. So my 'get out of AWS' plan is pretty simple: I already have the docker-compose.yml ready.
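
For anyone curious, the compose-to-Fargate flow (as of when I last used it; the integration has been evolving, so check the current docs) is roughly:

$ docker context create ecs myecscontext

$ docker context use myecscontext

$ docker compose up

The same docker-compose.yml then comes up as Fargate tasks behind the scenes.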


I suppose what I mean is you'd never be able to take an ECS task definition/service and move it over to some other provider, whereas if you did some Kubernetes setup you could conceivably do that, moving from AWS to GCP.


I assume that Nomad is simpler since the other features are provided by other HashiCorp tools (e.g. secrets, auth, etc.).


Indeed, the focus is different for the two tools, something that is outlined with more words in the opening paragraph of this very submission:

> Kubernetes aims to provide all the features needed to run Linux container-based applications including cluster management, scheduling, service discovery, monitoring, secrets management and more.

> Nomad only aims to focus on cluster management and scheduling and is designed with the Unix philosophy of having a small scope while composing with tools like Consul for service discovery/service mesh and Vault for secret management.


In my experience you end up needing additional tools for everything in Kubernetes, such as secrets, auth, etc.; that’s the pluggable API selling point.


Might. But most of them are open source with healthy competition. With Nomad, the ecosystem is much smaller (basically one company), and it can always be closed off behind a paywall (e.g. Terraform Enterprise).


Do you mean Nomad Enterprise? It actually exists: https://www.nomadproject.io/docs/enterprise :)


> In contrast to Kubernetes' fragmented distributions

This article just seems like "All of Nomad's best features phrased kindly vs all of K8s' worst features phrased critically".


> Kubernetes is an orchestration system for containers

Yes, but no. Kubernetes is a portability platform that happens to -also- orchestrate containers.

Using Kubernetes means you have complete reproducibility of the network setup, the deployment and the operation of any workload -no matter how complex- on any Kubernetes cluster and cloud provider.

Nomad is -well- just a glorified Airflow.


Nomad, just like Unix, prefers composition of simple tools over the one-big-tool-for-everything approach that Kubernetes is going for. So for achieving those things, you'd use Terraform or something similar, and then you have a reproducible environment for the hardware/software setup outside of Nomad.

> Yes, but No. Kubernetes is a portability platform, that happen to -also- orchestrate containers.

The homepage of Kubernetes seems to disagree with you. Their headline there is "Kubernetes, also known as K8s, is an open-source system for automating deployment, scaling, and management of containerized applications." and also "Production-Grade Container Orchestration" so it feels safe to assume that Kubernetes is for orchestrating containers.

> Nomad, is -well- just a glorified Airflow.

I never used Airflow, but looking at the website it seems to be geared towards automating workflows, something like Zapier but self-hosted and open source? That's very different from what Nomad is.


> The homepage of Kubernetes seems to disagree with you.

People have trouble understanding what k8s is, and what to use it for. That's fine, it'll take a while, but they will eventually understand what "Production-Grade Container Orchestration" really means when they start working with it.


Can you explain this? Having read it I have no idea what you think the differences are — it just sounds smug.


The first thing you start with on k8s is a Deployment. This will cover Container Scheduling, Replication, Orchestration on available Nodes, Secrets and volume bindings.

By just following this one tutorial (https://kubernetes.io/docs/tutorials/kubernetes-basics/deplo...), you already cover everything you used to do using docker-compose.

Now what ? You are going to learn about Services (https://kubernetes.io/docs/tutorials/kubernetes-basics/expos...), Ingresses (https://kubernetes.io/docs/concepts/services-networking/ingr...), Ingress-Controllers (https://kubernetes.io/docs/concepts/services-networking/ingr...), Persistent Volumes (https://kubernetes.io/docs/concepts/storage/persistent-volum...), Configuration, Pod Lifecycle, RBAC, Rolling Upgrades, Operator Pattern, ...

This is not about orchestrating containers anymore; it's a mix of network, configuration and storage APIs that unify everything you used to do with shell scripts under a fully declarative format. Then you realize the _ACTUAL_ value of kubernetes isn't about the containers it can start up, it's about being able to _MOVE_ those containers, their HTTP routing rules, their database, their backup schedule, their secrets, their configuration and everything else onto totally different machines with a different OS and a different topology, just by running a kubectl apply.
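
In day-to-day terms (file names made up) that portability boils down to:

$ kubectl apply -f deployment.yaml -f service.yaml -f ingress.yaml

$ kubectl config use-context some-other-cluster

$ kubectl apply -f deployment.yaml -f service.yaml -f ingress.yaml

Same manifests, completely different cluster, same result.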


> you'd use Terraform or something similar

Wouldn't you also use terraform for your kubernetes cluster?


Kubernetes is the new POSIX. It is complex for sure. But yes, portability is what matters. No vendor lock-in as long as you've abstracted your workloads to Kubernetes.


That sounds like saying you've avoided vendor lock-in by using Linux — not entirely wrong but definitely leaving out a lot of the trade-offs. Since your application does real work, you'll be locking in to different components to varying degrees and you really should be reasoning about that in terms of the benefits you see from using something versus the costs (either direct or in ops / support).

For example, if your application uses a database the major lock-in concern is the flavor. If you're using Kubernetes to deploy MySQL or Postgres or paying e.g. Amazon/Google to provide it for you, there's relatively low cost of switching because it's a very tested, standard interface with well-defined semantics. On the other hand, if you're using something like AWS DynamoDB or GCP Datastore you probably do want to think carefully about an exit strategy because you'd be baking in assumptions about how those services work in a way which is non-trivial to replicate outside.

The important thing is remembering that this is a business decision, not a holy cause, and the right answers vary from project to project. For example, in a larger organization you might find that it's worth using the platform services not because they're cheaper or better tested than what you can build for yourself but simply because it's easier to be able to check various auditing boxes using the standard auditing tools than having to laboriously demonstrate that your implementation meets the same requirements.


> The important thing is remembering that this is a business decision, not a holy cause, and the right answers vary from project to project.

I must say, that's exactly it. No holy cause; technical decisions should not be made in a vacuum, and there's always a lot more to it (and if not, there should be).

Having said that, mine is a qualified statement. If you have abstracted your workloads to the Kubernetes abstractions alone, you're good to go anywhere from Linode to GCP and anywhere in between.


> Nomad, is -well- just a glorified Airflow.

This doesn't make sense. While in the end these tools might both run code in containers, they serve different purposes. Airflow is far more aware of ETL concepts and comes with a lot of batteries included for those use cases, whereas Nomad is a more generic solution with more emphasis on infrastructure.


Nomad could use an official or semi-official distribution. Something that you could throw onto any VM with minimal configuration, and it would create a new Nomad cluster or join an existing one.

I've been thinking about building such a thing on Arch (btw) but haven't acquired enough time-energy to do it.


Nomad + Consul does auto-clustering. All you have to do is specify in which mode your Nomad binary is working and where your Consul client is listening. For local development, -dev mode exists for both Consul and Nomad.
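
A minimal server-side sketch of what that looks like (paths and addresses are just examples):

data_dir = "/opt/nomad/data"

server {
  enabled          = true
  bootstrap_expect = 3
}

consul {
  address = "127.0.0.1:8500"
}

$ nomad agent -config=server.hcl

Clients are the same idea with a client block instead, and they find the servers through Consul.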


Yeah, I figure this shouldn't be too difficult to implement, but will be at least laborious to consistently keep up-to-date and stable.


What you are seeking is NixOS with Consul configured.


I don't agree with the "Kubernetes is an orchestration system for containers originally designed by Google" statement. While it is not false, it creates a wrong impression of what K8s is.

Kubernetes is a cloud operating system which lets you run modern apps in any environment. One important component is of course container orchestration, but it went far beyond just an orchestrator. Kubernetes has a very powerful ecosystem, and it managed to unite almost all infrastructure vendors and projects. It's amazing to see how so many competing companies could agree on something. And that became K8s.

Nomad is great when you're working with VMs, but I don't see it as very relevant in the modern era of K8s and cloud-native.


What is nice about using the same tech as the 'big players' is you get the benefits of massive armies of engineers building integrated products and features to sell to other big users of the product. This means there are options to add on distributed databases, storage, monitoring, tracing, CI/CD, etc. So it can be worth swallowing 'big player' complexity if it means you can play in the same ecosystem. If you are already on a cloud then your infrastructure provider will handle a lot of that complexity for you anyway, and it's often better integrated into their other products (i.e. networking, access control, etc.).


So far I've kept things simple, avoided k8s/HashiStack/etc by using docker compose with a simple docker-compose.yml for each server. This has been working well, but I'm starting to feel the pain points - HA requires copy-pasting yaml, I need to specify which services are on each server, and so on.

What's the simplest next step I can take? I'd like something with (close to) the simplicity of docker compose, but ideally being able to specify something like `instances: 2` and being able to route/load-balance to those instances from another service.


Assuming serverless is out of the question for your use case, have you tried spending a couple of days investigating a managed Kubernetes cluster with node autoscaling enabled? EKS, AKS, GKE...

Honestly it sounds like you could be at the point where K8s is worthwhile.


I'm considering k8s, but that also means moving services from on-prem to AKS, getting INF to open up the necessary firewall rules to make the services reachable from on-prem, and so on. And as you said, it's definitely days of investigation. I'm not closed to the option.


You might also want to consider one of the simpler pre-packaged solutions for small k8s on premise clusters (like k3s)


You should be ok with consul, ansible and templating your docker-compose files. Might take some time to set it all up, but should be ok.


Thanks for the suggestion.

Any suggested starting points for my research? DDG search for "ansible docker-compose" brings up some suggestions like [1] and [2] but I'm curious if you have other suggestions.

And just so I understand how these work together - I'd use Jenkins+ansible to push containers to my servers, I'd run consul within docker as well, and ... would Ansible register services with consul as it pushes them? Do the services need to be modified to speak with consul directly?

[1] https://docs.ansible.com/ansible/latest/collections/communit... [2] https://www.ansible.com/blog/six-ways-ansible-makes-docker-c...


Missed the whole consul part of the question. To register services you can: 1. Use the Ansible module [0]. 2. Use a template to create JSON files, or use the HTTP API [1] (sketch below). 3. Use Registrator; the project is old but still works well. [2]

> Do the services need to be modified to speak with consul directly?

I am not sure that I get you, but I had no such need when migrating legacy stuff.

[0] https://docs.ansible.com/ansible/latest/collections/communit... [1] https://learn.hashicorp.com/tutorials/consul/get-started-ser... [2] https://github.com/gliderlabs/registrator
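
For option 2, the file you'd template per service is tiny (name/port/check made up). Drop it into the agent's config dir (e.g. /etc/consul.d/) and run `consul reload`:

{
  "service": {
    "name": "web",
    "port": 8080,
    "check": {
      "http": "http://localhost:8080/health",
      "interval": "10s"
    }
  }
}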


You can check out the templating engine that Ansible uses, Jinja2 [1], and the templating module itself [2]. If you are not well versed in Ansible, check out Jeff Geerling's "Ansible 101" [3].

[1] https://jinja.palletsprojects.com/en/3.0.x/

[2] https://docs.ansible.com/ansible/2.9/modules/template_module...

[3] https://youtu.be/goclfp6a2IQ


My personal win for Nomad is its ability to run non-containerized apps.


Does anyone know if Nomad can run on ChromeOS? I’ve had issues open with the Chrome and K8s teams for months to fix an error preventing it from working, but I don’t think a fix will ever see the light of day, as it just gets passed around various repos.


Title should be "Nomad vs Kubernetes from Nomad's point of view".


The domain you see here on HN is nomadproject.io and the blog doesn't hide who the blogpost is from, it's very apparent already it's from the perspective of the people that maintain Nomad.


wonder what the 'orchestration feature set' of 2 years in the future will be

speaking just for myself, feels like I want a system that runs a mix of serverless (like knative), normal containers, raw linux binaries (easy w/ fat binary systems like golang), language-specific packages like heroku does, maybe firecracker VMs.

hard to get knative right if you don't have a reverse proxy tightly integrated with your tool


Wild guess, probably more than 2 years ahead (also kinda what I'm hoping for)

A natively distributed runtime providing similar capabilities to Docker + Nomad but running on "bare" WASM, providing a compilation target, distributed and clustering primitives right in your application code.

Imagine the nodejs cluster module, targeting nomad directly, or talking to an instance of a service running on another node, the communication and addressing would be handled by the runtime.

Similarly, cluster or instance level ports could be opened directly from your code and the distributed runtime will do the external/internal load balancing. Maybe taking it one step further and advertising services directly abstracting ports away.

Service mesh + Scheduling + Capability-based Runtime in one binary you could run on Linux. One can dream.


Any nice step-by-step tutorials on how to get a Nomad, Vault and Consul architecture running quickly? I’m especially interested in non-container jobs.


I was into it until I saw there is an enterprise version link. If it's not FLOSS I'm out.


It's mostly OSS. There are pay-to-play enterprise features, but you can most definitely run high-end production clusters without needing Enterprise.


Do any of these systems come with a container image registry setup?


Why has this post dropped 20-odd ranks in roughly an hour?


Oh God another one


Another one?

Nomad and kubernetes were released within months of each other in 2015. This isn't new.


Wow, there's so much technical detail here! Like the part about how it's "simpler" and "flexible" and "consistent"

I'm totally convinced by this product's marketing page!!


I know you're being sarcastic. But the marketing page is not wrong. Try nomad yourself.

Install:

$ brew tap hashicorp/tap

$ brew install hashicorp/tap/nomad

Run:

$ nomad agent -dev

Start a job:

$ nomad job run example.nomad
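
(If you don't have a job file handy, `nomad job init` should write out the example.nomad used above.)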


These kinds of posts, by the vendor comparing to a competitor, always leave such a bad taste in my mouth. They decrease my confidence in both the product and the vendor. Stand on your own merits.

There's a saying in Dutch: "Wij van Wc-eend adviseren Wc-eend"[1]. It basically boils down to pretending to give advice or information but you're just promoting your own interests.

[1]: https://untranslatable.co/p/amarens/wij-van-wc-eend


The Hashicorp ones tend to be well written and informative. I have found them useful in the past, even though I usually loathe sales BS. In this case the bias is a bit more obvious but we know it's written by Hashicorp and can make allowances.


I know what you mean, but it's not like they're pretending that much if it's an article hosted under nomadproject.io. The first question everyone is going to ask is "why should I use this instead of K8s?", so you might as well have a good answer.

On the other hand, the Rust project has purposefully avoided "Rust vs X" comparisons on its website. I can't find the HN comments to back this up, but people like steveklabnik have indicated that they don't find that sort of adversarial comparison to be useful. Rust has done an excellent job at community building, so I give a lot of credence to their approach.


Yes. This topic is long and complicated; maybe I'll write it up or give a talk or something someday.

There is a lot of nuance.


I'd love to read a blog post on that. I completely see how it's a difficult line to walk - you don't want to be "Rust is better than all these rubbish languages" but you still want to provide people with information so that they can make informed choice.


Yup exactly. Comparisons are important, but you also really want to avoid building an “us vs them” culture.


Well, Rust is currently the best programming language so it's the other languages that need to come up with a "X vs Rust".


The standard refrain in companies is "don't talk about your competitors, talk about yourself". And that's fine, unless nobody wants to buy your product because your competitor is all anyone talks about. At a certain point you do need to tell people why you're better than the competition.

But actually, they're not competitors. Hashicorp supports K8s clusters for their customers. Nomad is just a product they built that is an alternative to K8s, and plugs into their whole product suite. Someone has probably asked them many times, "Why should I use Nomad instead of K8s?" So they have a page about it.



