Getting Kubernetes up and running isn't really the issue anymore; that's pretty easy to do. The tricky part is long-term maintenance and storage.
I'm not really sure what people expect to gain from these kinds of articles. They're great as notes, but they're not something I'd use as a starting point for installing a production Kubernetes cluster.
The initial setup of a Kubernetes cluster is something most HN readers could do in half a day or so. Learning to manage a cluster, that's tricky. Even if you resort to tools like Rancher or similar, you're still in deep waters.
Also, why would people assume there's any difference between installing Kubernetes on an operating system running on physical hardware and on virtual machines?
> Getting Kubernetes up and running isn't really the issue anymore; that's pretty easy to do. The tricky part is long-term maintenance and storage.
This times 100. Deploying basic clusters is easy. Keeping a test/dev-cluster running for a while? Sure. Keeping production clusters running (TLS cert TTLs expiring, anyone?), upgrading to new K8s versions, proper monitoring (the whole stack, not just your app or the control-plane), provisioning (local) storage,... is where difficulties lie.
I'm working on this right now. My theory is that having every cluster object defined in git (but with clever use of third-party Helm charts to reduce the maintenance burden) is the way to go.
Our cluster configuration is public[1] and I’m almost done with a blog post going over all the different choices you can make wrt the surrounding monitoring/etc infrastructure on a Kubernetes cluster.
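For illustration, a minimal sketch of what "every cluster object defined in git" can look like, assuming an Argo CD-style setup (the chart version and values below are placeholders, not taken from our actual config):

    # Hypothetical example: a third-party Helm chart pinned as a git-tracked object.
    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: ingress-nginx
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://kubernetes.github.io/ingress-nginx   # upstream chart repo
        chart: ingress-nginx
        targetRevision: 4.7.1                                 # pinned chart version (placeholder)
        helm:
          values: |
            controller:
              replicaCount: 2
      destination:
        server: https://kubernetes.default.svc
        namespace: ingress-nginx
      syncPolicy:
        automated:
          prune: true
          selfHeal: true

The controller then continuously reconciles the cluster against whatever is committed, so the git history doubles as an audit log of every change.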
My comment above did not come out of the blue, but based on real-world experience ;-) You may be interested in our MetalK8s project [1] which seems related to yours.
First of all, with k3s, keeping a production cluster running is still pretty easy.
Second, you should always be ready to start from scratch, which is also pretty simple because of Terraform.
A lot of people are scared of K8s but haven't even tried it. They prefer to maintain their scary Ansible/Puppet scripts that work only half as well as K8s.
> First of all, with k3s, keeping a production cluster running is still pretty easy.
Fair enough. I'll admit I have no direct experience with K3s. There are, however, many K8s deployment systems out there which I would not consider 'production-ready' at all even though they're marketed that way.
> Second, you should always be ready to start from scratch, which is also pretty simple because of Terraform.
That may all be possible if your environment can be spawned using Terraform (e.g., cloud/VMware environments and similar). If your deployment targets physical servers in enterprise datacenters where you don't even fully own the OS layer, Terraform won't bring much.
> A lot of people are scared of K8s but haven't even tried it. They prefer to maintain their scary Ansible/Puppet scripts that work only half as well as K8s.
We've been deploying and running K8s as part of our on-premises storage product offering since 2018, so 'scared' and 'didn't try' don't really apply to my experience. Yes, our solution (MetalK8s, it's open source, PTAL) uses a tech 'half as good as' K8s (SaltStack, not Ansible or Puppet), because you need something to deploy and lifecycle said cluster. Once the basic K8s cluster is up, we run as much as possible 'inside' K8s. But IMO K8s is only a partial replacement for technologies like SaltStack and Ansible, i.e., in environments where you can somehow 'get' a (managed) K8s cluster out of thin air.
I've been using this terraform provider quite a lot lately. It has made it a cinch to templatize a full manifest and pass data to the template for a deploy. We now have a fully reproducible base EKS cluster deploy done with working cert-manager/letsencrypt, nginx ingress, weave-net, fluentd logging to elasticsearch service, etc. Our application-specific code lives in a different repo and deploys things using YTT. It's so much more elegant than our old method of copying and pasting manifests and crossing our fingers and hoping the cluster didn't fall down. A full migration to a new cluster and deploy of a whole new application stack takes under an hour now.
This is where Uffizzi is going. Phase 1 they started with their own custom controller and they are managing clusters for customers who pay for a portion of that cluster. In Phase 2 they are opening up their control plane to enterprise customers to solve the “times 100” management issue listed above.
I was one of those backwards people who opposed containers, until late 2019.
My start was the official kubernetes docs, step by step. Try everything, write it down, make ansible playbooks. Even used some of their interactive training modules, but I quickly had my own cluster up in vagrant so didn't really need the online shell.
Now we have two clusters at work, I have ansible playbooks I'm really happy with that help me manage both our on-prem clusters and my managed LKE with the same playbooks.
They are focused only on CentOS 7, you might be able to find them yourself. As far as I can tell they include everything except persistent storage and HA control plane.
Just use Kubespray; there's not much sense in wasting time writing your own unless your intention is specifically to learn how to manage cluster setup with stateless Ansible.
I actually looked at Kubespray first, but it did not work out of the box as promised. And to troubleshoot it you basically had to know Kubernetes already.
So it made a lot of sense to start at the official docs. And while I'm reading the docs, why not build the cluster at the same time. And while I'm building the cluster, why not write each step down in Ansible so I won't have to repeat myself.
So the end result is my own Ansible setup that I'm happy with and know inside and out.
It always bothered me to run other people's ansible playbooks. I'm too much of a control freak.
When people talk about “bare metal” k8s they mean on-prem without the support of integrated cloud infrastructure. All the external stuff like load balancers, ingress controllers, routing, bgp, public ip pools, shared storage, vm creation for scaling, are all things the cluster can’t do for itself and have to be implemented and integrated.
I'm still looking for a good resource as someone who's taken their eye off the ball for a decade or so. Things have moved (and continue to move) at an astounding pace... and I need something that goes from 'Pets and autoconf' to where we are now.
It's still using the same old stack, just containerized. For example, you're no longer manually editing the nginx conf; instead you write the config as YAML annotations to tell the nginx ingress what to do. Instead of editing wp-config.php when deploying WordPress, you specify the DB config as environment variables in a YAML file, etc.
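As a hedged sketch of that WordPress example (the official image reads its DB settings from WORDPRESS_DB_* variables; the host and Secret name below are made up):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: wordpress
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: wordpress
      template:
        metadata:
          labels:
            app: wordpress
        spec:
          containers:
            - name: wordpress
              image: wordpress:6
              env:
                # DB config as environment variables instead of editing wp-config.php
                - name: WORDPRESS_DB_HOST
                  value: mysql.default.svc.cluster.local
                - name: WORDPRESS_DB_USER
                  value: wordpress
                - name: WORDPRESS_DB_PASSWORD
                  valueFrom:
                    secretKeyRef:
                      name: wordpress-db   # hypothetical Secret
                      key: password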
Can you explain why one would use Kubernetes? I know containerization but still haven’t figured out wtf Kubernetes is. Everyone and their mother keeps telling me it’s an orchestration tool for containers. It’s as helpful as saying a Pecan pie is an orchestration of flour, sugar and pecans.
Orchestration systems like Kubernetes, Amazon ECS, and (formerly, it’s pretty dead) Mesos run on a cluster of systems and coordinate over the network. What they’re coordinating is running your containerized services, for example your API server, micro services, etc. So you can tell the orchestration service “run 5 copies of this and listen externally on port 9000. When a request comes in on external_ip:9000, redirect it to one of the 5 service instances.” They will also restart your services if they die, keep stdout/stderr logs, and so forth. You can have very complex setups but most will never need more than what I’ve just described.
The main difference between the different orch. tools is how they pipe the data between Outside and the service cluster. Mesos didn’t help you, you had to build your own service discovery. ECS uses Amazon load balancers and some custom black-box daemons. Kubernetes famously uses a complex mesh of “ingress controllers” and kernel iptables/ipfw spoofing. All take their configurations in some form of JSON or YAML.
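For the curious, the "run 5 copies of this and listen externally on port 9000" instruction translates roughly into two Kubernetes objects; a sketch with placeholder names and image:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: api-server
    spec:
      replicas: 5                  # "run 5 copies of this"
      selector:
        matchLabels:
          app: api-server
      template:
        metadata:
          labels:
            app: api-server
        spec:
          containers:
            - name: api-server
              image: example.com/api-server:1.0   # placeholder image
              ports:
                - containerPort: 8080
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: api-server
    spec:
      type: LoadBalancer           # "listen externally on port 9000"
      selector:
        app: api-server
      ports:
        - port: 9000               # external port
          targetPort: 8080         # forwarded to one of the 5 instances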
Is there some sort of a PID-feedback-loop control that monitors the CPU/memory load and helps spin up more instances if it sees more traffic? If Kubernetes doesn't do that, what piece of software can help automatically scale if there is a huge traffic spike?
A load balancer AFAIK doesn't do that. It just helps distribute the load.
Using a HorizontalPodAutoscaler [1] you can scale up and down the number of "pods" (a grouping of containers that are scheduled as a unit) based on the desired metric.
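A minimal sketch, assuming metrics-server is installed and a Deployment named my-app exists (both names are placeholders):

    apiVersion: autoscaling/v2
    kind: HorizontalPodAutoscaler
    metadata:
      name: my-app
    spec:
      scaleTargetRef:
        apiVersion: apps/v1
        kind: Deployment
        name: my-app
      minReplicas: 2
      maxReplicas: 10
      metrics:
        - type: Resource
          resource:
            name: cpu
            target:
              type: Utilization
              averageUtilization: 70   # add pods when average CPU crosses 70%

So it is the HPA, not the load balancer, that reacts to a traffic spike by raising the replica count; the Service then spreads traffic across whatever pods exist.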
Kubernetes is a multi-physical-machine operating system. In a typical OS you have CPU, RAM, etc. and the kernel decides how much of each to allocate to each running task. Kubernetes works the same way, but across multiple computers by default- and it can additionally manage any other arbitrary resource through the use of CustomResourceDefinitions (IPs, higher level concepts like “Minecraft Servers”, etc.)
If you want a (better) cooking analogy - if a container is a 'station' or a line cook in a professional kitchen, then Kubernetes is 'running the line', managing the kitchen, making sure everyone has the tools and ingredients they need.
"Orchestration" means it will manage how many of each container you have running, configuring, starting, and stopping them and possibly migrating them from one machine to another.
If you know systemd: one of Kubernetes' many functions is to act as a single-stop, systemd-like controller for all of your infrastructure. It also gives you a single way to control storage, monitoring, load balancing, and deployment from a single API.
It's neat. It is a lot of moving parts tho. I am just now trying it in a big infrastructure because we have so many bespoke parts that we have to glue together that we might as well try to use what is standard now...
Or more accurately, you now have to understand both the nginx.conf file format, and the right way to nudge the YAML so it produces what you want. Plus some intermediate wrapper script that sets things up on container start that does more funky things based on environment variables.
You don't directly. You add some extra lines of YAML to a K8s Ingress resource; nginx detects this and updates itself. A single nginx container can service many Ingresses, one for each of your apps. The idea is to distribute the config into the app manifests it relates to.
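For example, a hedged sketch of those "extra lines of YAML" on an Ingress that the ingress-nginx controller renders into its own nginx.conf (host and service names are made up):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: my-app
      annotations:
        # Picked up by ingress-nginx and turned into nginx configuration for you
        nginx.ingress.kubernetes.io/proxy-body-size: "50m"
        nginx.ingress.kubernetes.io/ssl-redirect: "true"
    spec:
      ingressClassName: nginx
      rules:
        - host: app.example.com
          http:
            paths:
              - path: /
                pathType: Prefix
                backend:
                  service:
                    name: my-app
                    port:
                      number: 80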
> Getting Kubernetes up and running isn't really the issue anymore, that's pretty easy to do.
Care to share an "easy" recipe, because I haven't found one that actually works. It always falls apart for me with the networking.
Let's say I have a cluster of 8 physical nodes and a management node available in my data center, and I want to set up K8s for use by my internal users. I'm a solo admin with responsibility for ~250 physical servers, so any ongoing management necessary will be very much a task among many.
Is this blog post a good guide? Is there a better one?
An 'easy' way to deploy a cluster could be using kubeadm. Then you'll need a CNI like Calico to get Pod networking up and running. However, you'll want to install a bunch of other software on said cluster to monitor it, manage logs, ...
Given you're running on physical infrastructure, MetalK8s [1] could be of interest (full disclosure: I'm one of the leads of said project, which is fully open-source and used as part of our commercial enterprise storage products)
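To make the kubeadm route slightly more concrete, a hedged sketch of a config whose pod CIDR matches Calico's default (the endpoint and version strings are placeholders):

    # Used as: kubeadm init --config cluster.yaml, then install Calico as the CNI.
    apiVersion: kubeadm.k8s.io/v1beta3
    kind: ClusterConfiguration
    kubernetesVersion: v1.27.0                               # placeholder version
    controlPlaneEndpoint: "k8s-api.example.internal:6443"    # VIP or LB in front of the API servers
    networking:
      podSubnet: 192.168.0.0/16                              # Calico's default pod CIDR
      serviceSubnet: 10.96.0.0/12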
Using the 2017 instructions went about like you'd expect in this space, with everything moving as fast as it does.
Using the instructions here has it complaining "Terraform initialized in an empty directory!"
Mucking about and thinking the main.tfvars file was a typo and it wanted main.tf got things a little further along, then generated another error because it WAS meant to be main.tfvars...
And at this point I'm frustrated, confused, and no closer to understanding the concepts...
I don't entirely understand what's happening here. The title and the post talk a lot about "Bare-Metal", but it also seems to indicate that everything is hosted on servers running Ubuntu. Which, if accurate, seems to be Kubernetes running on Linux, not bare-metal?
How much of Linux isn't about cgroups and namespaces? Docker, I believe, needs about 100 system calls to get containers to work. How much of the tree could you shake and still have containers work? Would you still call the host Linux, or something else? And what would that system do for Windows and OS X users? Anything?
Could you maintain this 'something else' as a permanent fork, a la Red Hat?
The point here is that the term bare-metal cannot apply to applications relying on OS features. Unikernels are the closest thing you could apply the term to in this area.
Wikipedia [0] doesn't agree with you (neither do I, but who am I anyway?):
"A bare-metal server is a computer server that hosts one tenant, or consumer, only.[1] The term is used for distinguishing between servers that can host multiple tenants and which utilize virtualisation and cloud hosting.[2] Such servers are used by a single consumer and are not shared between consumers. Each server may run any amount of work for a user, or have multiple simultaneous users, but they are dedicated entirely to the entity who is renting them. Unlike servers in a data centre, they are not being shared between multiple customers.
Bare-metal servers are physical servers. Each server offered for rental is a distinct physical piece of hardware that is a functional server on its own. They are not virtual servers running in multiple pieces of shared hardware."
> In computer science, bare machine (or bare metal) refers to a computer executing instructions directly on logic hardware without an intervening operating system.
Bare-metal server has indeed been used to refer to non-virtualized servers, but it's a misnomer. The current terminology for this is "dedicated server".
I've never heard of that before. Besides, dedicated server is an unfortunate term in practice anyway; I may pay Hetzner for a bunch of dedicated servers, some run a bunch of VMs, some run Kafka without a virtualization layer. My services in containers on Kubernetes in VMs on dedicated servers are still running on dedicated servers, but definitely not what I'd call bare metal. In my experience, calling the Kafka server "bare metal" is understood by everyone, used by pretty much everyone I've come across, and doesn't get mixed up with adjacent concepts so easily. Maybe that's different in niche/research circles where "bare machine" is a concept that's actually relevant in day-to-day work, I wouldn't know. As far as I'm aware, the term is very much alive and kicking.
A misnomer repeated a thousand times by a thousand different people becomes a/the new meaning. It's how human languages work and evolve or change over time.
Today when most tech people talk about bare metal they refer to a server that is not virtual.
It is still insignificant compared to the original terminology, which is heavily used in hardware/embedded to distinguish between the use of an OS (embedded RTOS, Linux, etc.) vs. direct programming.
It has to start with something. It would be a lot weirder if the first revision had been a white paper.
If you disagree with what's on the page, please add or edit the information there. With proper references, it will be appreciated by everyone that uses Wikipedia.
Even Apple named their API Metal in reference to existing phrases like "running close to the metal" back in 2014, and even then it raised some eyebrows in my opinion, but that usage is still miles better than basically saying you're running a program normally. Since when does the absence of virtualization require articles to be written about it?
If you mean something like runV, which still runs a Linux kernel in a lightweight VM, then how does that move the needle in the bare-metal direction exactly?
It appears there are two camps in the audience. The camp I'm familiar with (and it appears the author as well) uses bare-metal to mean running on your own computers (either in-house or colo) instead of in a VM on a cloud provider.
Everyone is right. Each group has its own background or silo or bubble or industry, and each can use the word. Nobody can claim exclusive ownership unless they have a trademark. The one that still boggles my little mind is "Apple". How a company can own exclusive rights to that I will never understand, regardless of how many times a lawyer explains it to me.
Traditionally “bare-metal” would refer to running directly on physical hardware with no operating system. In the context of cloud providers, “metal” does now seem to mean “doesn’t run in a VM” but to varying extents. Seems like an unnecessary overload.
In the dark ages, when running an OS in the way you're referring to 'bare metal' was the default, it meant running something without a traditional kernel underneath it.
Not many things are written to do that, of course. Oracle used to offer an installation mode like this. It was generally a gimmick - you pay for a tiny bit of performance with a ton of flexibility. There are probably use cases where it makes sense, but not that many.
Yeah, but it was a super niche term back then. After 2010, at least, bare metal was used a lot more for: "running the OS we need (+ app) directly on a physical server".
This usage has been, in my experience, a lot more widespread.
> Oracle used to offer an installation mode like this
Oracle, and BEA before them, used to offer a JVM which ran on top of a thin custom OS designed only to host the JVM, you could call it a "unikernel". Product was called JRockit Virtual Edition (JRVE), WebLogic Server Virtual Edition (WLS-VE, when used to run WebLogic), earlier BEA called it LiquidVM. The internal name for that thin custom OS was in fact "Bare Metal". Similar in concept to https://github.com/cloudius-systems/osv but completely different implementation
I think one thing which caused a problem for it, is a lot of customers want to deploy various management tools to their VMs (security auditing software, performance monitoring software, etc) and when your VM runs a custom OS that becomes very difficult or impossible. So adopting this product could lead to the pain of having to ask for exceptions to policies requiring those tools and then defending the decision to adopt it against those who use those policies to argue against it. I think this is part of why the product was discontinued.
Nowadays, Oracle offers "bare metal servers" [1] – which are just hypervisor-less servers, same as other cloud vendors do. Or similarly, "Oracle Database Appliance Bare Metal System" [2] – which just means not installing a hypervisor on your Oracle Database Appliance.
So Oracle seems to have a history of using the phrase "bare metal" in both the senses being discussed here.
Hmm - what's the overlap between your definition of "bare metal" and the current definition of "embedded"?
I will say, this comment section is the first time I'm hearing about "bare-metal" meaning "without an OS", but the above question is genuine curiosity.
Those terms are orthogonal. Embedded typically refers to running on some HW that is not typically thought of as a computer. Embedded SW can run within an OS or on bare metal.
Directly on the hardware. It's the only definition of the term that I'm familiar with. Like the difference between writing your application and managing lifecycle and peripheral access all yourself directly to your MCU vs. using an RTOS to provide you facilities for task scheduling and I/O primitives, etc.
I've never before encountered "includes a full, feature-rich OS" as "bare-metal". Reading the title, I assumed someone managed to get some flavor of Kubernetes running right on the hardware as the lowest-level software layer of the system. That would have meant bare-metal to me. What's described here is running Kubernetes on a physical host rather than a virtual host from what I can tell, but it's not running Kubernetes "bare-metal", because between Kubernetes and the "metal" is Linux.
Or at least that's what it would mean in my world, but the interpretation appears to be different for others. Outside of confusion, I was also just disappointed. This article is just basically setting up Kubernetes. That it's on a physical host is a lot less interesting and novel to me than if they'd managed to implement some shape of Kubernetes as the OS itself, which is what I'd originally interpreted the title to mean.
I'm surprised you've been downvoted as much as you have. It appears bare metal has a very specific meaning in certain contexts. For what it's worth, it also has a very specific meaning in cloud provider contexts, which is exactly what you've defined here.
To me, running on "bare metal" means part of your program is setting up the clock for various buses and the CPU, and you have another little program whose job is to jump to the first address of your real program.
> In computer science, bare machine (or bare metal) refers to a computer executing instructions directly on logic hardware without an intervening operating system.
Running Kubernetes on Ubuntu is a bad idea, especially if it is multi-tenant. You can have a lot of security issues there, as users can run anything on those nodes.
This exercise is far from production
It's not "Newspeak". Kubernetes is an application, it runs on an OS. You can call it "Newspeak" if you consider anything after VMs became super popular "Newspeak" (around 2010, I think?).
Bare-metal = no VMs or other virtualization involved.
It is somewhat "newspeak", as bare metal has meant "without an OS" in the embedded space for a very long time. This is just a case of two different spaces using the same term for different ideas.
Bare metal has meant "without a hypervisor" to operations for more than a decade. It has also meant "without an emulator" to the emulation community for a really long time.
I imagine it also has some meaning for the music community.
I suspect it comes from the automotive paint industry. Sanding down to the "bare metal" for the best finish...where the primer and paint are as close to the substrate as they can be.
I did couch it with "somewhat". Though it does fit the theme of a diminished meaning for the same word, since the newer meaning is a much simpler thing to do.
I actually just got a NUC with 64 GB RAM. Alpine installed to an encrypted NVMe ZFS zpool. This runs KVM.
One of these guests is Alpine with k3s (tried k3os; very limiting, in a good way), set up so I can pass a host directory directly into the VM using 9p (tried NFS, but it's a little heavy).
So any storage needs get the benefit of regular snapshots, compression, and sync to the NAS.
Really wish it were more widely known that single-node K8s is all most people need to get started!
Actual Kubernetes deployments, services, etc. are deployed with Ansible-managed Helm.
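For reference, a hedged sketch of what "Ansible-managed Helm" can look like with the kubernetes.core collection (the chart, release, and values below are placeholders):

    # Ansible task: install/upgrade a Helm release declaratively.
    # Assumes the kubernetes.core collection and a valid kubeconfig on the target.
    - name: Deploy ingress-nginx via Helm
      kubernetes.core.helm:
        name: ingress-nginx
        chart_ref: ingress-nginx
        chart_repo_url: https://kubernetes.github.io/ingress-nginx
        release_namespace: ingress-nginx
        create_namespace: true
        values:
          controller:
            kind: DaemonSet      # placeholder value override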
I think this is accurate, but there is great educational value when you can see several physical computers working in a cluster, scaling pods on CPU, etc.
I used to have 3 low-power NUC-style computers, and after playing around for a few months I did the same thing as you have - replaced them with a single beefier machine, and it's a lot more practical.
Maybe you could set up multiple K3s hosts and form a cluster with the VMs? Have you taken a look at inlets-operator that I mention in the post for hosting? My favourite party trick is exposing the IngressController's port 80 and 443. https://docs.inlets.dev/#/get-started/quickstart-ingresscont...
There was a user who was paying 7 USD/mo per site to host 20 side-project sites... expensive. They switched to a computer under their desk and saved a lot of money that way.
What I'd like to see for a change is actually doing the bare-metal part itself. I've seen so many K8s showcase posts of this or that, but never anyone who's running it on actual servers they own, without using any big-four cloud APIs (I consider Equinix to be part of those too, soon...) to handle the LB/Ingress/network virtualization stuff they provide, and still says it is easy to use.
I managed a bare-metal cluster of 5x 128 GB RAM machines for a fintech.
Using bare metal servers without VM layer is actually a simplification. Cutting out a layer that's not strictly necessary.
Test environments were in AWS. There is a load balancer outside of the cluster (highly available HAProxy as a service). I wouldn't say it's particularly difficult or easy. It's pretty cost effective. After the initial setup, scripting, and testing is done, you spend at most a few hours per month on maintenance, and the difference in the cost of servers is huge. Also, unmetered bandwidth.
The pain points are mostly storage (nothing beats redundant network storage ala EBS) and having to plan at least a few months in advance because you're renting larger chunks of HW.
No network virtualisation, just Calico. Announce the service IPs via BGP from each node running the service, and ECMP gives you (poor man's) load balancing. Ingress gets such a service IP. I simply used nginx.
Important here, though, is that the router needs to be able to do resilient hashing: otherwise, removing or adding a node causes a rehash of all connections, which breaks them.
I was assuming, since tasqa wanted to know how it works on bare metal in contrast to the cloud, and since they brought up network virtualisation, that they were already knowledgeable about the networking part.
Networking is handled in kubernetes with CNI plugins, Calico is one of them.
They define how one pod can talk to another.
My simplified version: Calico uses the IP routing facilities to route IP packets to pods over hosts, either from another pod or from a gateway router.
BGP is a protocol to exchange routing information, so it can be used to inform the router or kubernetes nodes (in this case physical hosts) about where to send the IP packets.
If a pod is running on a node, the node announces with BGP that the pod IP can be routed over the IP of the node.
If the pod provides a service (in the kubernetes sense), the node can also announce that the service IP can be routed over the same host.
Now, if two pods on different nodes are providing the same service, then both are announcing the same service IP. So there are multiple routes, or multiple paths, for the same IP. Those are the last two letters of the acronym ECMP (Equal Cost Multi-Path): equal cost, because we do not express a preference for one or the other.
The router can then decide where to send the packets. Usually that is done by hashing some part of the IP packet (IP and port of source and target, for example).
Now the question is how is that hash deciding to which host it goes?
In most cases it is very simply that you have an array of hosts, and the hash modulo the length gives you the host.
The problem is, if you add or remove one item from that array, practically all future packets will end up at a different host than before, and that host doesn't know what to do with them, breaking the connection (in the case of TCP).
Resilient hashing describes a feature where the existing mapping won't change when nodes are added or removed.
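A hedged sketch of the Calico side of such a setup, peering each node with the upstream router and announcing service IPs (the AS numbers and addresses are made up):

    # Peer every node with the top-of-rack router.
    apiVersion: projectcalico.org/v3
    kind: BGPPeer
    metadata:
      name: rack-router
    spec:
      peerIP: 10.0.0.1           # hypothetical router address
      asNumber: 64512            # hypothetical router AS
    ---
    # Announce the service CIDR so the router learns one route per node,
    # and ECMP spreads traffic across the nodes backing a service.
    apiVersion: projectcalico.org/v3
    kind: BGPConfiguration
    metadata:
      name: default
    spec:
      asNumber: 64513            # hypothetical cluster AS
      serviceClusterIPs:
        - cidr: 10.96.0.0/12     # cluster service CIDR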
You may enjoy the parts on ARP at the end of the post. I am planning a post on HA K3s with etcd on netbooted RPis. The netbooted RPi part is already available for free to my GitHub Sponsors. Here's a gist I put together whilst figuring out how it should look: https://gist.github.com/alexellis/09b708a8ddeeb1aa07ec276cd8...
Not sure what you mean re: "network virtualisation" though?
We scale up to about 100 machines. We use spot instances EXTENSIVELY. And that configuration was tricky, actually. It's been a couple of months now. Works pretty OK.
I recently set up a homelab of ~10 k3os+k3s nodes on NUCs. Setting up MetalLB on top of the base k3s installation made exposing services on their own IP addresses pretty dead simple.
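For anyone curious, a hedged sketch of what that MetalLB configuration amounts to with the current CRD-based setup (the address range is made up):

    # Give MetalLB a pool of LAN addresses to hand out to LoadBalancer Services.
    apiVersion: metallb.io/v1beta1
    kind: IPAddressPool
    metadata:
      name: homelab-pool
      namespace: metallb-system
    spec:
      addresses:
        - 192.168.1.240-192.168.1.250   # hypothetical free range on the LAN
    ---
    # Announce those addresses over plain ARP/NDP (layer 2), no BGP required.
    apiVersion: metallb.io/v1beta1
    kind: L2Advertisement
    metadata:
      name: homelab-l2
      namespace: metallb-system
    spec:
      ipAddressPools:
        - homelab-pool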
Same. I tried for my test environment server, but there were too many undocumented configuration steps; eventually I just went with a single-node minikube and that was that. I'd love an article with all the kinks worked out.
For clarity, the control-plane high availability and Service type LoadBalancer are all provided by kube-vip.io out of the box. The goal is to provide as "close to cloud provider" an environment as possible :-)
I have 3 cheapish VPSes with public IPs and no VPC, but am struggling to find a way to get k3s HA working with MetalLB. I don't have an EIP/NodeBalancer resource and don't want to resort to a mere 1-master + 2-agents cluster. Any tips or links are appreciated.
The post mentions kube-vip. I would be curious to hear people's experiences with kube-vip. The documentation for kube-vip seems a bit scant, unfortunately. Does anyone know how this compares to or differs from the MetalLB project?
https://kube-vip.io does two things:
- Control Plane HA w/BGP or ARP
- Service Type: LoadBalancer w/BGP or ARP
It is similar to MetalLB but has a number of differences under the covers in how it works inside Kubernetes. It is cloud-controller agnostic, so as long as something attaches an IP to spec.IngressIP, kube-vip will advertise it. For edge deployments, LoadBalancer addresses can come from the local DHCP server, etc.
https://kube-vip.io/hybrid/services/ covers a bit of it; the idea is to be decoupled so any on-premises environment can throw together their own CCM that speaks to their local network or IPAM.
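To make the user-facing side concrete, a hedged sketch: the app just declares a plain LoadBalancer Service, and kube-vip (or MetalLB) takes care of putting a reachable address on it (the IP and names below are made up):

    apiVersion: v1
    kind: Service
    metadata:
      name: my-app
    spec:
      type: LoadBalancer
      loadBalancerIP: 192.168.1.220   # hypothetical VIP; omit to let the controller (or DHCP) pick one
      selector:
        app: my-app
      ports:
        - port: 443
          targetPort: 8443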
Think various boards' bootloaders, with not even a BIOS to rely on; that's running on bare metal. Calling something running on a normal x64 OS launched with UEFI as such is silly and inaccurate.
It's common parlance to use "running on bare metal" as the opposite of "running in a virtual machine", i.e. a synonym of "running your OS without a hypervisor".
This usage has been commonplace for a long time, probably two decades.
I am a professional in the field for many years and your version of the term is not widely accepted. There is a wikipedia page which describes the accepted definition: https://en.wikipedia.org/wiki/Bare-metal_server
Man, you guys are laughable, brandishing one wiki page like it's the one true source. But never mind that; several older ones were presented to you, and crickets.
If you read my comment again, you will see that I made a point about how common the term is. Once again, your version of the definition is just not widespread. If you do a Google search for "bare-metal", the vast majority of pages refer to the definition that I and the other poster mentioned.