Bare-Metal Kubernetes with K3s (alexellis.io)
190 points by alexellisuk on Dec 22, 2020 | 133 comments


Getting Kubernetes up and running isn't really the issue anymore, that's pretty easy to do. The tricky part is long term maintenance and storage.

I'm not really sure what people expect to gain from these kinds of articles. They're great as notes, but not something I'd use as a starting point for installing a production Kubernetes cluster.

The initial setup of a Kubernetes cluster is something most HN readers could do in half a day or so. Learning to manage a cluster, that's tricky. Even if you resort to tools like Rancher or similar, you're still in deep waters.

Also why would people assume that there's any difference in installing Kubernetes on an operating system running on physical hardware vs. on virtual machines?


> Getting Kubernetes up and running isn't really the issue anymore, that's pretty easy to do. The tricky part is long term maintenance and storage.

This times 100. Deploying basic clusters is easy. Keeping a test/dev cluster running for a while? Sure. Keeping production clusters running (TLS cert TTLs expiring, anyone?), upgrading to new K8s versions, proper monitoring (the whole stack, not just your app or the control plane), provisioning (local) storage, and so on: that's where the difficulties lie.


I’m working on this right now. My theory is that having every cluster object defined in git (but with clever use of third party helm charts to reduce maintenance burden) is the way to go.

Our cluster configuration is public[1] and I’m almost done with a blog post going over all the different choices you can make wrt the surrounding monitoring/etc infrastructure on a Kubernetes cluster.

[1] https://github.com/ocf/kubernetes
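
As a rough sketch of the pattern (not necessarily what the OCF repo above uses), a GitOps tool such as Argo CD lets you pin a third-party Helm chart in git so the cluster converges on whatever the repo declares; the chart name and version here are just examples:

    apiVersion: argoproj.io/v1alpha1
    kind: Application
    metadata:
      name: ingress-nginx
      namespace: argocd
    spec:
      project: default
      source:
        repoURL: https://kubernetes.github.io/ingress-nginx   # upstream chart repo
        chart: ingress-nginx
        targetRevision: 3.15.2      # pinned chart version (illustrative)
      destination:
        server: https://kubernetes.default.svc
        namespace: ingress-nginx
      syncPolicy:
        automated:
          prune: true               # delete resources removed from git
          selfHeal: true            # revert manual drift back to the git state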


My comment above did not come out of the blue, but is based on real-world experience ;-) You may be interested in our MetalK8s project [1], which seems related to yours.

[1] https://github.com/scality/metalk8s


This looks really cool. Would you include dev apps in the mono repo too? Say you have 40 services by 20 different teams.


If I already have a production cluster that has been up and running for a while, how can I get it git-tracked now?

Another question: what about things like the autoscaler, which automatically edits replica counts in deployments? How do you git-track that?


First of all, with k3s, keeping a production cluster running is still pretty easy.

Second, you should always be ready to start from scratch, which is also pretty simple because of Terraform.

A lot of people are scared of k8s but they haven't even tried it. They prefer to maintain their scary Ansible/Puppet/whatever scripts that work only half as well as k8s.


> First of all, with k3s, keeping a production cluster running is still pretty easy.

Fair enough. I'll admit I have no direct experience with K3s. There are, however, many K8s deployment systems out there which I would not consider 'production-ready' at all even though they're marketed that way.

> Second, you should always be ready to start from scratch, which is also pretty simple because of Terraform.

That may all be possible if your environment can be spawned using Terraform (e.g., cloud/VMWare environments and similar). If your deployment targets physical servers in enterprise datacenters where you don't even fully own the OS layer, Terraform won't bring much.

> A lot of people are scared of k8s but they haven't even tried it. They prefer to maintain their scary Ansible/Puppet/whatever scripts that work only half as well as k8s.

We've been deploying and running K8s as part of our on-premises storage product offering since 2018, so 'scared' and 'didn't try' seems not applicable to my experience. Yes, our solution (MetalK8s, it's open source, PTAL) uses a tech 'half as good' as K8s (SaltStack, not Ansible or Puppet) because you need something to deploy/lifecycle said cluster. Once the basic K8s cluster is up, we run as much as possible 'inside' K8s. But IMO K8s is only a partial replacement for technologies like SaltStack and Ansible, i.e., in environments where you can somehow 'get' a (managed) K8s cluster out of thin air.


I've been using this terraform provider quite a lot lately. It has made it a cinch to templatize a full manifest and pass data to the template for a deploy. We now have a fully reproducible base EKS cluster deploy done with working cert-manager/letsencrypt, nginx ingress, weave-net, fluentd logging to elasticsearch service, etc. Our application-specific code lives in a different repo and deploys things using YTT. It's so much more elegant than our old method of copying and pasting manifests and crossing our fingers and hoping the cluster didn't fall down. A full migration to a new cluster and deploy of a whole new application stack takes under an hour now.

https://github.com/gavinbunney/terraform-provider-kubectl


This is where Uffizzi is going. Phase 1 they started with their own custom controller and they are managing clusters for customers who pay for a portion of that cluster. In Phase 2 they are opening up their control plane to enterprise customers to solve the “times 100” management issue listed above.


I was one of those backwards people who opposed containers, until late 2019.

My start was the official kubernetes docs, step by step. Try everything, write it down, make ansible playbooks. Even used some of their interactive training modules, but I quickly had my own cluster up in vagrant so didn't really need the online shell.

Now we have two clusters at work, I have ansible playbooks I'm really happy with that help me manage both our on-prem clusters and my managed LKE with the same playbooks.

I'm completely sold on this container thing. :)


>Now we have two clusters at work

Running your two tightly coupled distributed monoliths of your front and back office systems no doubt!


Heh, no, single project: staging and prod. In fact prod isn't ready yet; that's when I need to start learning about HA control planes.


Do the playbooks rely on kubeadm internally for setup?


Are they OSS by any chance? And what kind of tasks do they do?


Yes but I won't link them.

They are focused only on CentOS 7, you might be able to find them yourself. As far as I can tell they include everything except persistent storage and HA control plane.


Just use Kubespray; there's not much sense in wasting time writing your own unless your intention is specifically to learn how to manage cluster setup with stateless Ansible.


I actually looked at Kubespray first, but it did not work out of the box as promised. And to troubleshoot it you basically had to know Kubernetes already.

So it made a lot of sense to start at the official docs. And while I'm reading the docs, why not build the cluster at the same time. And while I'm building the cluster, why not write each step down in Ansible so I won't have to repeat myself.

So the end result is my own Ansible setup that I'm happy with and know inside and out.

It always bothered me to run other people's ansible playbooks. I'm too much of a control freak.


When people talk about “bare metal” k8s they mean on-prem, without the support of integrated cloud infrastructure. All the external stuff like load balancers, ingress controllers, routing, BGP, public IP pools, shared storage, and VM creation for scaling are things the cluster can’t do for itself and have to be implemented and integrated.


Shared storage being a particular head-f to get solved. Everything else is a reasonably approachable problem you can tackle as it comes.


https://rook.io/ is a really great way to get shared storage up quickly and simply.


I'm still looking for a good resource as someone who's taken their eye off the ball for a decade or so. Things have moved (and continue to move) at an astounding pace... and I need something that goes from 'Pets and autoconf' to where we are now.


It's still the same old stack, just containerized. For example, you're no longer manually editing nginx conf; instead you write the config as YAML annotations to tell the nginx ingress what to do. Instead of editing wp-config.php when deploying WordPress, you specify the DB config as environment variables in a YAML file, etc.
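
For instance, here's a minimal sketch of the WordPress case, assuming the official wordpress image and its documented WORDPRESS_DB_* variables; the DB settings move out of wp-config.php and into the pod spec:

    # Fragment of a Deployment's pod template
    containers:
    - name: wordpress
      image: wordpress:5.6-apache
      env:
      - name: WORDPRESS_DB_HOST
        value: mysql.default.svc.cluster.local   # hypothetical DB service name
      - name: WORDPRESS_DB_USER
        value: wordpress
      - name: WORDPRESS_DB_PASSWORD
        valueFrom:
          secretKeyRef:           # keep the password in a Secret, not in the manifest
            name: wordpress-db
            key: password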


Can you explain why one would use Kubernetes? I know containerization but still haven’t figured out wtf Kubernetes is. Everyone and their mother keeps telling me it’s an orchestration tool for containers. It’s as helpful as saying a Pecan pie is an orchestration of flour, sugar and pecans.


Orchestration systems like Kubernetes, Amazon ECS, and (formerly, it’s pretty dead) Mesos run on a cluster of systems and coordinate over the network. What they’re coordinating is running your containerized services, for example your API server, microservices, etc. So you can tell the orchestration service “run 5 copies of this and listen externally on port 9000. When a request comes in on external_ip:9000, redirect it to one of the 5 service instances.” They will also restart your services if they die, keep stdout/stderr logs, and so forth. You can have very complex setups but most will never need more than what I’ve just described.
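
In Kubernetes terms, that "run 5 copies and listen on port 9000" request roughly becomes a Deployment plus a Service; a minimal sketch, with made-up names and image:

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: api-server
    spec:
      replicas: 5                   # "run 5 copies of this"
      selector:
        matchLabels:
          app: api-server
      template:
        metadata:
          labels:
            app: api-server
        spec:
          containers:
          - name: api-server
            image: example.com/api-server:1.0   # hypothetical image
            ports:
            - containerPort: 8080
    ---
    apiVersion: v1
    kind: Service
    metadata:
      name: api-server
    spec:
      type: LoadBalancer            # "listen externally on port 9000"
      selector:
        app: api-server
      ports:
      - port: 9000                  # external port
        targetPort: 8080            # container port

The orchestrator keeps the 5 pods running (restarting them if they die) and spreads incoming connections on port 9000 across them.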

The main difference between the different orch. tools is how they pipe the data between Outside and the service cluster. Mesos didn’t help you, you had to build your own service discovery. ECS uses Amazon load balancers and some custom black-box daemons. Kubernetes famously uses a complex mesh of “ingress controllers” and kernel iptables/ipfw spoofing. All take their configurations in some form of JSON or YAML.


Thank you.

Is there some sort of PID-feedback-loop control that monitors the CPU/memory load and helps spin up more instances if it sees more traffic? If Kubernetes doesn't do that, what piece of software can help automatically scale when there is a huge traffic spike?

Load balancing AFAIK doesn't do that. It just helps distribute the load.


Using a HorizontalPodAutoscaler [1] you can scale the number of "pods" (a grouping of containers that are scheduled as a unit) up and down based on the desired metric.

[1]: https://kubernetes.io/docs/tasks/run-application/horizontal-...
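
A minimal example of such an HPA (the CPU target and replica bounds are just illustrative):

    apiVersion: autoscaling/v2beta2
    kind: HorizontalPodAutoscaler
    metadata:
      name: api-server
    spec:
      scaleTargetRef:               # the Deployment whose replica count gets adjusted
        apiVersion: apps/v1
        kind: Deployment
        name: api-server
      minReplicas: 2
      maxReplicas: 10
      metrics:
      - type: Resource
        resource:
          name: cpu
          target:
            type: Utilization
            averageUtilization: 80  # add pods when average CPU exceeds 80%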


Autoscaling is a field of its own, yes. Lots of footguns, though.


Kubernetes is a multi-physical-machine operating system. In a typical OS you have CPU, RAM, etc., and the kernel decides how much of each to allocate to each running task. Kubernetes works the same way, but across multiple computers by default, and it can additionally manage any other arbitrary resource through the use of CustomResourceDefinitions (IPs, higher-level concepts like “Minecraft Servers”, etc.)
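
As a sketch of what declaring such a custom resource type looks like (the "MinecraftServer" kind here is purely illustrative):

    apiVersion: apiextensions.k8s.io/v1
    kind: CustomResourceDefinition
    metadata:
      name: minecraftservers.example.com   # must be <plural>.<group>
    spec:
      group: example.com
      scope: Namespaced
      names:
        plural: minecraftservers
        singular: minecraftserver
        kind: MinecraftServer
      versions:
      - name: v1
        served: true
        storage: true
        schema:
          openAPIV3Schema:
            type: object
            properties:
              spec:
                type: object
                properties:
                  maxPlayers:
                    type: integer

A controller then watches objects of that kind and does the actual work of creating and scaling whatever they describe.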


If you want a (better) cooking analogy - if a container is a 'station' or a line cook in a professional kitchen, then Kubernetes is 'running the line', managing the kitchen, making sure everyone has the tools and ingredients they need.


"Orchestration" means it will manage how many of each container you have running, configuring, starting, and stopping them and possibly migrating them from one machine to another.

The term was created for virtual machines.


If you know systemd: one of Kubernetes' many functions is being a single systemd-style controller for all of your infrastructure. It also gives you a single way to control storage, monitoring, load balancing and deployment from a single API.

It's neat. It is a lot of moving parts though. I am just now trying it in a big infrastructure, because we have so many bespoke parts that we have to glue together that we might as well try to use what is standard now...


>For example, you're no longer manually editing nginx conf,

You now have to do the configuration in YAML, which is MUCH MUCH worse.


Or more accurately, you now have to understand both the nginx.conf file format and the right way to nudge the YAML so it produces what you want. Plus some intermediate wrapper script that sets things up on container start and does more funky things based on environment variables.


I don't get it, how do you configure nginx with a YAML file?


You don't directly. You add some extra lines of YAML to a k8s Ingress resource, nginx detects this and updates itself. A single nginx container can service many Ingresses for each of your apps. The idea is to distribute the config across the app manifests it relates to.
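
Concretely, a minimal sketch of such an Ingress; the host and annotation values are just examples, and the exact annotations depend on which ingress controller you run (this one assumes ingress-nginx):

    apiVersion: networking.k8s.io/v1
    kind: Ingress
    metadata:
      name: my-app
      annotations:
        # picked up by the nginx ingress controller and turned into nginx config
        nginx.ingress.kubernetes.io/proxy-body-size: "10m"
    spec:
      ingressClassName: nginx
      rules:
      - host: app.example.com
        http:
          paths:
          - path: /
            pathType: Prefix
            backend:
              service:
                name: my-app
                port:
                  number: 80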


> Getting Kubernetes up and running isn't really the issue anymore, that's pretty easy to do.

Care to share an "easy" recipe? Because I haven't found one that actually works. It always falls apart for me at the networking.

Let's say I have a cluster of 8 physical nodes and a management node available in my data center, and I want to set up k8s for use by my internal users. I'm a solo admin with responsibility for ~250 physical servers, so any ongoing management necessary will be very much one task among many.

Is this blog post a good guide? Is there a better one?


An 'easy' way to deploy a cluster could be using kubeadm. Then you'll need a CNI like Calico to get Pod networking up and running. However, you'll also want to install a bunch of other software on said cluster to monitor it, manage logs, and so on.

Given you're running on physical infrastructure, MetalK8s [1] could be of interest (full disclosure: I'm one of the leads of said project, which is fully open-source and used as part of our commercial enterprise storage products)

[1] https://github.com/scality/metalk8s


Is it just me, or does this need a QA pass?

Using the 2017 instructions went about like you'd expect in this space, with everything moving as fast as it does.

Using the instructions here has it complaining "Terraform initialized in an empty directory!"

Mucking about and thinking the main.tfvars file was a typo and it wanted main.tf got things a little further along, then generated another error because it WAS meant to be main.tfvars...

And at this point I'm frustrated, confused, and no closer to understanding the concepts...


I don't entirely understand what's happening here. The title and the post talk a lot about "Bare-Metal", but it also seems to indicate that everything is hosted on servers running Ubuntu. Which, if accurate, seems to be Kubernetes running on Linux, not bare-metal?


Kubernetes relies on containers, which are made out of cgroups and namespaces. Those are Linux features, so there's no way around that.

This is as bare-metal as it gets since it's not running on a VM.


Should we unpack these statements though?

How much of Linux isn't about cgroups and namespaces? Docker, I believe, needs about 100 system calls to get containers to work. How much of the tree could you shake and still have containers work? Would you still call the host Linux, or something else? And what would that system do for Windows and OS X users? Anything?

Could you maintain this 'something else' as a permanent fork, a la Red Hat?


The point here is that the term bare-metal cannot apply to applications relying on OS features. Unikernels are the closest thing you can apply the term to in this area.


Wikipedia [0] doesn't agree with you (neither do I, but who am I anyway?):

"A bare-metal server is a computer server that hosts one tenant, or consumer, only.[1] The term is used for distinguishing between servers that can host multiple tenants and which utilize virtualisation and cloud hosting.[2] Such servers are used by a single consumer and are not shared between consumers. Each server may run any amount of work for a user, or have multiple simultaneous users, but they are dedicated entirely to the entity who is renting them. Unlike servers in a data centre, they are not being shared between multiple customers.

Bare-metal servers are physical servers. Each server offered for rental is a distinct physical piece of hardware that is a functional server on its own. They are not virtual servers running in multiple pieces of shared hardware."

[0] https://en.wikipedia.org/wiki/Bare-metal_server


See https://en.wikipedia.org/wiki/Bare_machine. Quoting the important part:

> In computer science, bare machine (or bare metal) refers to a computer executing instructions directly on logic hardware without an intervening operating system.

Bare-metal server has indeed been used to refer to non-virtualized servers, but it's a misnomer. The current terminology for this is "dedicated server".


I've never heard of that before. Besides, dedicated server is an unfortunate term in practice too; I may pay Hetzner for a bunch of dedicated servers, some run a bunch of VMs, some run Kafka without a virtualization layer. My services in containers on Kubernetes in VMs on dedicated servers are still running on dedicated servers, but definitely not what I'd call bare metal. In my experience, calling the Kafka server "bare metal" is understood by everyone, used by pretty much everyone I've come across and doesn't get mixed up with adjacent concepts so easily. Maybe that's different in niche/research circles where "bare machine" is a concept that's actually relevant in day-to-day work, I wouldn't know. As far as I'm aware, the term is very much alive and kicking.


I think you'll find that hardware/embedded development is not a niche industry.


A misnomer repeated a thousand times by a thousand different people becomes a/the new meaning. It's how human languages work and evolve or change over time.

Today when most tech people talk about bare metal they refer to a server that is not virtual.


It is still insignificant compared to the original terminology, which is heavily used in hardware/embedded to distinguish between the use of an OS (embedded RTOS, Linux, etc.) vs. direct programming.


I wouldn't put much stock in a page made just in 2016 that started like this: https://en.wikipedia.org/w/index.php?title=Bare-metal_server...


It has to start with something. It would be a lot weirder if the first revision had been a white paper.

If you disagree with what's on the page, please add or edit the information there. With proper references, it will be appreciated by everyone that uses Wikipedia.


I'm saying the term is much older than this, don't be obtuse.


Most terms are much older than their corresponding Wikipedia pages. I'm really failing to see your point here.


This is not how I've heard any of my coworkers define bare-metal. Obviously anecdotal, but I think you are in the minority here.


Turns out, the minority is right FWIW.

But you still can't stop language from changing, no matter how "right" you are.


Seems like a wasted term for something only 10 people in the world will care about doing.


Even Apple named their API Metal in reference to existing phrases like "running close to the metal" back in 2014, and even then it raised some eyebrows in my opinion, but that usage is still miles better than basically saying you're running a program normally. Since when does the absence of virtualization require articles to be written about it?


It is possible to run containers directly on top of type 1 hypervisors.


If you mean something like runV, which still runs a Linux kernel in a lightweight VM, then how does that move the needle in the bare-metal direction, exactly?


It appears there are two camps in the audience. The camp I'm familiar with (and it appears the author as well) uses bare-metal to mean running on your own computers (either in-house or colo) instead of in a VM on a cloud provider.


Just call it physical or dedicated servers.


Everyone is right. Each group has its own background or silo or bubble or industry, and each can use the word. Nobody can claim exclusive ownership unless they have a trademark. The one that still boggles my little mind is "Apple". How a company can own exclusive rights to that I will never understand, regardless of how many times a lawyer explains it to me.


I have yet to come across a usage where the context doesn't make it unambiguous which one is meant. It can mean both things at the same time.


I agree, I'm just saying that both sides are understandably confused because it's clear that both sides were unfamiliar with the other meaning.


Bare-metal refers to running on hardware directly (vs. running on VMs or the cloud). I'm interested to know what you thought it was.


Traditionally “bare-metal” would refer to running directly on physical hardware with no operating system. In the context of cloud providers, “metal” does now seem to mean “doesn’t run in a VM” but to varying extents. Seems like an unnecessary overload.


In the dark ages, when running an OS in the way you're referring to as 'bare metal' was the default, the term meant running something without a traditional kernel underneath it.

Not many things are written to do that, of course. Oracle used to offer an installation mode like this. It was generally a gimmick: you trade a ton of flexibility for a tiny bit of performance. There are probably use cases where it makes sense, but not that many.


Yeah, but it was a super niche term back then. After 2010, at least, bare metal was used a lot more for: "running the OS we need (+ app) directly on a physical server".

This usage has been, in my experience, a lot more widespread.


> Oracle used to offer an installation mode like this

Oracle, and BEA before them, used to offer a JVM which ran on top of a thin custom OS designed only to host the JVM, you could call it a "unikernel". Product was called JRockit Virtual Edition (JRVE), WebLogic Server Virtual Edition (WLS-VE, when used to run WebLogic), earlier BEA called it LiquidVM. The internal name for that thin custom OS was in fact "Bare Metal". Similar in concept to https://github.com/cloudius-systems/osv but completely different implementation

I think one thing which caused a problem for it is that a lot of customers want to deploy various management tools to their VMs (security auditing software, performance monitoring software, etc.), and when your VM runs a custom OS that becomes very difficult or impossible. So adopting this product could lead to the pain of having to ask for exceptions to policies requiring those tools, and then defending the decision to adopt it against those who use those policies to argue against it. I think this is part of why the product was discontinued.

Nowadays, Oracle offers "bare metal servers" [1] – which are just hypervisor-less servers, same as other cloud vendors do. Or similarly, "Oracle Database Appliance Bare Metal System" [2] – which just means not installing a hypervisor on your Oracle Database Appliance.

So Oracle seems to have a history of using the phrase "bare metal" in both the senses being discussed here.

[1] https://www.oracle.com/cloud/compute/bare-metal.html

[2] https://docs.oracle.com/en/engineered-systems/oracle-databas...


Hmm - what's the overlap between your definition of "bare metal" and the current definition of "embedded"?

I will say, this comment section is the first time I'm hearing about "bare-metal" meaning "without an OS", but the above question is genuine curiosity.


Those terms are orthogonal. Embedded typically refers to running on some HW that is not typically thought of as a computer. Embedded SW can run within an OS or on bare metal.


Directly on the hardware. It's the only definition of the term that I'm familiar with. Like the difference between writing your application and managing lifecycle and peripheral access all yourself directly to your MCU vs. using an RTOS to provide you facilities for task scheduling and I/O primitives, etc.

I've never encountered "includes a full, feature-rich OS" as "bare-metal" before. Reading the title, I assumed someone had managed to get some flavor of Kubernetes running right on the hardware as the lowest-level software layer of the system. That would have meant bare-metal to me. What's described here is running Kubernetes on a physical host rather than a virtual host from what I can tell, but it's not running Kubernetes "bare-metal", because between Kubernetes and the "metal" is Linux.

Or at least that's what it would mean in my world, but the interpretation appears to be different for others. Outside of confusion, I was also just disappointed. This article is just basically setting up Kubernetes. That it's on a physical host is a lot less interesting and novel to me than if they'd managed to implement some shape of Kubernetes as the OS itself, which is what I'd originally interpreted the title to mean.


I'm surprised you've been downvoted as much as you have. It appears bare metal has a very specific meaning in certain contexts. For what it's worth, it also has a very specific meaning in cloud provider contexts, which is exactly what you've defined here.


To me, running on "bare metal" means part of your program is setting up the clock for various buses and the CPU, and you have another little program whose job is to jump to the first address of your real program.


https://en.wikipedia.org/wiki/Bare_machine

> In computer science, bare machine (or bare metal) refers to a computer executing instructions directly on logic hardware without an intervening operating system.



Yes, I mentioned elsewhere that this has been used to refer to dedicated servers, but it's a marketing misnomer.


Running Kubernetes on Ubuntu is a bad idea, especially if it is multi-tenant. You can have a lot of security issues there, as users can run anything on those nodes. This exercise is far from production.


What's the problem with Ubuntu specifically (vs ex. Alpine or RHEL)?


I guess "bare-metal" is Newspeak for "running directly on hardware without a hypervisor".


It's not "Newspeak". Kubernetes is an application, it runs on an OS. You can call it "Newspeak" if you consider anything after VMs became super popular "Newspeak" (around 2010, I think?).

Bare-metal = no VMs or other virtualization involved.


It is somewhat "newspeak", as bare metal has meant "without an OS" in the embedded space for a very long time. This is just a case of two different spaces using the same term for different ideas.


Bare metal has meant "without a supervisor" to operations for more than a decade. It has also meant "without an emulator" to the emulation community for a really long time.

I imagine it also has some meaning for the music community.


Sure. For comparison, I can find references to "bare metal" meaning "no operating system" dating back to 1989...3 decades:

https://apps.dtic.mil/dtic/tr/fulltext/u2/a219356.pdf (search for bare-metal)

I suspect it comes from the automotive paint industry. Sanding down to the "bare metal" for the best finish...where the primer and paint are as close to the substrate as they can be.


Sometimes terms are overloaded for different domains. This is pretty common; no need to argue about which is The One True Definition.


Hence, "just a case".


But that's not (Orwellian) newspeak at all, is it? That term carries some very strong negative connotations.


I did couch it with "somewhat". Though it does fit the theme of a diminished meaning for the same word, since the newer meaning is a much simpler thing to do.


You mean it's Newspeak for "running under a supervisor, but without the supervisor running under a hypervisor".


I actually just got a NUC with 64 GB RAM. Alpine installed to an encrypted NVMe ZFS zpool. This runs KVM.

One of these guests is Alpine with k3s (tried k3os; very limiting, in a good way). KVM allows me to pass a host directory directly into the VM using 9p (tried NFS but it's a little heavy).

So any storage needs get the benefit of regular snapshots, compression and sync to the NAS.

Really wish it were more widely known that single-node k8s is all most people need to get started!

Actual Kubernetes deployments, services etc. are deployed with Ansible-managed Helm.


I think this is accurate, but there is great educational value when you can see several physical computers working in a cluster, scaling pods on CPU, etc.

I used to have 3 low-power NUC-style computers, and after playing around for a few months I did the same thing as you: replaced them with a single beefier machine, and it's a lot more practical.


Totes agree, I think I got the VMs set up once a long time ago, before k3s, and it was some real work.


Maybe you could set up multiple K3s hosts and form a cluster with the VMs? Have you taken a look at the inlets-operator that I mention in the post for hosting? My favourite party trick is exposing the IngressController's ports 80 and 443. https://docs.inlets.dev/#/get-started/quickstart-ingresscont...

There was a user who was paying 7 USD/mo per site to host 20 side-project sites... expensive. They switched to a computer under their desk and saved a lot of money that way.


That is the plan eventually so that I can do node upgrades with lil downtime


Can you explain what 9p is?


A network filesystem developed for the Plan 9 operating system https://en.wikipedia.org/wiki/9P_(protocol)


When you create or edit a VM you can add a filesystem passthrough.

You mount this in your guest VM's fstab and specify 9p in the options.
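
For example, something along these lines in the guest's /etc/fstab, where "hostshare" is whatever mount tag you gave the filesystem passthrough when defining the VM (path and options are illustrative):

    # mount-tag   mountpoint        type  options                           dump pass
    hostshare     /var/lib/storage  9p    trans=virtio,version=9p2000.L,rw  0    0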


What I'd like to see for a change is actually doing the bare-metal part itself. I've seen so many k8s showcase posts of this or that, but never anyone actually running it on servers they own, without using any big-four cloud APIs (I consider Equinix to be part of those soon too...) to handle the LB/ingress/network virtualization stuff they provide, and still saying it is easy to use.


I managed a bare-metal cluster of 5x 128 GB RAM servers for a fintech.

Using bare-metal servers without a VM layer is actually a simplification: you cut out a layer that's not strictly necessary.

Test environments were in AWS. There is a load balancer outside of the cluster (highly available HAProxy as a service). I wouldn't say it's particularly difficult or easy. It's pretty cost effective. After the initial setup, scripting and testing is done, you spend at most a few hours per month on maintenance, and the difference in server cost is huge. Also, unmetered bandwidth.

The pain points are mostly storage (nothing beats redundant network storage à la EBS) and having to plan at least a few months in advance because you're renting larger chunks of HW.


Where do you see the difficulty?

I've installed k8s with Ansible on bare metal (Kubespray), more or less just following the steps here: https://kubernetes.io/docs/setup/production-environment/tool...

No network virtualisation, just Calico. Announce the service IPs via BGP from each node running the service, and ECMP gives you (poor man's) load balancing. The ingress gets such a service IP. I simply used nginx.

Important here, though, is that the router needs to be able to do resilient hashing: otherwise, removing or adding a node causes a rehash of all connections, leading to broken connections.
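
For reference, peering Calico with the router is itself just another manifest; a sketch, with a site-specific peer IP and AS number you'd obviously replace:

    apiVersion: projectcalico.org/v3
    kind: BGPPeer
    metadata:
      name: rack1-tor
    spec:
      peerIP: 192.168.0.1   # the router's address (example)
      asNumber: 64512       # the router's AS number (example)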


I guess you don't even realize how cryptic your post is for someone uninitiated :)

Calico? Network virtualization? BGP? ECMP? Resilient hashing?

No big surprise all this stuff is easy for you.


I was assuming, since tasqa wanted to know how it works on bare metal in contrast to the cloud, and since they brought up network virtualisation, that they were already knowledgeable about the networking part.

Networking is handled in kubernetes with CNI plugins, Calico is one of them. They define how one pod can talk to another.

How it does it is probably best described by the project itself: https://docs.projectcalico.org/about/about-networking

My simplified version: Calico uses the IP routing facilities to route IP packets to pods across hosts, either from another pod or from a gateway router.

BGP is a protocol to exchange routing information, so it can be used to inform the router or kubernetes nodes (in this case physical hosts) about where to send the IP packets.

If a pod is running on a node, the node announces with BGP that the pod IP can be routed over the IP of the node. If the pod provides a service (in the Kubernetes sense), the node can also announce that the service IP can be routed over the same host. Now, if two pods on different nodes are providing the same service, then both are announcing the same service IP. So there are multiple routes, or multiple paths, for the same IP. That's the "multi-path" part of the acronym ECMP (Equal-Cost Multi-Path). Equal cost, because we do not express a preference for one or the other.

The router then can make a decision where to send the packets to. Usually that is done by hashing some part of the IP packet (IP and port of source and target for example).

Now the question is: how does that hash decide which host a packet goes to? In most cases it is very simple: you have an array of hosts, and the hash modulo its length gives you the host. The problem is, if you add or remove one item, practically all future packets will end up at a different host than before, and that host doesn't know what to do with them, breaking the connection (in the case of TCP). Resilient hashing describes a feature where the existing mapping won't change when members are added or removed.


You may enjoy the parts on ARP at the end of the post. I am planning a post on HA K3s with etcd on netbooted RPis. The netbooted RPi part is already available for free to my GitHub Sponsors. Here's a gist I put together whilst figuring out how it should look: https://gist.github.com/alexellis/09b708a8ddeeb1aa07ec276cd8...

Not sure what you mean re: "network virtualisation" though?


We are running k3s on metal in production. Works great, actually. We use HAProxy as ingress and LB.


How stable is it? What's your cluster size? How long has it been running? Any tips on how to approach starting such a setup?


We scale up to about 100 machines. We use spot instances EXTENSIVELY, and that configuration was tricky, actually. It's been a couple of months now. Works pretty OK.

k3s is actually pretty simple to use now. The tricky part was to integrate with https://github.com/kubernetes/cloud-provider-aws and https://github.com/DirectXMan12/k8s-prometheus-adapter

The hardest part is to get it to work with spot instances. We use https://github.com/AutoSpotting/AutoSpotting to integrate with it.


Wow! Impressive... what are the advantages over EKS? Just cost benefits or others as well?


We want to have the exact same stack running on laptops and cloud. That's the main goal. Everything else is secondary.


The HA control plane and service type LB is all handled by https://kube-vip.io and is designed to be as transparent to the user as possible.


I recently set up a homelab of ~10 k3os+k3s nodes on NUCs. Setting up MetalLB on top of the base k3s installation made exposing services on their own IP addresses pretty dead simple.
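
For anyone curious, the MetalLB side of that (at least with the ConfigMap-based configuration current at the time) is roughly this; a layer 2 sketch with an address range you'd adjust for your own network:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      namespace: metallb-system
      name: config
    data:
      config: |
        address-pools:
        - name: default
          protocol: layer2
          addresses:
          - 192.168.1.240-192.168.1.250   # IPs MetalLB may hand out to Services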


I've seen a couple of presentations by Chick-fil-A explaining things they've encountered doing this. You can find them on YouTube, and here is an article: https://medium.com/@cfatechblog/bare-metal-k8s-clustering-at...


Same, I tried it for my test environment server but there were too many undocumented configuration steps; eventually I just went with a single-node minikube and that was that. I'd love an article with all the kinks worked out.


For clarity, the Control Plane High-Availability and Service type LoadBalancer are all provided by kube-vip.io out of the box. The goal is to provide as "close to cloud provider" an environment as possible :-)


https://metallb.universe.tf/ is also worth checking out


Thanks Alex for all your work in the k8s ecosystem!

I have 3 cheapish VPSes with public IPs and no VPC, but I'm struggling to find a way to get k3s HA working with MetalLB. I don't have an EIP/NodeBalancer resource, and I don't want to resort to a mere 1-master + 2-agent cluster. Any tips or links are appreciated.


Are you able to use BGP or ARP as suggested in the post? Does your VPS provider have a managed LoadBalancer option?

I would probably suggest you go for 3x servers in HA; it uses marginally more memory but can tolerate a failure.

You can always go and use DigitalOcean managed K8s, or self-host at home using inlets that I mention in the post.


The post mentions kube-vip. I would be curious to hear people's experiences with kube-vip. The documentation for kube-vip seems a bit scant, unfortunately. Does anyone know how this compares/differs to the MetalLB project?


Howdy!

https://kube-vip.io does two things:

- Control Plane HA w/ BGP or ARP

- Service Type: LoadBalancer w/ BGP or ARP

It is similar to metallb but has a number of differences under the covers in how it works inside Kubernetes. It is cloud-controller agnostic, so as long as something attaches an IP to spec.IngressIP, kube-vip will advertise it. For edge deployments, LoadBalancer addresses can come from the local DHCP server, etc.
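
From the user's side it's then just the normal Service type; a sketch, where pinning loadBalancerIP is optional and the address is made up:

    apiVersion: v1
    kind: Service
    metadata:
      name: my-app
    spec:
      type: LoadBalancer
      loadBalancerIP: 192.168.0.220   # optional: request a specific VIP (example address)
      selector:
        app: my-app
      ports:
      - port: 80
        targetPort: 8080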


>"It is similar to metallb but has a number of differences under the covers in how it works inside Kubernetes"

This was actually the part I was curious about. Could you elaborate? Or is there a design doc somewhere?


https://kube-vip.io/hybrid/services/ covers a bit of it, the idea is to be decoupled so any on-premises environment can throw together their own CCM that speaks to their local network or IPAM.


Would it be feasible to adapt this configuration to use just 3 machines total, with the control plane and workers running on the same machines?


That's not what bare-metal means.


What does it mean, in your opinion?


https://wiki.c2.com/?CloseToTheMetal

Think various boards' bootloaders, with not even a BIOS to rely on; that's running on bare metal. Calling something running on a normal x64 OS launched with UEFI "bare metal" is silly and inaccurate.


It's common parlance to use "running on bare metal" as the opposite of "running in a virtual machine", i.e. a synonym of "running your OS without a hypervisor".

This usage has been commonplace for a long time, probably two decades.

https://en.wikipedia.org/wiki/Bare-metal_server


I am a professional in the field for many years and your version of the term is not widely accepted. There is a wikipedia page which describes the accepted definition: https://en.wikipedia.org/wiki/Bare-metal_server


Man, you guys are laughable, brandishing one wiki page like it's the one true source. But never mind that; several older ones were presented to you, and crickets.

https://en.wikipedia.org/wiki/Close_to_Metal

https://en.wikipedia.org/wiki/Bare_machine

https://wiki.c2.com/?CloseToTheMetal

This term goes back to the early 2000s at least.


If you read my comment again you will see that I made a point about how common the term is. Once again, your version of the definition is just not widespread. If you do a Google search for "bare-metal", the vast majority of pages refer to the definition that I and the other poster mentioned.


I saw Terraform and was out, because I don’t have the money to pour into recurring costs.

Can we have nice things without it being a hook to buy something?


How does using Terraform have anything to do with spending money?

If you want to go cheap, you can host at home on old servers or RPis... there are even links included to help you with that.

https://blog.alexellis.io/test-drive-k3s-on-raspberry-pi/


Terraform is free and open source?

Or are you making a meta-comment about Terraform being painful at times?


Terraform is free and open source. Their paid product is a hosting solution.


What recurring costs are you referring to?



