
Hey y'all, I know getting a setup that feels "right" can be a process. We all have different goals, tech preferences, etc.

I wanted to share my blog post walking through how I finally built a setup that I can just be happy with and use. It goes over my goals, requirements, tech choices, layout, and some specific problems I've resolved.

Where I've landed of course isn't where everyone else will, but I hope it can serve as a good reference. I’ve really benefited from the content and software folks have freely shared, and hope I can continue that and help others.



How are you finding Nix for the homelab? Every time I try it I just end up confused; maybe next time will be the charm.

The reason I ask is I homelab “hardcore”; i.e. I have a 25U rack and I run a small Kubernetes cluster and ceph via Talos Linux.

For various reasons, including having run k8s in the lab for about 7 years now, I've been itching to change, consolidate, and simplify, and every time I think about my requirements I somehow end up where you did: Nix and ZFS.

All those services and problems are very, very familiar to me; feel free to ask me questions back, btw.


I certainly didn't take to Nix the first few times I looked at it. The language itself is unusual and the error messages leave much to be desired. And the split around Flakes just complicates things further (though I do recommend using them; once you set them up, it's simple and the added reproducibility gives nice peace of mind).
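
Enabling them is literally one setting, for reference:

    # in your NixOS config (or the equivalent line
    # "experimental-features = nix-command flakes" in /etc/nix/nix.conf)
    nix.settings.experimental-features = [ "nix-command" "flakes" ];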

But once I fully understood how its features really make it easy for you to recover from mistakes, and how useful the package options available from nixpkgs are, I decided it was time to dig in and figure it out. Looking at other folks' Nix configs on GitHub (especially for specific services you're wanting to use) is incredibly helpful (mine is also linked in the post).

I certainly don't consider myself a Nix expert, but the nice thing is you can do most things by using other examples and modifying them till you feel good about it. Then over time you just get more familiar with it and grow your skills.
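
If it helps as a starting point, the flake skeleton I'd modify from is tiny (the hostname and nixpkgs branch here are just placeholders):

    # flake.nix - minimal sketch
    {
      inputs.nixpkgs.url = "github:NixOS/nixpkgs/nixos-24.11";

      outputs = { self, nixpkgs }: {
        nixosConfigurations.homelab = nixpkgs.lib.nixosSystem {
          system = "x86_64-linux";
          modules = [ ./configuration.nix ];
        };
      };
    }

From there, "nixos-rebuild switch --flake .#homelab" builds and activates the machine.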

Oh man, having a 25U rack sounds really fun. I have a moderate-size cabinet where I keep my server, desktop, a UPS, 10Gig switch, and my little fanless Home Assistant box. What's yours look like?

I should add it to the article, but one of my anti-requirements was anything in the realm of high availability. It's neat tech to play with, but I can deal with downtime for most things if the trade-off is everything being much simpler. I've played a little bit with Kubernetes at work, but that is a whole ecosystem I've yet to tackle.


>The language itself is unusual and the error messages leave much to be desired. And the split around Flakes just complicates things further

Those are my chief complaints as well, actually. I never quite got to the point where I grasped how all the bits fit together. I understand the DSL (though the errors are cryptic, as you said), and flakes seemed recommended by everyone yet felt like an add-on that had been forgotten about (you needed to turn them on through some experimental flag, IIRC?).

I'll give it another shot some day, maybe it'll finally make sense.

>Oh man, having a 25U rack sounds really fun. I have a moderate-size cabinet where I keep my server, desktop, a UPS, 10Gig switch, and my little fanless Home Assistant box. What's yours look like?

* 2 UPSes (one for networking one for compute + storage)

* a JBOD with about 400TB raw in ZFS RAID10

* a little Inter-Tech case with a Supermicro board running TrueNAS (that connects to the JBOD)

* 3 to 6 NUCs depending on usage, all running Talos with a rook-ceph cluster on the NVMe drives; all NUCs have a Sonnet Solo 10G Thunderbolt NIC

* 10 Gig unifi networking and a UDM Pro

* misc other stuff like a zima blade, a pikvm, shelves, fans, ISP modem, etc

I'm not necessarily thinking about downsizing, but the NUCs have been acting up and I've gotten tired of replacing them or their drives, so I thought I'd maybe build a new machine to rule them all in terms of compute. And if I only want one host, then k8s starts making less sense. Mini PCs are fine if you don't push them to the limit like I do.

I'm a professional k8s engineer I guess, so on the software side most of this comes naturally at this point.


400 TB?! Do you collect Linux ISOs, or are you doing photography?


Linux ISOs and backups.


(Backups of Linux ISOs)


Re: Nix, how do you run a service for which there is no Nix package? Or what if you need to build a service differently from how it's packaged for Nix? For instance, you want a build of OpenLDAP with Argon2 or bcrypt support.


You write a Nix package, modify other packages in nixpkgs, write overlays, and so on. Nix can package any software and build containers or OS images; the sky is not the limit. But first you have to invest an estimated 2-3 weeks to learn it properly. Hell of a learning curve.
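
For the OpenLDAP case specifically, an overlay that overrides the build is usually enough. Rough sketch only - the exact configure flag and input names are assumptions, so check how nixpkgs actually builds openldap before leaning on this:

    # overlay sketch - flag/input names are assumptions, verify against nixpkgs
    final: prev: {
      openldap = prev.openldap.overrideAttrs (old: {
        configureFlags = (old.configureFlags or [ ]) ++ [ "--enable-argon2" ];
        buildInputs = (old.buildInputs or [ ]) ++ [ final.libargon2 ];
      });
    }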


> Oh man, having a 25U rack sounds really fun.

For certain definitions of the word “fun,” yes. I have a 35U (I don’t need that many slots, but at the time I did need it tall enough that my kids couldn’t reach the top, where I put the keys), with:

* 3x Dell R620

* 2x Supermicro (one X9, one X11)

* 1x APC UPS w/ external battery

* Unifi UDM Pro

* Unifi Enterprise 24-port switch

The Dells have Samsung PM863 NVMe drives, which are used by Ceph (managed by Proxmox), with traffic sent over an InfiniBand mesh network via Mellanox ConnectX-3 Pro.

The Dells run K3OS (now a dead project) in a VM. Big mistake there.

The Supermicros have various spinners, and are in a ZFS pool. One of them is technically a backup that should power up daily to ingest snapshots, then power off, but there’s been some issue preventing that, so…

It was all very fun to set up, and has been eminently reliable, but it’s a bit much. While you can in fact make R620s relatively quiet, they’re still 1U, and those little 40mm fans are gonna whine. It’s background noise to me, but guests definitely mention it if we’re close to the rack.

Also, I’m now in the uncomfortable position of being stuck on Proxmox 7, because v8 (or more accurately, the underlying Debian release) dropped support for my HBAs, so the NAS would be dead in the water. I mean, I could compile my own kernel, or perhaps leverage DKMS, but that somewhat defeats the purpose of having a nice AIO like Proxmox. Similarly, my choice of K3OS means at some point I need to spend the time to rip everything out and start over with Talos.

Or - just maybe - I’ve done enough playing, and I should simply buy a JBOD chassis and a relatively new and quiet server (4U under light load means you can get away with much quieter fans), and just run stuff in Docker or gasp systemd. Or, hell, single-node K8s. My point is that it is fun, but eventually your day job becomes exhausting and you tire of troubleshooting all the various jank accumulating at home, and you stop caring about most of it.


I turned my Supermicro 847 X10DRI+ into a JBOD by ripping out the mobo and the fans and building a 140mm fan wall (zip ties at first, later 3D printed). I can highly recommend the move: it's dead silent and I have drive temps between 33 and 45C with 20 to 25 drives (I honestly don't know the total at this point).

As for your OS issues: I also used to run Proxmox with Ceph (btw, did you know you can use Proxmox's Ceph with rook-ceph so you don't need two layers of storage?), including for the router and NAS, but I gave it up due to unwarranted complexity and went bare metal.

Don't know what would fit your particular use case, but I can say this: I'm very happy I made a separate box with a Supermicro X11 and a JBOD beside it, and I can recommend this too; what benefit is there really to virtualizing a NAS?

Regarding K3OS you're in luck - Kubernetes manifests (you've got GitOps, right?!) are so portable you can just rebuild on a new OS. Give Talos Linux a spin. Again, really think about why it is that you're virtualizing here too; maybe you genuinely need it, maybe not.


> Ceph / Rook

Not sure I follow here; why would I want Rook involved? I generally don’t want my orchestration layer - which is also consuming storage - to be involved with the management of said storage.

> Virtualizing vs. bare-metal

It’s partially to make upgrades via new base images easier. I bake images with Packer + Ansible, and so can push a new one out quite easily. The other part is that my NAS consumes very little compute resources, since it isn’t hosting any apps, so it would be a waste to run it on bare metal. Tbf I could run everything consuming the disk storage directly on it, but I had shied away from that initially and it stuck.

> GitOps

I have Helm templates, no ArgoCD. It’s been a TODO for me for quite some time. Same with Talos (I actually do have it running in parallel right now, just not hosting anything). My issue is that I am obsessed with getting things perfect, and for me that means bootstrapping from a bare VM to ArgoCD ingesting manifests and spinning up pods, all automated. I know this is possible, as I’ve seen it done, but I rarely have the time or energy to pursue it after work. I should probably get over myself and just manually install stuff so it’s functional.


I've been trying to switch my home cluster from Debian + K3s to Talos but keep running into issues.

What does your persistent storage layer look like on Talos? How have you found its hardware stability over the long term?


>What does your persistent storage layer look like on Talos?

Well, for its own storage: it's an immutable OS that you configure via a single YAML file, and it automatically provisions appropriate partitions for you; you can even install the ZFS extension and have it use ZFS (no ZFS on root, though).

For application/data storage there's a myriad of options to choose from[0]; after going back and forth a few times years ago with Longhorn and other solutions, I ended up at rook-ceph for PVCs, and I've been using it for many years without any issues. If you don't have 10gig networking you can even do iSCSI from another host (or NVMe-oF via democratic-csi, but that's quite esoteric).

>How have you found its hardware stability over the long term?

It's Linux, so pretty good! No complaints, and everything just works. If something is down, it's always either me misconfiguring something or a hardware failure.

[0] https://www.talos.dev/v1.11/kubernetes-guides/configuration/...


> after going back and forth a few times years ago with Longhorn and other solutions, I ended up at rook-ceph for PVCs

Curious to know what issues you ran into with Longhorn.


It was years ago but I recall high CPU usage being an issue in particular.

In general it's just not as battle-tested as Ceph, and I needed something more bulletproof.

However, I will say this: I'm sure that CPU usage issue was fixed (I was watching the GitHub issue), and you might not need your distributed FS to be CERN-ready for your lab. AND the UI and built-in backups Longhorn offers are great for beginners, so I'd suggest giving it a try if you don't already know you want Ceph or OpenEBS Mayastor for the performance and so on.


Thanks! I haven’t had to implement replicated storage (still using EFS) but I was curious about Longhorn.


Talos is the Linux kernel at heart, so... just fine.


Honestly, I personally like the combination of keepalived + Docker (Swarm if needed) + rsync for syncing config files. keepalived uses VRRP, which creates a floating IP. It's extremely lightweight and works like a charm. You won't even notice the downtime; the switch to the other server's IP is instant.
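
If you're on NixOS like the OP, the keepalived half of that is a few declarative lines. Just a sketch; the interface, router ID, and addresses below are placeholders to adapt:

    services.keepalived = {
      enable = true;
      vrrpInstances.lab = {
        interface = "eth0";        # placeholder NIC name
        state = "MASTER";          # the peer runs "BACKUP" with a lower priority
        virtualRouterId = 51;
        priority = 150;
        virtualIps = [ { addr = "192.168.1.240/24"; } ];  # the floating IP
      };
    };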


Keepalived is great. Learning about it was one of the best things I got out of building HA-focused infra at a job once.


Did you come across or consider using Coolify at any point? I've been using it for over a year and have quite enjoyed it for its Heroku-style ease of use and auto deployments from GitHub.

https://coolify.io/


No, I haven't heard of it before. I do like the idea though, especially for side projects. Thanks for sharing; I'll look into it more!


Check out Dokploy too! https://dokploy.com/


Their license is still ambiguous, and I don't like how they communicate with those who inquire about it.


Hi! Really excited by your work! I'm working on a similar project built on NixOS and am curious what you think.

My goal is to have a small, nearly zero-conf, Apple-device-like box that anyone can install by just plugging it into their modem and then going through a web-based installation. It's still very nascent, but I'm already running it at home. It is a hybrid router (think OPNsense/pfSense) + app server (Nextcloud, Synology, YunoHost, etc.). All config is handled through a single Nix module. It automatically configures dynamic DNS, Let's Encrypt TLS certs, and subdomains for each app. It's got built-in ad blocking and Headscale.
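
(For anyone wondering what the per-app TLS + subdomain piece looks like in plain NixOS terms, it's roughly the stock ACME + nginx shape below. This is just a sketch, not homefree's actual module, and the domain, email, and port are made up.)

    security.acme.acceptTerms = true;
    security.acme.defaults.email = "admin@example.com";       # placeholder
    services.nginx = {
      enable = true;
      virtualHosts."nextcloud.example.com" = {                # placeholder subdomain
        enableACME = true;
        forceSSL = true;
        locations."/".proxyPass = "http://127.0.0.1:8080";    # placeholder app port
      };
    };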

I'm working on SSO at the moment. I'll take a look at your work and maybe steal some ideas.

The project is currently self-hosted in my closet:

https://homefree.host


Oh, that sounds really rad! It certainly could have its use cases. I really appreciate how NixOS enables projects like this. Best of luck with it!


Do you use encrypted ZFS?

I have dabbled before with FreeIPA and other VMs on a Debian host with ZFS. For simplicity, I switched to running Seafile with encrypted libraries on a VPS and back that up to a local server via ZFS send/receive. That local server switches itself on every night, updates, syncs, and then goes back to sleep. For additional resiliency, I'm thinking of switching to ZFS on my Linux desktop (currently Fedora), fully encrypted except for Steam. Then I'd sync that every hour or so to another drive in the same machine, and sync less frequently to a local server. Since the dataset is already encrypted, I can either sync to an external drive or to some cloud service. Another reason to do it like this is that storing a full photo archive within Seafile on a VPS is too costly.


Yes! On top of the data safety features of ZFS, the fact that you can encrypt a dataset and still do incremental send/receive is fantastic.
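
On NixOS the snapshot + raw-send side of that can be declared too. A rough sketch using the sanoid/syncoid modules - dataset and host names are placeholders, and the option names are from memory, so double-check the module docs:

    # sketch only - names are placeholders, option names from memory
    services.sanoid = {
      enable = true;
      datasets."tank/data" = {
        autosnap = true;
        autoprune = true;
        hourly = 24;
        daily = 14;
      };
    };
    services.syncoid = {
      enable = true;
      commands."tank/data" = {
        target = "backup@backup-host:backuppool/data";
        sendOptions = "w";   # raw send, so data stays encrypted in transit and at rest
      };
    };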


I appreciated the in-depth look, and while some ideas from your setup will take more time to implement, I just added Flame for the dashboard and will see how it fares with the family.


Thank you! It's all a journey; hope Flame works well for you!



