What chroot taught me about containers

noufalibrahim · on Nov 9, 2022

Back before the birth of docker when I was consulting, a friends startup wanted a way to install their multi component application (web service, database etc.) inside a corporate environment. They built it on a Debian setup and generally apt-getted (apt-got?) all their requirements, used supervisord to manage the backend process (written in Python) and had it running. To do this on prem, they needed an installer.

I cooked up this idea of creating a dummy file (dd), formatting it into a file system, mounting it as a loopback device and deb-bootstrapping it. Then I chrooted inside it, installed all the dependencies needed, added all the code and things we needed for the application, left the chroot and use makeself to create a self extracting zip file of the whole application along with an initial config dialogue. We wrapped up the process using a Python script so that people create such "installers".

Once you ran the file, it would unzip itself into /opt/{something} and start up all the processes. Inside the chroot was Debian with a db and several other things running. Outside the chroot was Redhat or whatever. It ran surprisingly well and was used for a couple of years for all their customers even after Docker came out. I rejoined them as a contractor after they got a decent amount of funding and the same thing was still in use .

I wrote about this as an answer to a Stack Overflow question back then https://stackoverflow.com/questions/5661385/packaging-and-sh...

It was a proto docker without proper process isolation. Just file system isolation using chroot. Definitely brought back memories. The real value docker added was standardization, a central container registry, tooling etc. It mainstreamed these kinds of arcane old school sysad tricks into a generally usable product.

nonrandomstring · on Nov 9, 2022

Brings back lovely memories of Knoppix days. That's more or less how we used to mod CD based distros, mounted as looped-back filesystems.

Maybe another missing link in the continuum between simple chroot and full VMs is User Mode Linux (UML). Whatever happened to that? Didn't it get folded into Linux kernel as a standard thing at some point? Why do we never hear much of it now?

flyinghamster · on Nov 9, 2022

I haven't really kept up with it, but I didn't find it in a cursory check of 6.0.7. Linode used UML when they were first starting, before eventually moving to Xen and then KVM.

mcculley · on Nov 9, 2022

It is still there: https://github.com/torvalds/linux/tree/master/arch/um

flyinghamster · on Nov 9, 2022

Thanks - I can think of edge cases where it can still be useful (particularly for kernel development). I didn't wade deeply enough into menuconfig to find it.

xmodem · on Nov 9, 2022

There's now also gVisor

goodpoint · on Nov 9, 2022

> I cooked up this idea of creating a dummy file (dd), formatting it into a file system, mounting it as a loopback device and deb-bootstrapping it.

This has been used for 2 decades. It's extremely common for embedded systems.

noufalibrahim · on Nov 10, 2022

I have very little embedded experience. I think I got the general approach from a python script shipped with Ubuntu to create bootable thumb drives but I might be misremembering.

pjmlp · on Nov 9, 2022

On HP-UX we would just use vaults, already in 1999.

hinkley · on Nov 9, 2022

Half of the things we front page are retreads of minicomputer and supercomputer tech, some of it as old as the late 1980’s. I’m wondering what will happen and what sort of retrospectives will occur when the people involved spend enough time researching existing techniques and discover they’re covering old ground.

pjmlp · on Nov 9, 2022

It is going to be the equivalent of rediscovering ancient Roman, Greek, Egyptian, Babylonian, Maya, Aztecs,... technology.

ilyt · on Nov 9, 2022

We did that for some web hosting client it worked but managing this wasn't exactly pleasant. Then again there isn't much managing if instead of upgrading the system you rebuild it from scratch every time.

noufalibrahim · on Nov 10, 2022

Yup. That's the reason why we did this.

yjftsjthsd-h · on Nov 9, 2022

What was the advantage to creating a filesystem in a disk image rather than just extracting a root tarball somewhere and chrooting into it?

coldacid · on Nov 9, 2022

The filesystem-in-a-file was to produce the contents of the root tarball that was extracted somewhere (in /opt per OP) and chrooted into.

noufalibrahim · on Nov 10, 2022

Yes. It was also the first time I used makeself which I quite liked.

akadempythag · on Nov 9, 2022

Fun deep dive. In a previous life I wrote web apps, and I liked to develop on Chromebooks to make sure that everything ran smoothly on low-end machines.

I would set a price point of $2-300, which usually meant I was lucky to get a Rockchip. And this was before it was popular to build for ARM architectures.

And yet, ChromeOS' chroot-based projects like crosh always managed to deliver. Surrounded by aluminum MacBooks and carbon-fiber Dells, I would run Rails apps and X GUIs and source builds on a rubberized hunk of plastic that had been designed for use in schools.

I have always been surprised at how well that worked. If it was using the same fundamentals as Docker and Podman, I'm not surprised that the containerization movement has enjoyed such popularity.

j0hnyl · on Nov 9, 2022

Likewise it's amazing what one can do with Termux on an Android device.

spiffytech · on Nov 10, 2022

There was something really nice about developing on a Chromebook. It felt simplified in a very pleasant way.

I ultimately discontinued the practice because beefy Chromebooks never become commonplace, some points of development friction got tiresome, and I stopped preferring Chrome.

Yet I'm nostalgic for the experience.

nunez · on Nov 9, 2022

Yes, containers are more than chroot, until you:

- Want to give them their own IP addresses or networks, or

- Put upper bounds on their resources, or

- Get tired of dealing with chroot and unshare and seccomp and probably other tools I'm forgetting, or

- You want to run an arm64 container on an x86 host with minimal configuration, or...

this is a really fun (and important) exercise for anyone working with containers seriously to undergo but let's not trivialize how insanely easy Docked made creating containers become.

laumars · on Nov 9, 2022

Making containers was easy long before Docker came along:

- FreeBSD Jails

- Solaris Zones

- Proxmox (which was an abstraction over OpenVZ, back before LXC came along)

In fact because of all of the above, I was a latecomer to Docker and didn't understand the appeal.

What Docker changed was that it made containers "sexy", likely due to the git-like functionality. It took containers from a sysadmin world and into a developers world. But it certainly didn't make containers any easier in the process.

tiagod · on Nov 9, 2022

It did make it easier, at least the barrier to entry. I remember reading about jails years ago, when I had a lot less sysadmin knowledge, and I couldn't wrap my head around it.

With docker, many people still can't wrap their head around how it works and will do stupid things if they need to run them in a serious environment, but they can still run a bunch of containers to run some hard to install software easily on their local machine!

Sure, jails were easy in some ways, but boiling docker's success to sexyness, instead of usefulness, sounds a bit like yet another "Dropbox is just rsync". Docker wasn't solving the isolation issue (which had been obviously solved for years) but mostly the distribution issue.

BiteCode_dev · on Nov 9, 2022

So you mean you could take your FreeBSD Jails configuration, upload it on a well known public website like dockerhub, then get someone on Windows or Mac transparently install the image and run it with a few cmdlines ?

Because docker container are called container for this reason. That comes from the boat container analogy.

clcaev · on Nov 9, 2022

Many years before Linux containers, FreeBSD jails were easily packaged up via tar, deployed via scp, and started with minimal script. There wasn't much hype, it just worked. It was an excellent software packaging and distribution tool.

There wasn't a hub that I recall, nor was there tooling to use VMs so they could run on Windows/Mac. However, the main challenge, being able to distribute an "image" without requiring VM overhead, was solved elegantly. It just wasn't Linux, so it didn't make news.

laumars · on Nov 9, 2022

Running Docker on non-Linux platforms requires a Linux VM to run in the background. It's not as cross platform as people make out. The other container technologies can be managed via code too, that code can be shared to public sites. And you can run those containers in a VM too, if you'd want.

What's more, with ZFS you could not only ship the container as code, but even the container as a binary snapshot, very much like Docker's push/pull but predating Docker. Even on Linux, for a long time before Docker, you could ship container snapshots as tarballs.

Also worth mentioning is that early versions of Proxmox even had (and likely still does) a friendly GUI to select which image you wanted to download, thus making even the task of image selection user friendly. This was more than 10 years ago, long before Docker's first release and something Docker Desktop still doesn't even have to this day.

> Because docker container are called container for this reason. That comes from the boat container analogy.

The term computer "container" predates Docker by a great many years. Containers were in widespread use on other UNIXes with Linux being late to the game. It was one of the reasons I preferred to run FreeBSD or Solaris on production in the 00s despite Linux being my desktop OS of choice. Even when Linux finally caught with containerisation, Docker was still a latecomer to Linux.

Furthermore, for a long time Docker wasn't even containers (still isn't strictly that now but it at least offers more in the way of process segregation than the original implementations did). Albeit this was a limitation of the mainline Linux kernel so I don't blame Docker for that. Whereas FreeBSD Jails and Solaris Zones offered much more separation, even at a network level.

If we are being picky about the term "container" (not something I normally like to do) then Docker is the least "container"-like of all containerisation technologies available. But honestly, I don't like to get hung up on jargon because it helps no-one. I only raise this because you credited the term to Docker.

---

Now to be clear, I don't hate Docker. It may have it's flaws but there are aspects of it I do also really like; and thus I do use it regularly on my Linux hosts these days despite my original reluctance to move away from Jails (Jails is still much nicer if you need to do anything complicated with networking, but Docker is "good enough" for most cases). However what I really dislike is this rewriting of history where people seem to think Docker stood out as a better designed technology - either from a UX or engineering perspective.

I personally think what made Docker successful was being in the right place at the right time. Linux was already a popular platform, containers were beginning to become widely known outside of the sysadmin circles but Linux (at that time) still sucked for containerisation. So it got enough hype early on to generate the snowball effect that saw it become dominant. But lets also not forget just how unstable it was, for a long time it was frequently plagued with regression bugs from one release to another. Which caused a great many sysadmins to groan whenever a new release landed.

(sorry for the edits, the original post was a flow of thoughts without much consideration to readability. Hopefully I've tidied it up)

ilyt · on Nov 9, 2022

If anything this is testament to the failure of previous solutions to popularize it.

Docker invented absolutely zero on the OS side and reused what LXC did but the invention here is not "putting things in containers" but "making it easy to put things in containers" and "making it easy to run those containers. Every solution before that required a bunch more knowledge.

> Running Docker on non-Linux platforms requires a Linux VM to run in the background. It's not as cross platform as people make out.

Which people ? I never seen anyone saying Docker makes it easy to run cross platform stuff, and it was always one of it's pain points.

laumars · on Nov 9, 2022

> If anything this is testament to the failure of previous solutions to popularize it.

Maybe. But I'd rather not argue about popularity in a conversation about technical merit. The two aren't mutually inclusive and popularity is a subjective quality. Nothing good ever comes from conversations about popularity and preference.

> Every solution before that required a bunch more knowledge.

I'm not sure I fully agree with that. Docker has a lot of bespoke knowledge whereas the previous solutions built on top of existing knowledge. Where they differed was that Docker was an easier learning curve for people with previously zero existing systems knowledge. Which is something I didn't really appreciate until reading these responses because (possibly because I'm an old timer developer. I want to understand how my code works at a systems level so made it my job to understand the OS and even hardware too - though that's gotten harder as tech has progressed. But that was expected of developers when I started out).

> Which people ? I never seen anyone saying Docker makes it easy to run cross platform stuff, and it was always one of it's pain points.

The comment I replied to said: "get someone on Windows or Mac transparently install the image and run it with a few cmdlines"

ilyt · on Nov 9, 2022

>> If anything this is testament to the failure of previous solutions to popularize it.

>Maybe. But I'd rather not argue about popularity in a conversation about technical merit. The two aren't mutually inclusive and popularity is a subjective quality. Nothing good ever comes from conversations about popularity and preference.

The popularity is directly related to technical merit of it being very easy to start, both to run and to create containers. It isn't "just" popular, it got popular because it was solution to the problem that near-zero infrastructure skill developer could apply. We have countless example of solutions winning almost purely on having low barrier to entry, and docker was just that for containers.

The previous solutions ignored that, and assumed the target audience is mildly competent sysadmin, not a developer that has no idea what UID is, let alone the rest of ops stuff.

And it got buy in on the other side of the fence too, as now sysadmin instead of installing a spider's net of PHP or Ruby deps just had to install docker and deploy a container.

>> Every solution before that required a bunch more knowledge.

> I'm not sure I fully agree with that. Docker has a lot of bespoke knowledge whereas the previous solutions built on top of existing knowledge. Where they differed was that Docker was an easier learning curve for people with previously zero existing systems knowledge.

Well, you got the point. At the point where you need that knowledge (and I'd argue debugging docker container is in every way harder than just having a process in system) you already bought into the ecosystem. There is no initial hurdle to go thru like there was with previous systems trying to do same thing, even if you end up with harder to debug end result.

Just like with other things, PHP got popular because it was easier than anything CGI related, "just write code inside your HTML", Ruby got popular off Rails and 15 minute blog engine demo, Python being just all around easy to learn.

laumars · on Nov 9, 2022

> The previous solutions ignored that, and assumed the target audience is mildly competent sysadmin, not a developer that has no idea what UID is, let alone the rest of ops stuff.

But that doesn't mean that the previous solutions weren't popular for others outside of the developer community. The comments here are heavily developer orientated but that's only part of the story in terms of the wider container ecosystem.

> Just like with other things, PHP got popular because it was easier than anything CGI related, "just write code inside your HTML", Ruby got popular off Rails and 15 minute blog engine demo, Python being just all around easy to learn.

The point I was making wasn't that "Docker doesn't deserve popularity" nor any confusion as to why it's popular. It was saying that the stuff that came before it was also easy.

Your example about languages here is a apt because PHP is an easier language for people from a zero coding background. But if your background is in C then PHP is going to be much harder to use compared to learning Nim, Zig or Rust.

Saying the containerisation solutions that came before were garbage, as people have done, isn't accurate. I'm not being critical of Docker; I'm defending the elegance of Jails. It's just that elegance is exposed in a different way and for a different audience to who Docker targets.

nunez · on Nov 9, 2022

i think a lot of previous solutions focused on the "5x" engineer that was willing to comb through manpages and dig through the source (or at least the Makefile) if something unexpected happened.

many, many, MANY engineers are not like that.

many just want to build and push their features, and that's fine.

Docker knew that LXC was onto something and focused on the latter audience; that combined with their VC funding after they hit a critical mass is why they are heralded as having "invented" containers (even though they didn't).

_zoltan_ · on Nov 9, 2022

and what I really dislike is dissing a technology because piece of it had existed in other forms before.

yes, zones on solaris offered a lot of the modern SDN networking stuff. was it popular? no.

yes, with zfs, in theory, you could ship a binary file and the other side can load it nicely (if you're thinking send/recv). was it popular to ship things like that in the open, public, in an easy to use fashion? no.

just admit docker popularized a lot of these and let's move along. while the tech might have existed, the previous ecosystems sucked and docker changed this for good.

laumars · on Nov 9, 2022

> and what I really dislike is dissing a technology because piece of it had existed in other forms before.

How am I "dissing" Docker here? The comments before me are saying other technologies weren't easy or as feature rich. I'm saying they were. That's not a criticism of Docker. It's just a fact about other technologies.

> yes, zones on solaris offered a lot of the modern SDN networking stuff. was it popular? no.

Popularity means jack shit about the quality of a product. Reddit, Twitter and Facebook are popular but the UX is appalling on each of them. Just as plenty of really well built technologies never gain traction.

I suspect the reason Zones wasn't popular was because Solaris wasn't popular. Had Zones or Jails existed in the Linux mainline kernel at the same time as they had in Solaris/FreeBSD then we might not have seen a need for Docker. Or maybe it might still be around and popular...who knows? It's pointless to speculate over why something is popular because it's unscientific and unprovable. But we can discuss the UX and capabilities.

> yes, with zfs, in theory, you could ship a binary file and the other side can load it nicely (if you're thinking send/recv). was it popular to ship things like that in the open, public, in an easy to use fashion? no.

I can't speak for others did or did not, but I certainly did. (also see my comment above regarding popularity).

> just admit docker popularized a lot of these and let's move along.

I wasn't arguing that Docker didn't popularise these things. I was arguing against the point that the other tools were sub-par.

> while the tech might have existed, the previous ecosystems sucked and docker changed this for good.

And here's the crux of problem: you're conflating popularity with technical excellence. They're two unrelated metrics.

tinco · on Nov 9, 2022

Just for clarification, because zoltan isn't making very compelling arguments.

Docker made containers easy and effective. Solaris zones were not as good as Docker is, BSD jails are/were not as good or easy to use as Docker is. Popularity has nothing to do with it, except that the popularity is an indication of the fact that Docker was revolutionary in the way it made these technologies accessible to a very large professional audience.

Docker was not created in isolation, it was inspired by jails and zones and all the fancy new features that were added to the linux kernel at the time.

Using just the words FROM, ADD and CMD, you can make a container definition that effectively isolates a runtime for just about any application in a 3 lines. Beyond the couple simple keywords all you need to have is absolutely basic linux knowledge, the level you can teach any developer in an afternoon.

There's no need to pollute that developers mind with any other system administration garbage. Nothing about networking, policies, filesystems, whatever. Just basic bash and a couple keywords.

Then when you want to go to production, you just hand the shit your developers wrote over to a professional system administrator and they'll make it run perfect at any scale. It's magic. Before Docker the world was darkness and bullshit, and after Docker the world was drenched in light and all that is good.

The fact that it's 2022 and there's still people that are going "hur-dur Solaris zones, BSD jails amiright" as if any of those technologies have any relevance is ridiculous.

Docker is technologically excellent.

laumars · on Nov 9, 2022

I was with you right up until this part:

> The fact that it's 2022 and there's still people that are going "hur-dur Solaris zones, BSD jails amiright" as if any of those technologies have any relevance is ridiculous.

Having diversity in the computing ecosystem is a good thing, not bad.

I'll take your point that Docker brought containers to the developers (frankly, I made that point myself) but that doesn't mean that Jails doesn't solve some problems that Docker (currently) struggles with. Nor does Docker's success mean that a little competition isn't healthy for the wider industry.

Dismissing the stuff that went before it as "systems administration garbage" because it was targeted at a different audience to yourself is a really poor attitude in my opinion. Especially when there are countless examples of when audiences different from developers also need to make use of software. Frankly, I thought by now we were past the sysadmin vs developer flamewars. But clearly not.

Aside from that minor rant, I do want to thank for your post. It was an informative read.

tinco · on Nov 9, 2022

My apologies, I meant relevance to the problem that Docker solves, which is enabling developers to neatly specify and package their dependencies. I am not trying to diminish jails and zones usefulness to system administrators. I'm just saying if you put Docker in a comparison list to other technologies, jails and zones wouldn't even be in that list.

The annoyance comes from system administrators looking at the set of technologies inside Docker and saying "we already have that", and then just assuming Docker must be some sort of marketing scheme. I deployed docker in my organisation within a week of its first (beta?) release, when all of its "marketing" was a single blog post.

Docker solved an enormous real problem in the software industry, even if from a system administrators perspective it's just a new way of packaging applications, as there have been many in the past and probably will be many in the future.

laumars · on Nov 9, 2022

Oh I never meant any of my comments to undermine Docker. While I do have some specific frustrations with Docker, the same is true with any technology stack: Jails and Zones included.

I'd never describe Docker as being a marketing gimmick. It was definitely a "right time, right place" tool. But that speaks more about how the market (and particularly Linux) was yearning for something better.

Thanks for the interesting conversation :)

brazzledazzle · on Nov 9, 2022

Docker is the best because it did something its predecessors could not: made it accessible, easy to run and easy to share. On Linux. Now the technology has industry standardization and so much inertia it’s surviving the monetization drive by docker (the company). The existing software out there was sysadmin stuff because it was mostly DIY. Or required learning to administer and build tooling for another OS.

mike_hearn · on Nov 9, 2022

It's worth noting that Docker was primarily a godsend for people working in scripting languages like Ruby or Python, where they have very messy packaging systems, depend on tons of native Linux libraries and so on.

For people working on the JVM the world was in some sense already 'drenched in light'. You could just send the sysadmin a fat jar you developed on Windows or macOS and tell them to deploy it, done. Or maybe you'd use an app server, so you'd send a less fat jar and they'd deploy it via a GUI and it already gets high level services like db connections, backups, message queues etc.

Also, Docker doesn't really solve the common case of an app that depends on a DB, maybe email etc. Those are services that need administration, you can't just start up a random server and expect things to go well. At least you need backups, proper access control and so on.

So deployment difficulty was very much dependent on what ecosystem you were in.

tinco · on Nov 9, 2022

That's certainly true and describes my circumstances at the time. But it's also good to note that it is great for C/C++ dependencies as well, our GIS applications require PROJ, and our machine learning projects various Nvidia cuda things.

Also, just last week I needed new smbd features, and the only way to deploy a recent smbd version whilst retaining sanity seems on Ubuntu seems to be to just use docker. Normally there's a PPA but there wasn't in this case for some reason.

nunez · on Nov 9, 2022

That's _before_ we consider that most folks aren't running BSD or Solaris, in any capacity, and that jails were never truly ported over to Linux.

_zoltan_ · on Nov 9, 2022

popularity has won every single time in history over technical excellence. I know this is HN and a techbro echo chamber, but technical excellence is not even in the top 3 that people care about.

laumars · on Nov 9, 2022

Exactly my point. People in here are conflating popularity with it being better

nunez · on Nov 9, 2022

> Running Docker on non-Linux platforms requires a Linux VM to run in the background. It's not as cross platform as people make out. The other container technologies can be managed via code too, that code can be shared to public sites. And you can run those containers in a VM too, if you'd want.

right, and this is something else that Docker made incredibly easy to do as well. It's almost transparent now; so much so, that you need to use nsenter to connect to the underlying VM on Mac and Windows.

ajross · on Nov 9, 2022

Jails and Zones are kernel mechanisms, they aren't easy any more than cgroups/namespaces are easy (to be fair, yes, they're easier than linux's tools by default, albeit less flexible). What docker changed was absolutely to make things "easy". A Dockerfile is really no more than a shell script, the docker command line is straightforward, the docker registry is filled with good stuff ready for use.

pjmlp · on Nov 9, 2022

HP-UX vaults, introduced around 1999.

lobotron · on Nov 10, 2022

Yes: "What Docker changed was that it made containers "sexy", likely due to the git-like functionality.

Post cloud era and all those new markets to tap into. Jails etc: like a tool hammering nails. Why, when or how to build a city is something else entirely.

-but, probably git-like sexy functionality indeed. yeeha

nine_k · on Nov 9, 2022

Frankly, you don't need much more for a proof of concept. Beside chroot, you basically need a namespace control utility, an iptables control utility, and a volume control utility.

The "Docker in 100 lines of bash" [1] does just that. It skips the trouble of managing volumes by demanding btrfs; it could instead use stuff like tar / dd and mount to a comparable effect.

This is, of course, about as fiddly, slow, and unreliable as, say, building an amplifier with discrete transistors, or building a clock with nixie tubes and 74xx chips, etc. The point is not in producing a sleek and reliable solution (though this is not ruled out), but in seeing and understanding how a thing works from inside.

[1]: https://news.ycombinator.com/item?id=9925896

gsaussy · on Nov 9, 2022

Also, if you choose not to use the `iptables` command line utility and want to handle network isolation by directly chatting with the kernel over a netlink socket, then you basically have to cargo cult someone else's implementation.

AFAIK the clearest documentation has been reading the Docker networking code.

https://man7.org/linux/man-pages/man7/rtnetlink.7.html

simplotek · on Nov 9, 2022

> (...) but let's not trivialize how insanely easy Docked made creating containers become.

So much this. Docker doesn't get the respect it deserves. I'd also mention packaging/installing/deploying applications, which Docker made so trivial and trouble-free.

Docker at it's core is a masterclass of UX.

nine_k · on Nov 9, 2022

Its UX used to be... slightly suboptimal here and there. Nevertheless, it took the world by storm.

IMHO it's because Docker made the few core things absurdly easy, not just low-friction, but zero-friction. That was key. The rest was important but not critical, and was built out eventually.

xorcist · on Nov 9, 2022

That list are good examples of things Docker makes unnecessarily hard.

Anything else than what Docker provides is out of the question, and what that is is often undocumented and changes between versions. It will also clash with whatever limits and netfilter rules the system uses anyway unless you are very careful.

This is actually one of the things that systemd did well. It's completely straightforward and the man pages are mostly correct. They should have just gone with that instead.

Timon3 · on Nov 9, 2022

Exactly, Docker is great - until you leave the beaten path. At a previous workplace I set up a VPN solution around Wireguard and Docker. We wanted to start off by integrating one physical box into our network, where containers would be spawned with a client and a customized MAC addresses to sort them into individual VLANs, which would then be accessible through the VPN.

It took a number of tries to set up correctly, especially since documentation on these areas mostly consists of reading through various issues on the Docker issue tracker. Some necessary features weren't even supported in the current Docker-Compose file versions. And best of all - 3 clients worked without a problem in parallel, any further clients were not visible in the network. No error logs or anything, just no network.

Of course this wasn't using everyday features, but it would have been nice to have a bit more of an introductory guide into the subsystem. This way it felt like I was fighting against Docker more than it was helping me.

nunez · on Nov 9, 2022

Docker networking complicates a lot of stuff if you're using bridges and overlays, especially WireGuard, which wholly depends on UDP and is insanely unfriendly once NATs come into play. Using --net=host or configuring a macvlan network to give the containers real IPs usually helps.

Also doesn't help that linuxserver/wireguard's docs are basically like "Oh, yeah, if you use this for connecting to your VPN from outside of your house (which, like, 99.95% of people installing WireGuard are trying to do), routing might be an issue and is left as an exercise for the reader."

(Funny enough, Tailscale is to WireGuard what Docker is to containers. They recognized that wg is amazing but amazingly complicated to get going with, especially through NAT/CGNAT, so they drastically simplified that, added amazing UX on top of it, and are raking in that VC cash. Can't wait to read "hurr durr tailscale dumb, wg-quick amirite" in like five years after Tailscale is a multi-billion dollar networking juggernaut)

simplotek · on Nov 9, 2022

> Some necessary features weren't even supported in the current Docker-Compose file versions.

Why docker-compose?

Have you tried Docker swarm mode?

justsomehnguy · on Nov 9, 2022

In my experience swarm is even more difficult to get running if you need something uncommon.

nunez · on Nov 9, 2022

personally i would skip straight to kubernetes if you start messing with swarm

simplotek · on Nov 9, 2022

> personally i would skip straight to kubernetes if you start messing with swarm

I don't know. Docker swarm mode is basically Docker spread over multiple nodes. Consequently it's a very spartan Kubernetes-light, but extremely easy to operate and maintain.

I love microk8s but I still find Docker swarm mode to be the ideal first step for on prem.

nunez · on Nov 10, 2022

the problem with using swarm IMO is forming dependencies around it. i've found that it is easy enough to get started with, but scaling it to any degree is asking for trouble. it also doesn't help that swarm seems all but abandoned now that Docker spends a lot of time bolstering the built-in Kubernetes experience.

tyingq · on Nov 9, 2022

Bocker[1] does a reasonably good job of showing the value of Docker was mostly in Docker hub.

[1] https://github.com/p8952/bocker

goodpoint · on Nov 9, 2022

docker replaced small scripts with a bloated codebase and a big attack surface.

hinkley · on Nov 9, 2022

Docker saved me from clever people who overlapped with the “it works on my machine” crowd.

Regenerating from scratch every time saved a bunch of tribal knowledge from staying locked up in the heads of untrustworthy individuals. You couldn’t “forget” changes you made last Friday that don’t compose with the documented parts of the process.

goodpoint · on Nov 9, 2022

This is like attributing the invention of phones to steve jobs and electric cars to elon musk.

And I wouldn't be too surprised if somebody did it on HN.

hinkley · on Nov 9, 2022

Who invented what doesn’t change when things came together into a coherent solution instead of a differing set of unsatisfying guidelines that nobody follows.

I don’t have this conversation a lot, but I usually find a chef or puppet apologist on the other end of the conversation who doesn’t understand how people who don’t eat, breathe, and sleep system administration issues feel about that flavor of “repeatable”. That shit is crazy and we are never going back.

herodoturtle · on Nov 9, 2022

This article was fun to read. It gives a nice history and overview of chroot.

Irrespective of whether one agrees with the claim in the title, it’s an informative read.

If you’re scanning through these comments, and know very little about chroot, I’d recommend you check this out.

[Edit] The previous linkbaity title was (thankfully) changed by dang.

notpushkin · on Nov 9, 2022

Surprised no one has mentioned Bocker yet – “Docker implemented in around 100 lines of bash”. [1, 2]

[1]: https://github.com/p8952/bocker/

[2]: https://news.ycombinator.com/item?id=33218094 (116 comments)

john-tells-all · on Nov 9, 2022

People tend to think of Docker as big and complex and magic. Bocker lets us take a step back and not take Docker too seriously :)

Someone1234 · on Nov 9, 2022

Wouldn't it be more apt to say they're "chroot delivered via a package management system?"

My big gripe with containers is how insecure many of them are, and how ignored the problem is because it is "someone else's problem." But even I won't pretend that taking a powerful tool like chroot (and or cgroups, etc) and then packaging them for rapid deployment/rollbacks/etc isn't key to their popularity.

asicsp · on Nov 9, 2022

See also:

* https://jvns.ca/blog/2016/10/10/what-even-is-a-container/

* https://jvns.ca/blog/2020/04/27/new-zine-how-containers-work...

mitnk · on Nov 9, 2022

Also:

- https://github.com/saschagrunert/demystifying-containers

- https://ericchiang.github.io/post/containers-from-scratch/

klyrs · on Nov 9, 2022

> I pronounce chroot as change root.

The real hot take

dools · on Nov 9, 2022

I pronounce it like Dwight’s last name from the office

tambourine_man · on Nov 9, 2022

Actually, it’s pronounced “charuto”, for cigar in Portuguese.

kirubakaran · on Nov 9, 2022

Originally from Tamil https://en.wikipedia.org/wiki/Cheroot

pie_flavor · on Nov 9, 2022

Ch as in chord. 'Kroot'.

addaon · on Nov 9, 2022

But "ot" as in depot. "Kruō".

psd1 · on Nov 9, 2022

Ain' dat de chroot

em-bee · on Nov 9, 2022

"I am chroot"

BiteCode_dev · on Nov 9, 2022

I am chroot.

cratermoon · on Nov 9, 2022

> for me, containers are just chrooted processes. Sure, they are more than that: Containers have a nice developer experience, an open-source foundation, and a whole ecosystem of cloud-native companies pushing them forward"

"All right, but apart from the sanitation, the medicine, education, wine, public order, irrigation, roads, a fresh water system, and public health, what have the Romans ever done for us?"

zoobab · on Nov 9, 2022

1979 is the best year!

chroot was written by one of the founders of Sun Microsystems, a cool guy:

https://en.wikipedia.org/wiki/Bill_Joy https://myhero.com/B_Joy_cvhs_cl_US_2016_ul

adamgordonbell · on Nov 9, 2022

Wikipedia does say Bill Joy wrote chroot, but the unix history in the article says that it was DMR or Ken Thompson depending on which file you look at and chroot is listed in the V7 manual. So I think it predates the BSD Kernel.

jlokier · on Nov 9, 2022

The BSD kernel is actually older then Unix V7, so it could plausibly have been added to BSD first.

keithnz · on Nov 9, 2022

I quite enjoyed Liz Rices video on containers... though I'm not quite sure which one I originally watched and she seems to have done the talk quite a few times ( and updated it ). But here's one https://www.youtube.com/watch?v=oSlheqvaRso

whalesalad · on Nov 9, 2022

I like containers, but I absolutely love chroot and debootstrap. It’s killer for when you need isolation (like building software locally without ruining the host OS) but don’t need the pomp and circumstance of containers.

mikepurvis · on Nov 9, 2022

I've long been a fan of basic debootstrap->chroot, though I will say in recent times systemd-nspawn and then buildah have definitely pretty much displaced chroot in my toolbox. They're equally pomp-free relative to Docker, but have a bunch of nice affordances in terms of a properly set up network including DNS and hosts, handling of filesystem permissions, and correctly presenting a read-only /proc.

quickthrower2 · on Nov 9, 2022

If you are trying to tilt me over to always use linux instead of windows for all things you are doing a damn fine job.

My big pain in any OS is messing it up by installing the wrong thing or the wrong way. But at the same time I want to play with different languages.

I mean why use nvm just for node let alone anything else, when I can do this, it feels way cleaner.

whalesalad · on Nov 9, 2022

I’ve been meaning to grok nspawn - for my recent use cases the filesystem was the only level of isolation required but I’m gonna check it out for sure. Glad to hear it’s as good as I’ve heard it is.

bjarneh · on Nov 9, 2022

> StarStruck is just there to check if you are paying attention. It’s not a container runtime at all, but Joe Exotic’s 2nd studio album.

john-tells-all · on Nov 9, 2022

Another take: DevBox provides 60% of the features of Docker while being faster and simpler. https://github.com/jetpack-io/devbox

It provides filesystem isolation -- it uses Nix under the covers. This is great if you're a dev and don't want to mess with routing ports across a hidden VM to the host, just run a command and your service appears directly on the host.

DevBox allows very rapid development. And for production, it exports to Docker. You get the best of all worlds!

john-tells-all · on Nov 9, 2022

Note DevBox is portable -- you don't need Linux! You don't need a VM, which would increase overhead and complexity. DevBox is limited, but very powerful in the features it provides.

crooksey · on Nov 9, 2022

I have been using chroot since approx 04 when I found it during my first Gentoo install, what docker did for me was make it so much easier to deploy various environments, due to how much exposure they have. It took a good amount of effort to make chroot both "cool" and widely available to both sys admins and developers. Sadly what it has done, is make users less aware of the core Linux/Unix principles that allow us to have cool software that just does it all for us.

tambourine_man · on Nov 9, 2022

I've been dreaming of a way of isolating and standardizing installations across Linux and macOS using chroot for a few years. Ideally Windows as well, but I'm not at all familiar with it, although if it's POSIX, it should be possible.

Docker is cool on Linux, but having to virtualize an entire OS everywhere else is just too needlessly slow for situations where you just need to make sure you're running the same stack, not needing to ship the entire machine.

adamgordonbell · on Nov 9, 2022

I'm sure you know this, but OS X has chroot, so you could make a mac native chroot container for something by following the general direction the footnote of this article is pointing at. It would only work on macOS, but it would work.

tambourine_man · on Nov 9, 2022

Yes, I know, that's what I was trying to convey

liquidgecka · on Nov 9, 2022

Uhh.. No.

"Containers" are a mix of two major pieces of technology. One is cgroups, the other is namespaces. Chroot is analogous (kind of) to mount namespaces is all. Containers need not use all of the features of all the cgroups and namespaces available.

For example, from the linux command line I can use the "unshare" command line took to create namespaces to simulate some of the components without the other fanfare.

for example, with namespaces I can remove network access to any process without changing anything else about the system. In this example the process run (ping) has no network devices at all. I love using this to ensure that a script doesn't attempt to download anything unexpected.

$ sudo unshare -n ping 127.0.0.1 ping: connect: Network is unreachable

I can also do things like remove files mounted under another mount. Make a directory called "foo", touch a file in that directory, then mount a ramdisk on top of it and the file is no longer accessible. I can use unshare -m to create a process with a new mount namespace, umount foo and the file becomes accessible again, all while other processes on the system can still see the contents of foo. A real world example of this is when the mysql database got copied onto the root disk before the mysql mount was created. You can leave mysql running while removing the over-mounted files without production risk.

Cgroups are also nifty, you can restrict resources without changing the root file system. So I can ensure that no children of mine can consume more than 1G of RAM without fiddling with file systems at all. Systemd does this with its limits configurations and there are a ton of useful limits that can be applied.

One of the coolest aspects of this is that you can do this per thread, meaning that you can do something like run your render thread in a network namespace without internet access, this preventing possible RCE on just that thread, same for mount namespaces and such, making it possible to chroot effectively a single thread in your application.

So no, containers are not just fancy chroot, they are so very much more.

TheDong · on Nov 9, 2022

> "Containers" are a mix of two major pieces of technology. One is cgroups, the other is namespaces.

I respectfully disagree with that definition of what a "container" is.

cgroups, namespaces, and so on are an implementation detail of how containers are commonly implemented on linux these days.

OpenVZ (https://en.wikipedia.org/wiki/OpenVZ) and Solaris Containers (aka zones) both predate the current linux implementation and did not use namespaces or cgroups.

All three of those ways of running containers are containers. unshare/namespaces/cgroups is just one possible implementation. There are also, certainly, other ways of implementing containers, so it feels wrong to define it in terms of just one specific, admittedly the most popular, implementation.

planede · on Nov 9, 2022

linux namespaces/cgroup may be an implementation detail. But at a high level, each container can have a different view on system resources, and this is the critical point. Different from each other and the host system. The system resource can be filesystem, network access, processes, memory, ... .

chroot only isolates the file system, otherwise all other system resource is shared with the host system.

I don't doubt that other container technologies achieved similar level of isolation or more before Docker. But chroot is really not comparable to Docker.

G3rn0ti · on Nov 9, 2022

Dockers for Windows and Macs do not even use "cgroups" and "namespaces" because these technologies are not available on these stacks -- it resorts to plain old VMs. So in a sense, yes, Docker is not just chroot but on the other hand, it is also not just "cgroups" with "namespaces". It turns out Docker is a reference implementation for the concept of containers. But you can replace it with anything that can process images and a "Dockerfile".

efitz · on Nov 9, 2022

Modern containers are a mix of cgroups, namespaces, pivot_root (like chroot but more appropriate for containers), and seccomp.

Some implementations (e.g. firecracker-containerd) also use SELinux and CPU virtualization support.

"containers are just chroot" is an oversimplification to the point it's misleading.

worthless-trash · on Nov 9, 2022

I think that podman on rhel/fedora also 'uses selinux' to confine containers.

nyanpasu64 · on Nov 9, 2022

Given that threads share an address space, can't the render thread trivially overwrite the call stack of another thread to run attacker-selected functions and parameters?

wahern · on Nov 9, 2022

> Given that threads share an address space, can't the render thread trivially overwrite the call stack of another thread to run attacker-selected functions and parameters?

Yes.

The fact these Linux APIs are per-thread is an artifact of how Linux implements threads, and certainly not a security feature. From a security standpoint, this behavior is more an anti-feature--it's too easy for someone to naively believe they've dropped ambient privileges, unaware that a thread context in the process (a normal process thread, or possibly a io_uring thread or the io_uring context itself) could still possess elevated privileges.

legulere · on Nov 9, 2022

There’s also thread local storage though I don’t know wether the stack sits on it.

wahern · on Nov 9, 2022

Thread-local storage is no less visible to other threads, it's merely a little less convenient to access--you need to figure out the address(es) the same as you would for most any other link in an RCE exploit chain.

itsdrewmiller · on Nov 9, 2022

I think you might be reacting to the title - the article is a lot more measured in its claims, and is a fantastic read.

liquidgecka · on Nov 9, 2022

Perhaps, but his title, and vast over simplification of the technologies he is trying to demystify is doing it all a huge disservice. Its like saying that a "Nuclear Reactor is just a steam turbine with marketing". Chroot was a very specific piece of technology that was nifty in its day, but the entire concept of how the kernel interfaces with processes had to be reworked to make namespaces work. Its not just "minor refinements and marketing" as he states. I appreciate that he is trying to show that "containers" just run in process space, but the approach is just overly simplified with no further explanation of just how simplified it is later on.

dang · on Nov 9, 2022

Ok, we've replaced the linkbaity title with a more representative sentence from the article.

If anyone can suggest a better title (i.e. more accurate and neutral, and using language from the article itself), we can change it again.

adamgordonbell · on Nov 9, 2022

I wrote this and the consistent feedback I've gotten was that people hate the title, so thanks for changing it. I'd pick 'What chroot taught me about containers' but this also works.

dang · on Nov 10, 2022

Even better. Changed from "Let’s build a container runtime using only the chroot system call". Thanks!

nabakin · on Nov 9, 2022

Looks good to me

herodoturtle · on Nov 9, 2022

Thanks! ^_^

jaimehrubiks · on Nov 9, 2022

I'd love a more detailed chapter about this topic. Sounds extremely interesting. Kind of OPs post but with added examples and explanations about croups and namespaces.

Huggernaut · on Nov 9, 2022

In my opinion the series starting with https://medium.com/@teddyking/linux-namespaces-850489d3ccf is the gold standard for introducing namespaces.

jaimehrubiks · on Nov 9, 2022

Thanks!

nunez · on Nov 9, 2022

i definitely agree with you, but I think the point of this exercise is to demonstrate how container images operate (i.e. a rootfs with alterations on top, applied via essentially a recursive untar)

patrick451 · on Nov 9, 2022

So I was trying to follow along with this and kept getting an error

    /# chroot /testroot /hello
    chroot: failed to run command ‘/hello’: No such file or directory

If anybody was wondering, the trick is to statically link

     gcc hello.c -o hello  -static -static-libgcc

Or, use go for the hello executable

adamgordonbell · on Nov 9, 2022

Author here. Sorry, I should have added more instruction there.

What I was expecting someone to do is use the 'pull' command made later to extract a statically linked hello from the hello:latest image, but I didn't explain how until later. So yes, thanks for including this, static linked is the way to go here, or ldd and copying other things in.

BeefWellington · on Nov 9, 2022

Or dynamically link and use ldd to identify and copy libraries the program is linking:

   ldd hello

chriswarbo · on Nov 9, 2022

This is why I build containers using Nix (plus the reproducibility, caching, etc.): since everything has a path containing an unguessable hash, all dependencies must be referred to by their absolute path (either directly, or indirectly via an env var, etc.).

That makes it trivial to find all the dependencies of a file (whether they're dynamic libraries, interpreter packages, image files, etc.), transitively, and tar them all up into a self-contained container :)

efficax · on Nov 9, 2022

containers are cgroups and namespaces and are much more than just chroots, plus a set of tools for making those work

StopHammoTime · on Nov 9, 2022

I think the point is that it's not magic, but a very clear and obvious system that's been created using native tools. I've run into a lot of people (including Senior SRE/DevOps people) who don't realise that Containers are just another process running on a system.

liquidgecka · on Nov 9, 2022

Sure, but his explanation of that point is vastly over simplifying what is happening. `Containers are chroot with a Marketing Budget` implies that they are basically the same when in fact the technology behind containers more or less completely rewrote the way that the kernel thinks of processes in order to work properly.

Rather than trying to associate containers with chroots, it would have been way better to go the other way. You ssh terminal? That is effectively a container. Xorg? Yea that's a container too and for convince they share the same resources. There is absolutely nothing that says that the desktop need run in the same namespace as init or that two users logged into the same machine need have the same view of the file system and networking setup. `ls /` from one user can return a completely different set than `ls /` from another and all of this is because containers are just processes, and processes are way more than just a pid in a table.

pdntspa · on Nov 9, 2022

If that is how a system behaved, I, as a user, would pull my hair out.

"How can you not see that file? It's right there!"

It'd be a great way to cement job security in IT

liquidgecka · on Nov 9, 2022

To be fair I am right there with you, I was just pointing out how nifty they can be conceptually.

How is this for a real world example, that wifi interface that you configured via the GUI and clicked "do not share with other users" exists exclusively in your users namespace, ifconfig only shows it to you, other users simply don't see it and have no access to its routes? Is that a saner example of how this implementation can be used?

monocasa · on Nov 9, 2022

Practically, it can be done in a sane way like with Plan9's ubiquitous, per process vfs namespaces.

pushedx · on Nov 9, 2022

A container is not a process.

It’s an execution environment created by namespaces (mount, pid, network, others) and cgroups.

A container can house many processes.

liquidgecka · on Nov 9, 2022

^-- this!

Also, a container can house a container.. kind of. If you think of containers like file system inodes then there is a logical view of containers container1: (init -> dockerd) -> container2: (init -> dockerd) -> container3: (init -> dockerd) -> container4: (process) is perfectly feasible and from our view there is a hierarchy of containers, but in reality it would be possible assign a process in container4 to container1 and create all kinds of cool logical loops. This is effectively what a daemonset is in kubernetes. =)

lstodd · on Nov 9, 2022

In reality there is no such thing as a 'container'.

You can create an incredible self-intersecting mess of process namespaces.

It is not what you want to do in production.

Also, docker is an overengineered mess. Kubernetes only an couple of orders of magnitude more so.

G3rn0ti · on Nov 9, 2022

On Windows and Macs Docker uses VMs ...

I think the DEV community really needs a new definition of "containers" and one that is not backed by implementation details of Docker Linux.

c-baby · on Nov 9, 2022

When running, containers aren’t anything more than composed process isolation primitives like chroot. Docker’s main innovation was the layering system, which made it feasible to build and distribute the large images required to ship an entire system in a container.

lamontcg · on Nov 9, 2022

They're "RPMs" with some union mounted fanciness, a chroot, and some cgroups isolation to keep other people's fork bombs from taking out your service.

Agreed that explainng containers should never start with VMs.

I think it should start with package managers rather than chroots though.

chriswarbo · on Nov 9, 2022

I think .rpm/.deb is a bad analogy, since those (a) get installed into a global namespace (the / folder), (b) rely on the existence of other packages in that namespace (their dependencies) and (c) require global consistency between all the installed packages.

You can avoid those global consistency problems by installing RPM/Deb packages into a chroot (or a "proper" container), but that ceases to be a good analogy of what containers are.

lamontcg · on Nov 9, 2022

Yeah I've spent like a decade building rpms that packaged all their deps inside of them, so I very much don't find the dependency management between them to be fundamental, which might not be intuitively obvious to everyone else.

Maybe it'd be better to call them a tarball, but there is the way that you can install it and run it and then nuke it cleanly which you don't get out of the box with a tarball, which is why I like the package metaphor slightly better.

EDIT: the fact that it is in a chroot, though, I think sort of covers the way that it doesn't really interact with the external filesystem and installed packages. That implies that you have to ship all the deps with the package itself and those usually would be installed into something like /chroot/<whatever> and never into e.g. /usr/bin and the nature of the chroot would make everything in /usr/bin inaccessible. If you build an RPM which installs into a chroot you're really not going to be able to have any dependencies on anything else in the system. The tweaks to make that separation slightly nicer are fairly small tweaks overall. That might not be intuitively obvious though.

chriswarbo · on Nov 10, 2022

> Maybe it'd be better to call them a tarball, but there is the way that you can install it and run it and then nuke it cleanly which you don't get out of the box with a tarball, which is why I like the package metaphor slightly better.

Containers are tarballs, e.g. the rootfs for runc, the layers of an image, and images themselves are usually tarballs. The extra stuff "you don't get out of the box with a tarball" are precisely the tools which we call "containers" (from low-level stuff like runc, to high-level things like online image registries)

its-summertime · on Nov 9, 2022

chroot's only use is to make / point somewhere else, it doesn't restrict file system access except during perfect weather conditions, and even then.

https://man7.org/linux/man-pages/man2/chroot.2.html

> This call changes an ingredient in the pathname resolution process and does nothing else. In particular, it is not intended to be used for any kind of security purpose, neither to fully sandbox a process nor to restrict filesystem system calls.

Actual restrictions are done with other methods, chroot is just to make things look nice inside the container

cpuguy83 · on Nov 9, 2022

Runtimes typically use pivot_root and not chroot as well.

witnesser2 · on Nov 9, 2022

This is not my area. Just audited Distributed System. This Docker thing, from beginning, sounds like try to implement a middleware using software. I don't recall but obscurely remember the distribute system guys has something about implementing a middleware. If the kernel is shared, it is not a container. It is like everything laid on the floor on crowded ferry. I would like to know if there is a specifiction of Docker. Particularly its Qos -- quality of service. That is the major consideration of any engineering.

-- opinion

witnesser2 · on Nov 9, 2022

I give a QOS example like a modem 24.4 kbps.

Then for docker as middleware, eg throttle ticks per container per time frame etc.

bitwize · on Nov 9, 2022

Yeah, more or less, except containers also have a smoother UX and processes within a container are more successfully tricked into believing they're on a separate machine.

moss2 · on Nov 9, 2022

Wow this explains so well what Docker is. Fancy building a $1 billion company around chroot.

palata · on Nov 9, 2022

Don't check what Slack (or WhatsApp before) is, then.

blt · on Nov 9, 2022

tangent: in github.com/dspinellis/unix-history-repo linked from the article, the default branch is shown to have "∞ commits". Any ideas where this bug comes from? Something related to Unix epoch?

evilc00kie · on Nov 9, 2022

> Something related to Unix epoch?

Nope, I just tested it.

I'd guess there are too many branches with too many commits (e.g. all those FreeBSD ones with 10k-200k commits). Although you can click on branch and see the count of commits per branch, there might be an issue showing this information on the branch view due to some "DB-query-edge-case".

You know, it doesn't seem to be that hard to solve this problem in the first place, but backends with their DBs and services can be a wild jungle of data correlation.

blt · on Nov 9, 2022

What was your test? Do you mean the problem is in GitHub, not git? I suppose GitHub caches a lot of git's output in DBs to improve response time, sounds like fertile ground for bugs.

kramerger · on Nov 9, 2022

Didn't someone do this with the real container "API" and the code was not that much longer?.

IIRC, she also used Go and it was more of a live coding talk than a blog post.

aabbccde · on Nov 9, 2022

Are you talking about this talk from Liz Rice titled "Containers from Scratch"[1]

[1]: https://www.youtube.com/watch?v=oSlheqvaRso

kramerger · on Nov 9, 2022

That's the one, thanks!

latenightcoding · on Nov 9, 2022

Related: https://github.com/kazuho/jailing

ge96 · on Nov 9, 2022

Oh that's cool he does the Corecursive podcast.

froh · on Nov 9, 2022

meanwhile the chroot command line utility has been dropped from posix 2017:

https://pubs.opengroup.org/onlinepubs/9699919799/xrat/V4_xcu...

artemonster · on Nov 9, 2022

There is junest that does all this nicely

liquid153 · on Nov 9, 2022

Can someone link me to the I think google vid of what a container is

tamrix · on Nov 9, 2022

Or freebsd jails.

aaaaaalexas312 · on Nov 9, 2022

exabrial · on Nov 9, 2022

I think the only correct way to do containers is… groan… how systemd does it.[ One should be able to launch a process and not let it touch anything else but it’s very process definition..]

Thank you for your down vote. I look forward to being moderated out of existence.

sargun · on Nov 9, 2022

Perhaps a more productive comment (and I honestly am curious), would be -- what does systemd do right?

XorNot · on Nov 9, 2022

Systemd is containers, is the thing. Its entirely built around using cgroups to manage process resources, it just starts from the normal "I'm running root processes" level because it was intended to init a whole system.

The one missing component was the concept of union mounting layered file systems from a repository.

jsmith45 · on Nov 9, 2022

The systemd-nspawn system (which is how running more more typical containers is normally implemented) is a little unusual because it it geared towards running os containers, including running their whole init system (via --boot or the template unit file). There is a reason why machinectl calls the input machine images, and both accept raw disk images. This is also reflected in the systemd-nspawn requirement of having an os-release file in the container. Of course, it is entirely capable of running application containers (hence features like --as-pid2), but some of the defaults like registering with systemd-machined don't make as much sense in those scenarios.

Docker is nominally geared towards application containers, although it mostly works with OS containers. Indeed it seems like container images that contain a full OS (minus kernel) are probably more common than not.

exabrial · on Nov 9, 2022

Systemd is a bit of a joke in some areas: binary logs for instance. So praising systemd at all often invokes scorn. That being said, ‘containing stuff’ is one area it actually does really well.

Two other excellent comments have covered the meat and potatoes of it, so I will just add a few things because they are spot on.

What does right is it allows you to very specifically set ports allowed, cpu/mem limits, filesystem access limits, read-only paths, read/write Paths, chroot, private devices, private network, and many more for any single generic program: you can contain anything. We add authbind to the mix to allow for binding to lower ports. All of the spawned processes are visible from the host system using regular normal unix tools.

What does even better than Docker is _not_ downloading third party code/programs from a repository. This is a much smaller footprint and allows for much easier inspection. You _can_ if you want, use systemd-nspawn which will give you a completely private deb-bootstrap, sorta mirroring what Docker does, but in my opinion that’s not necessary and increases the amount of things you need to update.

xorcist · on Nov 9, 2022

It is much more straightforward to use and with less surprises. The limits actually do what they say the do.

It also takes care of process lifecycle. And logs by default. It's preferable in every way.

What it doesn't do is manipulate and transfer file system images, that's outside the scope. Which is probably the right design. The whole thing of having Docker manipulate images for you was a strange decision and makes CI/CD unnecessarily complicated.

magicalhippo · on Nov 9, 2022

Any examples of this?

I got an application, I want to specify where the internal /etc/foo and /data lives outside the "systemd-container" and I want to map some ports to the internal IP.

How do I set this up with systemd?

xorcist · on Nov 9, 2022

There is plenty of examples in the documentation. The man page (man systemd-nspawn) is short and to the point.

As for your example,for regular daemons you probably want to use plain old unit files. You map ports with "Port=" and bind filesystems with "Bind=" ("/outside/data:/data" in your example). Again, plenty of examples in the documentation and in the Gentoo/Arch wikis which are excellent.

It also includes examples working with (not against) SELinux. Lots of other parts of the systemd family is nasty and full of surprises but this one pretty straightforward. systemd was built on cgroups from the ground up and it makes administering them fairly trivial.

magicalhippo · on Nov 9, 2022

Thanks, hadn't heard about systemd-nspawn. Arch wiki to the rescue again, as you say.

exabrial · on Nov 9, 2022

I post a gist for you. This is filtered through our maven build where variables are substituted, so you'll see a bunch of placeholders that get replaced during compilation: https://gist.github.com/exabrial/fbda5ebceb94ae2397aac16f586...

This particular template we're not really worried about mem/cpu limits so that isn't in there. Authbind is in there to allow it to grab lower ports, but this particular app does not.

I would recommend reading these as well:

* https://gist.github.com/ageis/f5595e59b1cddb1513d1b425a323db...

* https://www.opensourcerers.org/2022/04/25/optimizing-a-syste...

magicalhippo · on Nov 9, 2022

Thanks, that's really helpful.

alpb · on Nov 9, 2022

tl;dr the author uses "docker export" to squash a container image layers into a rootfs, and uses chroot to make it the root of that filesystem and runs a binary there.

This practically has nothing to do with the term "containers". Author says "Others will tell you containers are about namespaces [...] But for me, containers are just chrooted processes. " , however containers are very much about isolation, resource controls, and various network features like port mappings for many people.

So you really didn't build anything even remotely resembling a container runtime, you redefined the term and built something that does "docker export | tar zxvf && cd /dir && chroot /dir && /prog". :)

nunez · on Nov 9, 2022

tbf, he could have also built the rootfs from source, though that takes a lot longer