Seems to be a pretty decent overview; covers the usual suspects (multi-stage builds, FROM scratch, non-scratch minimal images, ldd to check libraries), with some nice bits that I'd not seen before (busybox:glibc). I would be curious to see how these base images stack up against Google's "distroless" base images (https://github.com/GoogleContainerTools/distroless). I also appreciate that they call out Alpine's compatibility issues (on account of musl) but still leave it as a thing that can be good if you use it right. (Personally I'm quite fond of Alpine, but I don't bother when using binaries that expect glibc.)
One important thing missing from busybox:glibc is the SSL certificates, which are almost always needed if you are writing things like crawlers.
As for Alpine, at least for packaging Python code, avoid it as much as possible. It cannot reuse the manylinux wheels on PyPI and has to recompile C modules.
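To make the manylinux point concrete, here is a rough sketch of why those wheels get skipped on Alpine. It only looks at the platform tag at the end of a wheel filename and ignores architecture and Python/ABI tags, so it is much cruder than what pip really does. The helper names and the example filenames are illustrative assumptions, and it assumes the separate musllinux tag that some projects publish wheels under.

```python
def platform_tags(wheel_filename):
    """Return the platform tags encoded at the end of a wheel filename."""
    if not wheel_filename.endswith(".whl"):
        raise ValueError("not a wheel filename: " + wheel_filename)
    stem = wheel_filename[: -len(".whl")]
    # The last dash-separated field is the platform tag; several tags
    # can be compressed into one field, separated by dots.
    return stem.split("-")[-1].split(".")


def installable_on_musl(wheel_filename):
    """Simplified check: pure-Python ('any') or musllinux wheels only."""
    return any(
        tag == "any" or tag.startswith("musllinux")
        for tag in platform_tags(wheel_filename)
    )


# Illustrative filenames, not a claim about what any project ships.
examples = [
    "numpy-1.26.4-cp312-cp312-manylinux_2_17_x86_64.manylinux2014_x86_64.whl",
    "numpy-1.26.4-cp312-cp312-musllinux_1_1_x86_64.whl",
    "requests-2.31.0-py3-none-any.whl",
]
for name in examples:
    print(name, "->", installable_on_musl(name))
```

If a package only ships manylinux wheels, the installer on a musl system falls back to the source distribution, which is where the recompiling of C modules comes from.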
With Alpine, the size advantages of musl sometimes come with hidden overheads as well.
Musl's malloc() seems to be a fair bit slower than glibc's when you start to throw more threads at it (fragments faster?).
And then if you try to replace that with jemalloc, you'll quickly find out that jemalloc does not actually get used.
At some level, the difference between a 1 MB Docker image and a 9 MB one isn't that significant (and I asked for things like strace, ping and nslookup to be in it, because it becomes impossible to debug the container when it is apparently caching a DNS record, or at least looks like it is).
I've been using docker-slim[1] for a couple of years and am really happy with it.
Also, I build everything from source and try to avoid glibc wherever I can, and use hardened malloc and -D_FORTIFY_SOURCE=2 for everything. I also use static linking wherever I can and try to --disable-foo with configure if I can get away with it. A minimalistic approach that restricts everything (down to a syscall whitelist) sounds like it's high maintenance, but combined with a catalog of what runs in these containers and some plumbing to poll upstream for repo updates, it allows me to ignore a lot of what affects most people wrt security. As a bonus I learn about it quickly if upstream introduces a major change and I'm forced to study the source and get to decide if they were "out of their mind again" with these changes.
It keeps me abreast of how my platform behaves and when it doesn't behave as it should (it's kind of a forward investment for efficient root cause analysis that beats "have you tried turning it off and on again").
Docker imo doesn't deserve the hype. Most of the flaws I deal with in my setup are due to docker instability (often in docker swarm) ... it's the next thing I actually want to replace because it's just overhyped "static linking for Millennials".
While such articles are usually helpful, I'd caution that making individual image sizes as small as possible shouldn't really be your goal.
As a simple example - if you have 100 unique images in your system, having an image size of 1 GB each where 99% of it is derived from a common layer is going to be a lot smaller overall than "optimizing" the size down to 100 MB each but taking away the base layers.
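Spelling out the arithmetic behind that example, as a back-of-the-envelope sketch: it assumes 1 GB = 1000 MB, that the shared part is a single layer stored exactly once, and that nothing at all is shared in the "optimized" case. The function name is made up for illustration.

```python
def total_storage_mb(image_count, image_size_mb, shared_percent):
    """Total storage when shared_percent of every image is one common
    layer stored once, and the rest is unique to each image."""
    shared_mb = image_size_mb * shared_percent // 100
    unique_mb = image_size_mb - shared_mb
    return shared_mb + image_count * unique_mb


# 100 images of 1 GB each, 99% of each coming from a common layer.
big_but_shared = total_storage_mb(100, 1000, 99)

# 100 images "optimized" down to 100 MB each, with no shared layers.
small_but_unshared = total_storage_mb(100, 100, 0)

print(big_but_shared, "MB vs", small_but_unshared, "MB")
# prints: 1990 MB vs 10000 MB
```

So under these assumptions the "bloated" set needs roughly a fifth of the storage of the "optimized" one.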
In my experience your 100 unique images might still have different base images. So you end up with too many layers. Especially with interpreted languages, those Python, Ruby or Node runtimes add up quickly.
Also, depending on the use case (e.g. running a CI system), you might end up with high storage costs (caching) or high bandwidth costs (no caching).
Depending where your datacenter is, coworkers with slow connections might suffer as well. Hey, let’s just download 12GB of my everylanguage-and-Tex:latest every workday.
I agree that Docker golfing to get minimal images is not always needed, but don't let your images grow like wildfire. They are part of your architecture.
You don't have to give up public images. Here's a simple example rule - production images should all be built on Ubuntu:18.04. (With a large enough maintenance team this can change to - you must use one of these 8 images with the standard runtime stacks preconfigured).
Teams still have the flexibility of using whatever they want during the build stage, and it takes away the ambiguity of them having to decide between different OS flavors, versions, slim/Alpine, building from scratch or whatever else. In 99% of cases there will be no functional difference between them anyways, and this rule reduces the overhead of having to track down and manage an infinite number of permutations from an operations side.
Using a trendy small base image doesn't make the intermediate layers any smaller.
What you're describing is what cloud native buildpacks are intended to solve: sensible, efficient, auto-updateable OCI images. I worked on them for a while and it just hardened my heart against Dockerfiles.
It's starting to feel like this is a lost battle though. People have decided that Dockerfiles are "the kubernetes way", even though it exposes developers to details they just shouldn't need to care about.
Add anything on top, and it is no longer "vanilla kubernetes". Never mind that this means everyone rolls their own Rube Goldberg build process, and somehow that is ok...
I don't feel like it's a lost battle at all. Folks are fanning out and learning that all of this is harder than it looks, with the exception of enterprise, where it's harder than it looks squared.
I suspect folks will become attracted to the simplicity and performance of buildpacks. They make whole classes of headache vanish.
It's not just the raw size of the image, but also about what the image includes; a smaller image often reduces the potential attack surface because vulnerable things just aren't there.
That's one of the major rationales behind the distroless images. Being space optimized is just a really nice side effect.
>> a smaller image often reduces the potential attack surface because vulnerable things just aren't there
By the way, the article proposes blindly downloading artifacts from someplace on the internet, on every build. Not only can that cripple your builds when the source is down (which happens all the time), it can also (and this has happened) send you arbitrary infected crap instead of what you wanted.
There are plenty of resource-limited use cases where storage is not exactly cheap. Or updating an image over the network might be slow or expensive (think edge, over 3G/4G).
It's worth noting that golang builds can be smaller than that with `GOOS=linux go build -ldflags="-s -w" .` (assuming a build on macOS for Linux). From there I usually run `upx --ultra-brute -9 program` before dropping it into a `scratch` docker container (plus whatever other deps it needs).
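One common way to blunt the second problem (this is not something the article does, just a sketch): pin the checksum of the artifact you expect and fail the build if whatever came down the wire doesn't match. `verify_sha256` below is a hypothetical helper, and the bytes stand in for a downloaded file.

```python
import hashlib
import hmac


def verify_sha256(data: bytes, expected_hex: str) -> bool:
    """Return True only if data hashes to the pinned SHA-256 digest."""
    actual_hex = hashlib.sha256(data).hexdigest()
    # compare_digest does a constant-time comparison of the two digests
    return hmac.compare_digest(actual_hex, expected_hex.lower())


# Stand-in for an artifact fetched during the build.
artifact = b"pretend this is a release tarball"
# In a real build this digest would be recorded ahead of time, not computed here.
pinned_digest = hashlib.sha256(artifact).hexdigest()

print(verify_sha256(artifact, pinned_digest))
print(verify_sha256(artifact + b" plus something extra", pinned_digest))
```

That does nothing for the availability problem; for that you would still want a local mirror or cache of the artifact.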
They're probably not best at any of portability, isolation, or security (they probably lose to full VMs on all counts). But they're good enough and way more convenient / easier to use.