
An alternative to removing files or going through contortions to stuff things into a single layer is to use a builder image and copy the generated artefacts into a clean image:

    FROM foo AS builder

    .. build steps

    FROM foo

    COPY --from=builder generated-file target

(I hope I got that right; on a phone and been a while since I did this from scratch, but you get the overall point)



Unfortunately this messes with caching: if you're using the default inline cache, the builder stage will always rebuild, at least until registries start supporting cache manifests.


How so? I just tested a build, and it used the cache for every layer including the builder layers.


You did this on the same machine, right? In a CI setting with no shared cache you need to rely on an OCI cache. The last build image is cached with the inline cache, but prior images are not.


You can build the first stage separately as a first step using `--target` and store the cache that way. No problem.
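
Roughly like this (untested sketch; the stage name matches the example above, the image ref is a placeholder):

    # build only the builder stage and push it, so its inline cache
    # is available to the next CI run
    docker buildx build --target builder \
      --cache-from registry.example.com/app:builder \
      --cache-to type=inline \
      -t registry.example.com/app:builder --push .

The main build then adds `--cache-from registry.example.com/app:builder` so the builder layers get reused.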


How would you do this in a generic, reusable way company-wide for any Dockerfile? Given that you don't know the targets beforehand, the names, or even the number of stages.

It is of course possible to do for a single project with a bit of effort: build each stage with a remote OCI cache source, push the cache there afterwards. But... that sucks.

What you want is the `max` cache mode in buildkit[1]. Except... not much supports that yet. The native S3 cache would also be good once it stabilizes.
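
With the registry cache exporter that looks roughly like this (sketch; the cache ref is a placeholder, and the registry has to accept cache manifests):

    # mode=max exports cache for every stage, not just the final image
    docker buildx build \
      --cache-from type=registry,ref=registry.example.com/app:buildcache \
      --cache-to type=registry,ref=registry.example.com/app:buildcache,mode=max \
      -t registry.example.com/app:latest --push .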

1. https://github.com/moby/buildkit#export-cache


Standardize the stages and names. We use `dev` and `latest`.

It worked wonders for us: on a cache hit the build time drops from 10 to 1.5 minutes.
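
Concretely, the CI steps can then be identical for every project, something along these lines (sketch; assumes the build stage is named `dev`, the final image is tagged `latest`, and `$IMAGE` stands for the project's registry repo):

    docker buildx build --target dev \
      --cache-from "$IMAGE:dev" --cache-to type=inline \
      -t "$IMAGE:dev" --push .

    docker buildx build \
      --cache-from "$IMAGE:dev" --cache-from "$IMAGE:latest" \
      --cache-to type=inline \
      -t "$IMAGE:latest" --push .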


Ah, sorry, I misunderstood you. Yes, I don't tend to care whether or not the steps are cached in my CI setups, as most of the Docker containers I work on build fast enough that it doesn't really matter to me, but that will of course matter for some.


I never got around to implementing it, but I wonder how this plays with cross-runner caches in e.g. GitLab, where the cache goes to S3; there's a cost to pulling the cache, so it'll never be as fast as same-machine, but it should be way faster for most builds, right?


The cache is small, but if you have a `docker buildx build --cache-from --push` type command it will always pull the image at the end and try to push it again (although it'll get "layer already exists" responses). For ~250 MB images on GitLab I find this do-nothing job takes about 2.5 minutes in total (vs a 10 minute build if the entire cache were invalidated by a new base image version). I'd very much like to be able to say "if the entire build was cached, don't bother pulling it at the end"; maybe buildkit is the tool for that job.


I mostly love Docker and container-based CI but wow what a great reminder that even common-seeming workflows still have plenty of sharp edges!


If you're running the CI inside AWS, and assuming the code isn't doing anything stupid, it will be fast enough that nobody will notice.



Thanks for sharing, very useful blog post (not just the linked section). Reference to https://github.com/wagoodman/dive will help a lot today.


This isn’t really useful for frameworks like Rails, since there’s nothing to “compile” there. Most Rails Docker images will just include the runtime and a few C dependencies, which you need to run the app.


Pulling down gems is a bit like a compilation step, so it could benefit, unless you're already installing gems into a volume you include in the Docker container via docker compose etc. Additionally, what it does compile can be fairly slow (nokogiri, for instance).


There are, however, temporary files downloaded for the apt installation, and while in this case it's simple enough to remove them in one step, that's by no means always the case. Depending on which gems you decide to rely on, you may e.g. also end up with a full toolchain to build extensions and the like, so knowing the mechanism is worthwhile.


How would you go about copying something you installed from apt in a build container?

Take `apt install build-essential libvips` from the OP: it's not obvious to me what files libvips is adding. I suppose there's probably an incantation for that? What about something that installs a binary? Seems like a pain to chase down everything that's arbitrarily touched by an apt install; am I missing some tooling that would tame that pain?


It's a pain, hence for apt, as long as it's the same packages, just cleaning up is probably fine. But e.g. build-essential is there to handle building extensions pulled in by gems, and it isn't necessary in the actual container if you bring over the files built and/or installed by rubygems, so the set of packages can be quite different.
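
As a rough sketch of that split (package names as in the OP; assumes a stock Ruby image, where gems land in /usr/local/bundle):

    FROM ruby:3.2 AS builder
    # toolchain (and any -dev headers your gems need) only exists in this stage
    RUN apt-get update && apt-get install -y build-essential
    WORKDIR /app
    COPY Gemfile Gemfile.lock ./
    RUN bundle install

    FROM ruby:3.2
    # runtime dependencies only, no build-essential
    RUN apt-get update && apt-get install -y libvips && \
        rm -rf /var/lib/apt/lists/*
    WORKDIR /app
    COPY --from=builder /usr/local/bundle /usr/local/bundle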


Run "dpkg -L libvips" to find the files belonging to that package. This doesn't cover what's changed in post install hooks, but for most docker-relevant things, it's good enough.



