
An alternative to removing files or going through contortions to stuff things into a single layer is to use a builder image and copy the generated artefacts into a clean image:

    FROM foo AS builder

    .. build steps

    FROM foo

    COPY --from=builder generated-file target

(I hope I got that right; on a phone and been a while since I did this from scratch, but you get the overall point)



Unfortunately this messes with caching: if you're using the default inline cache, the builder stage will always rebuild, at least until registries start supporting cache manifests.


How so? I just tested a build, and it used the cache for every layer including the builder layers.


You did this on the same machine, right? In a CI setting with no shared cache you need to rely on an OCI cache. The last build image is cached with the inline cache, but prior images are not.


You can build the first stage separately as a first step using `--target` and store the cache that way. No problem.
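
Roughly like this (untested sketch; the stage name matches the example above, the image ref is a placeholder):

    # build only the builder stage and push it, so its inline cache
    # is available to the next CI run
    docker buildx build --target builder \
      --cache-from registry.example.com/app:builder \
      --cache-to type=inline \
      -t registry.example.com/app:builder --push .

The main build then adds `--cache-from registry.example.com/app:builder` so the builder layers get reused.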


How would you do this in a generic, reusable way company-wide for any Dockerfile? Given that you don't know the targets beforehand, the names, or even the number of stages.

It is of course possible to do for a single project with a bit of effort: build each stage with a remote OCI cache source, push the cache there afterwards. But... that sucks.

What you want is the `max` cache mode in buildkit[1]. Except... not much supports that yet. The native S3 cache would also be good once it stabilizes.
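
With the registry cache exporter that looks roughly like this (sketch; the cache ref is a placeholder, and the registry has to accept cache manifests):

    # mode=max exports cache for every stage, not just the final image
    docker buildx build \
      --cache-from type=registry,ref=registry.example.com/app:buildcache \
      --cache-to type=registry,ref=registry.example.com/app:buildcache,mode=max \
      -t registry.example.com/app:latest --push .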

1. https://github.com/moby/buildkit#export-cache


Standardize the stages and names. We use `dev` and `latest`.

It worked wonders for us: on a cache hit the build time drops from 10 to 1.5 minutes.
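
Concretely, the CI steps can then be identical for every project, something along these lines (sketch; assumes the build stage is named `dev`, the final image is tagged `latest`, and `$IMAGE` stands for the project's registry repo):

    docker buildx build --target dev \
      --cache-from "$IMAGE:dev" --cache-to type=inline \
      -t "$IMAGE:dev" --push .

    docker buildx build \
      --cache-from "$IMAGE:dev" --cache-from "$IMAGE:latest" \
      --cache-to type=inline \
      -t "$IMAGE:latest" --push .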


Ah, sorry, I misunderstood you. Yes, I don't tend to care whether or not the steps are cached in my CI setups, as most of the Docker containers I work on build fast enough that it doesn't really matter to me, but that will of course matter for some.


I never got around to implementing it, but I wonder how this plays with cross-runner caches in e.g. GitLab, where the cache goes to S3; there's a cost to pulling the cache, so it'll never be as fast as same-machine, but it should be way faster for most builds, right?


The cache is small, but if you have a `docker buildx build --cache-from --push` type command it will always pull the image at the end and try to push it again (although it'll get "layer already exists" responses). For ~250 MB images on GitLab I find this do-nothing job takes about 2.5 minutes in total (vs a 10 minute build if the entire cache were invalidated by a new base image version). I'd very much like to be able to say "if the entire build was cached, don't bother pulling it at the end"; maybe buildkit is the tool for that job.


I mostly love Docker and container-based CI but wow what a great reminder that even common-seeming workflows still have plenty of sharp edges!


If you're running the CI inside AWS, and assuming the code isn't doing anything stupid, it will be fast enough that nobody will notice.



Thanks for sharing, very useful blog post (not just the linked section). Reference to https://github.com/wagoodman/dive will help a lot today.


This isn’t really useful for frameworks like Rails, since there’s nothing to “compile” there. Most Rails Docker images will just include the runtime and a few C dependencies, which you need to run the app.


Pulling down gems is a bit like a compilation step, so it could benefit, unless you're already installing gems into a volume you include in the Docker container via docker compose etc. Additionally, what it does compile can be fairly slow (nokogiri, for instance).


There are, however, temporary files downloaded for the apt installation, and while in this case it's simple enough to remove them in one step, that's by no means always the case. Depending on which gems you decide to rely on, you may e.g. also end up with a full toolchain to build extensions and the like, so knowing the mechanism is worthwhile.


How would you go about copying something you installed from apt in a build container?

Take `apt install build-essential libvips` from the OP: it's not obvious to me what files libvips is adding. I suppose there's probably an incantation for that? What about something that installs a binary? Seems like a pain to chase down everything that's arbitrarily touched by an apt install; am I missing some tooling that would tame that pain?


It's a pain, hence for apt, as long as it's the same packages, just cleaning up is probably fine. But e.g. build-essential is there to handle building extensions pulled in by gems, and it isn't necessary in the actual container if you bring over the files built and/or installed by rubygems, so the set of packages can be quite different.
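
As a rough sketch of that split (package names as in the OP; assumes a stock Ruby image, where gems land in /usr/local/bundle):

    FROM ruby:3.2 AS builder
    # toolchain (and any -dev headers your gems need) only exists in this stage
    RUN apt-get update && apt-get install -y build-essential
    WORKDIR /app
    COPY Gemfile Gemfile.lock ./
    RUN bundle install

    FROM ruby:3.2
    # runtime dependencies only, no build-essential
    RUN apt-get update && apt-get install -y libvips && \
        rm -rf /var/lib/apt/lists/*
    WORKDIR /app
    COPY --from=builder /usr/local/bundle /usr/local/bundle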


Run "dpkg -L libvips" to find the files belonging to that package. This doesn't cover what's changed in post install hooks, but for most docker-relevant things, it's good enough.



