> It's a tradeoff between making container images reproducible, and not shipping security vulnerabilities.
You can regenerate your base images every day or more often and have consistent containers created from an image. A freshly generated image can be tested in a pipeline, so you won't hit issues like inability to scale due to misbehaving new containers.
> You can regenerate your base images every day or more often and have consistent containers created from an image.
That solves nothing, as it just moves the unreproducibility to a base image at the cost of extra complexity. Arguably it can even make the problem worse, as you add a delta between updates where there is none if you just run `apt-get upgrade`.
> A freshly generated image can be tested in a pipeline, so you won't hit issues like inability to scale due to misbehaving new containers.
You already get that from container images you build after running `apt-get upgrade`.
`apt` runs during the creation of 1-3 VM images per architecture, not during the creation of dozens of container images based on each VM image.
Once we have VM images on which all our usual Docker images were successfully built, we trust them more than `FROM busybox/alpine/ubuntu` with subsequent Docker builds. I've detailed the process in a neighboring comment[1], but you're right that it doesn't suit all workflows.
For AMIs (and other VM images) it might make more sense. With containers? Not so much. And with a distributed socket image caching layer it makes even less sense.
We have a maximum image age of 60 days at work. You gotta rebase at least every 60 days, or when something blows up. Keeps everyone honest, and honestly it's not that bad. New sprint, new image, then promotion. And with an internal container repository, does reproducibility really matter? Just pull an older version if push comes to shove.
I don't know (I know) why people aren't moving to platforms like lambda to avoid NIH-ing system security patching operations. We can still run mini monoliths without massive architectural change if we don't get too distracted by FaaS microservice hype.
When your workloads are unpredictable and spike suddenly, such that you can't scale quickly enough without keeping a bunch of spare capacity waiting around, and you have HA requirements. In that scenario, more is spent on avoiding variable spend to achieve a "flat" rate.
In 20 years of writing software, I have never seen a legitimate influx of traffic that can swamp a whole pool of servers faster than it can scale. I’m not saying it can’t happen, I’ve just not worked on any code or infrastructure that couldn’t keep up with the demands of scale. Is there an industry where this is a recurring issue?
I write software that a billion users see every day, so maybe I’m so jaded by the sheer scale and the challenges of writing code at that scale that I just can’t imagine these types of problems.
You are looking at your own experiences, I guess. In edtech it is common for large classrooms to suddenly come online and do things in tight coordination, and no, predictive scaling isn’t predictable enough for this problem. You can also look at e-commerce: Black Friday-type events show how capacity planning can easily require a runway of spare capacity, since scaling takes several minutes to react.
Do you think EC2 capacity on AWS is, on average, kept at high utilization? Everyone runs non-truly-elastic resources with headroom, to varying degrees.
Ah, yeah, I’m only familiar with the industries I’ve worked in, and I’ve never worked in edtech. That’s a pretty good example of an industry that gets sudden, unpredictable load.
> I don't know (I know) why people aren't moving to platforms like lambda to avoid NIH-ing system security patching operations.
Perhaps because people do their homework, and just by reading the sales brochure they understand that lambdas are only cost-effective as handlers of low-frequency events, and that they drag in extra costs by requiring support services for basic features like logging, tracing, and even handling plain HTTP requests.
Predictability has nothing to do with it. Volume is the key factor, especially its impact on cost.
> arrogant of you to say adopters haven’t done their homework
Those who mindlessly advocate lambdas as a blanket solution quite clearly didn't even read the marketing brochure. Otherwise they would be quite aware of how absurd their suggestion is.
Basically you recreate your personal base image (with the apt-get commands) every X days, so you have the latest security patches. And then you use the latest of those base images for your application. That way you have a completely reproducible Docker image (since you know which base image was used) without skimping on the security aspect.
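A rough sketch of that two-image setup (the registry path and date tag below are made up for illustration):

```dockerfile
# base.Dockerfile - rebuilt every X days, tagged with the build date
FROM ubuntu:21.10
RUN apt-get update && \
    apt-get -y upgrade && \
    rm -rf /var/lib/apt/lists/*
```

```dockerfile
# app.Dockerfile - always builds on a known, dated base tag
FROM registry.example.com/mybase:2022-03-25
COPY ./app /app
CMD ["/app/run"]
```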
> Basically you recreate your personal base image (with the apt-get commands) every X days, so you have the latest security patches.
How exactly does that a) assure reproducibility, if you use a custom, unreproducible base image, or b) improve your security over daily builds of container images that run `apt-get upgrade`?
In the end that just adds complexity for the sake of it, to arrive at a system that's neither reproducible nor as secure.
If I build an image using the Dockerfile in the blog post 10 days later, there is no guarantee that my application would work. The packages in Ubuntu's repositories might be updated to new versions that are buggy/no longer compatible with my application.
OP's suggestion is to build a separate image with required packages, tag it with something like "mybaseimage:25032022" and use it as my base image in the Dockerfile. This way, no matter when I rebuild the Dockerfile, my application will always work. You can rebuild the base image and application's image every X days to apply security patches and such. This also means I now have to maintain two images instead of one.
Another option is to use an image tag like "ubuntu:impish-20220316" (instead of "ubuntu:21.10") as the base image and pin the versions of the packages you are installing via apt.
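For illustration, a pinned variant might look roughly like this (the package version strings are placeholders, not values to copy):

```dockerfile
# Pin both the base image tag and the apt package versions.
FROM ubuntu:impish-20220316
RUN apt-get update && \
    apt-get -y install --no-install-recommends \
        curl=7.74.0-1.3ubuntu2 \
        ca-certificates=20210119ubuntu1 && \
    rm -rf /var/lib/apt/lists/*
```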
I personally don't do this, since core packages in Ubuntu's repositories rarely introduce breaking changes within the same release. Of course, this depends on package maintainers, so YMMV.
Whether you have a separate base or not, it relies on you keeping an old image.
The advantage of a separate base is that it lets you keep updating your code on top of it, even while the new bases are broken.
You could still do that without it, though, just by forking off the single image at the appropriate layer. Not as easy, but how often does it happen?
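In Dockerfile terms, that fork can be as simple as building new code on top of an older tagged build (the registry path and tag are hypothetical):

```dockerfile
# Reuse a known-good older build of the single image as a temporary base
# while the freshly rebuilt bases are broken.
FROM registry.example.com/app:2022-03-10
COPY ./app /app
```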
> If I build an image using the Dockerfile in the blog post 10 days later (...)
To start off, if you intend to run the same container image for 10 days straight, you have far more pressing problems than reproducibility.
Personally I know of zero professional projects whose production CI/CD pipelines don't deploy multiple times per day, or at worst weekly in the very rare case where there are zero commits.
> OP's suggestion is to build a separate image with required packages, tag it with something like "mybaseimage:25032022" and use it as my base image in the Dockerfile.
Again, that adds absolutely nothing over just pulling the latest base image, running `apt-get upgrade`, and tagging/adding metadata.
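Sketched in shell (the registry path is a placeholder, and the Dockerfile is assumed to run `apt-get upgrade`):

```sh
# Rebuild daily: pull the latest base, apply updates during the build,
# and record the build date in the tag.
docker pull ubuntu:21.10
docker build -t registry.example.com/app:$(date +%Y%m%d) .
docker push registry.example.com/app:$(date +%Y%m%d)
```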
Eh, that’s a heavy-handed and not great way of ensuring reproducibility.
The smart way of doing it would be to:
1. Use the direct SHA reference to the upstream “Ubuntu” image you want.
2. Have a system (Dependabot, Renovate) to update that periodically.
3. When building, use `--cache-from` and `--cache-to` to push the image cache somewhere you can access.
And… that’s it. You’ll be able to rebuild any image that is still cached in your cache registry. Just re-use an older upstream Ubuntu SHA reference and change some code, and the apt commands will come from the cache.
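Roughly, with placeholder digest and registry paths:

```dockerfile
# Pin the base by digest; Dependabot/Renovate bumps this line periodically.
FROM ubuntu:21.10@sha256:0000000000000000000000000000000000000000000000000000000000000000
RUN apt-get update && apt-get -y upgrade
```

```sh
# Build with a registry-backed cache so old layers (including the apt steps)
# remain reusable for later rebuilds.
docker buildx build \
  --cache-from type=registry,ref=registry.example.com/app:buildcache \
  --cache-to type=registry,ref=registry.example.com/app:buildcache,mode=max \
  -t registry.example.com/app:latest --push .
```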
I apply security patches, necessary updates and the like during system image creation (the VM image - for example an AWS AMI - that is later referenced in the Dockerfile's FROM). HashiCorp's Packer[1] comes in handy here. System images are built and then tested in an automated fashion with no human involvement.
The testing phase involves building a Docker image from the fresh system image, creating container(s) from the new Docker image, and testing the resulting systems, applications and services. If everything goes well, the system image (not the Docker image) replaces the previously used system image (the one without the current security patches).
We create Docker images somewhat dynamically and frequently. Subsequent builds based on the same system image are consistent and don't cause problems like inability to scale. Docker does not mess with the system prepared by Packer - it doesn't run apt or download from 3rd-party remote hosts, but only issues commands that produce consistent results.
This way we no longer have issues like being unable to scale with new Docker images, and humans are rarely bothered outside of testing-phase issues. No problems with containers either, as nothing untested is pushed to registries.
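A very rough outline of that flow in shell (file and image names are invented; the real pipeline is more involved):

```sh
# 1. Bake the system (VM) image with Packer; apt runs only in this step.
packer build base-image.pkr.hcl

# 2. Smoke-test: build a Docker image on top of the new system image and
#    exercise the usual applications/services in containers.
docker build -t smoke-test:candidate .
docker run --rm smoke-test:candidate ./run-tests.sh

# 3. Only if the tests pass does the new system image replace the old one.
```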
I mean, HN is the land of "offload this to a SaaS", and yet when we can actually offload something to a distro - like "guarantee that an upgrade within the same distro version is just security patches and won't break anything" - the recommendation is to avoid doing it?
Security assfarts will yell at you for either approach. It'll just be different breeds yelling at you depending on which route you go, and which one most recently bit people on the ass.
It's a tradeoff between making container images reproducible, and not shipping security vulnerabilities.
People tend to prefer the latter.
Furthermore, you can exec your way into a container and check exactly which package versions you installed.
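For example (the container name and package are placeholders):

```sh
# Check the installed version of a specific package inside a running container
docker exec -it my-running-container dpkg -l openssl

# Or list everything that is installed
docker exec -it my-running-container apt list --installed
```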