There are a lot of reasons to prefer reproducible builds, and many of them are not security related... It seems a bit presumptuous to argue that no one needs reproducible builds because one particular security argument is flawed.
First, a non-flawed security argument: it only takes one non-malicious person to build a package from source and find that it doesn't match the distributed binary to spot a problem. Sure, if you don't compile the binaries yourself, you might not find out until later that a binary was compromised, but that's still better than never finding out. The reality is that most people don't want to spend time building all their packages from source...
More generally, reproducible builds make build artifacts a pure function of their inputs. There are countless reasons why this might be desirable.
- If a binary is lost, it can be rebuilt exactly as it was. You only need to ensure the source is preserved.
- If a particular version of the code is tested, and the binary is not a pure function of the code, then you haven't really tested the binary. Bugs could still be introduced that were not caught during testing because your build is non-deterministic.
- It provides a foundation for your entire OS image to be built deterministically.
- If you use a build cache, intermediate artifacts can be cached more easily, and use less space. For example, changing the code from A -> B -> A will result in two distinct artifacts instead of three.
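As a concrete illustration of that last point, here's a minimal sketch (Python, with hypothetical paths and names) of a content-addressed cache key that is a pure function of the build inputs:

    import hashlib

    def cache_key(source_files, compiler_id, flags):
        # The key is derived only from the inputs: same sources, same compiler,
        # same flags -> same key, so A -> B -> A yields two cached artifacts, not three.
        h = hashlib.sha256()
        h.update(compiler_id.encode() + b"\0")
        for flag in flags:                      # flag order matters to compilers, so keep it
            h.update(flag.encode() + b"\0")
        for path in sorted(source_files):       # sort so directory iteration order can't change the key
            h.update(path.encode() + b"\0")
            with open(path, "rb") as f:
                h.update(f.read())
        return h.hexdigest()

    # hypothetical sources and toolchain
    key = cache_key(["src/main.c", "src/util.c"], "clang-17.0.6", ["-O2", "-g"])

Of course, the cached artifact is only safe to reuse if the build step itself is deterministic given those inputs, which is the whole point.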
If the new version is busted, I want to rebuild the same output only changing the compiler, or package manager, or one of the library upgrades, and see what happens.
If I can't reproduce last Tuesday, then how do I get back to Friday when shit wasn't on fire?
This goes the other way, too. Repeatability is daylight that removes our ability to delude ourselves that a new problem must be someone else's code.
I don't want to police my coworkers. I just want to know that the infrastructure will hold my proverbial weight when I step on it, so I don't have to be afraid of side effects all day.
I want to know what I can trust them to do, and work to expand that envelope. If people break stuff, I eventually want them to be able to figure it out, reproduce it, and fix it all under their own steam, and [believe] that it's the right thing to do.
My automation strategy is a superset of a list I'm sure you've all heard already:
Make it documented
Make it work (automated)
Make it right (trustworthy)
Make it recommended
Make it easy (may include fast)
Make it mandatory
Make it an HR problem
About the time the tool is starting to get easy, you can start teasing people for not using it, but at some point it's expected behavior. More peer pressure from more sources. If they still like to cowboy, that speaks to trust, and they start getting delisted from new initiatives. If that still doesn't work (which, sadly, occasionally is the case), they are compartmentalized and shortlisted for the next reorg or layoff.
Reproducible builds make diagnosing performance regressions less horrible, because they are less likely to have been caused by random shuffling of code, and any random shuffling is reproducible when trying to bisect.
In critical infrastructure, it's often essential to provide customers with a single fix. If you can't reproduce the build that they have, then you can't do that.
We have an internal tool that grovels through dependencies looking for first party code, and then pulls that thread all the way to hyperlinks to the ticket numbers and the commit diffs.
So not only can we build another copy with the same third-party code, but when we do a bug fix we can validate that we in fact got only that one change into the bugfix.
Even with infrastructure to do surgery on dependencies, it's quite possible to fat finger something and get too much or not enough. So we built a sanity tool that gives the engineering sign-off prior to validation a bit of gravitas.
> - If a particular version of the code is tested, and the binary is not a pure function of the code, then you haven't really tested the binary. Bugs could still be introduced that were not caught during testing because your build is non-deterministic.
This is a weaker argument IMO because when building for test, generally, all optimizations are disabled, debug info is emitted, symbols are un-stripped, and so on. The unit under test is usually very different from the shipped artifact even at the module level. Not least because the test functions are compiled in.
You do see this in embedded for several reasons, including:
- For high-volume production, reducing ROM size can have a big impact on profitability (this is less true than it was 20 years ago, but still true), so your dev boards will have large EPROMs and your production boards will have small ROMs
- Debugging tools present may allow for easier reverse-engineering of your devices
Obviously the devices go through a lot of testing in the production environment, but things like error-injection just may not exist at all, which limits how much you can test.
Compared to basically every other part of release qualification (manual QA, canarying, etc.), re-testing on the prod build is so unbelievably cheap there's no reason not to.
I suppose we're referring to different kinds of testing. Manual QA, etc, on prod sure.
But if you're building client software artifacts, unit testing or integration testing means building different software, in a different configuration, and running it in a test harness. To facilitate unit testing or integration testing of client software you:
- Build with a lower optimization level (-O0 usually) so that the generated code bears even a passing resemblance to what you actually wrote and your debugger can follow along.
- Generate debug info.
- Avoid stripping symbols.
- Enable logging.
- Build and link your test code into a library artifact.
- Run it in a test harness.
That's not testing what you ship. It's testing something pretty close, obviously, but it bears no relation to a deterministic build.
On the contrary; it's quite possible to design automated tests that operate on release artifacts. This is true not only at the integration level (testing the external interfaces of the artifact in a black-box manner), but also at a more granular level; e.g., running lower-level unit tests in your code's dependency structure.
It's true that not all tests which are possible to run in debug configuration can also be run on a release artifact; e.g. if there are test-only interfaces that are compiled out in the release configuration.
I think the source of the confusion in this conversation is perhaps the kind of artifact being tested. For example, if I were developing ffmpeg, to choose an arbitrary example, I would absolutely have tests which operate on the production artifact -- the binary compiled in release mode -- which only exercise public interfaces of the tool; e.g. a test which transcodes file A to file B and asserts correctness in some way. This kind of test should be absolutely achievable both in dev builds as well as when testing the deliverable artifact.
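For what it's worth, a black-box test against the release binary can be as small as this pytest-style Python sketch (the ffmpeg invocation is the standard -i input/output form, but the binary location, test data path, and correctness check are all hypothetical stand-ins):

    import subprocess

    def test_transcode_release_binary(tmp_path):
        # Drive the shipped binary through its public interface only.
        out = tmp_path / "out.wav"
        subprocess.run(
            ["./ffmpeg", "-y", "-i", "testdata/input.flac", str(out)],
            check=True,                      # non-zero exit code -> test failure
        )
        assert out.stat().st_size > 0        # stand-in for a real correctness assertion

The same test runs unchanged against a dev build or the deliverable artifact; only the path to the binary changes.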
> I, uhh, usually do this in my released software too.
Do you have any idea how annoying it is to get logged garbage when starting something on the command line (looking at you IntelliJ)?
I once spent several weeks hunting through Hadoop stack traces for a null pointer exception that was being thrown in a log function. If the logging wasn’t being done in production, I wouldn’t have wasted my life and could have been doing useful things. Sadly, shutting down the cluster to patch it wasn’t an option, so I had to work around it by calling something unrelated to ensure the variable wasn’t null when it did log.
Yes, which is why I regularly (think quarterly or annually) check to make sure we have good log hygiene, and are logging at appropriate log levels and not logging useless information.
I have alerting set up to page me if the things I care about start logging more than the occasional item at ERROR, so I have to pay some attention or I get pestered.
Hrmm. Surely the vast majority of testing happens on non-release builds, despite the fact that release builds may also be tested. Unit tests are generally fastbuild artifacts that are linked with many objects that are not in the release, including the test's main function and the test cases themselves. Integration tests and end-to-end tests often run with NDEBUG undefined and with things like sanitizers and checked allocators. I would say that hardly anyone runs unit tests on release build artifacts just because it takes forever to produce them.
When I was at Google we ran most tests both with production optimizations and without. There is no reason not to do it since the cost of debugging those problems is huge.
> Surely the vast majority of testing happens on non-release builds, despite the fact that release builds may also be tested.
Of course.
> I would say that hardly anyone runs unit tests on release build artifacts just because it takes forever to produce them.
I don't know that this follows: just because 99% of the invocations of your unit test are in fastbuild doesn't mean that you don't also test everything in opt at least once.
I can't remember seeing any cc_test target at Google that ran with realistic release optimizations (AutoFDO/SamplePGO+LTO) and even if they did it's still not the release binary because it links in the test case and the test main function.
Did you look in the CI system for configurations there? I see FDO enabled in those tests. (Speaking at a high level, configurations can be modified in bazelrc and with flags without being explicitly listed in the cc_test rule itself)
> release binary because it links in the test case and the test main function.
Sure, but it's verifiably the same object files as get put into the release artifact.
Note: This article is not talking about deterministic builds (which are a prerequisite for reproducible builds), but specifically reproducible builds.
Reproducible builds are generally speaking interesting only from a security perspective, while deterministic builds have all sorts of useful infrastructural features which the author agrees are useful.
And if you think I'm being pedantic, I'm using the official terminology from the reproducible builds site[0].
You're semantically right, but also missing the point. The expense of getting to deterministic builds is large - You have to take great care in your build infrastructure and scripts. The benefits are also large, and worth it.
Once you've gotten to deterministic builds, the expense of getting to reproducible builds is small; typically days' worth of work as opposed to months. The benefits are very different, but far from insignificant, and almost invaluable from a security perspective.
If you're going to do deterministic builds, go for broke - Do reproducible builds.
It really depends. If I’m building a Java project, I’m pretty sure I’ve got a deterministic build just by running javac pointed at a source directory. If I want a reproducible build, I probably need to do a lot more:
- ensure the timestamps of all files embedded in jar files are consistent
- ensure there is no BuildTime/BuildHost/BuildNumber variable of any kind being captured
- ensure the exact version of compiler is documented
- ensure the exact versions of all dependencies on the classpath are captured
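For the jar-timestamp point specifically, a small post-processing step is often enough; here's a hedged Python sketch (paths are hypothetical) that repacks a jar with fixed entry timestamps and a stable entry order:

    import zipfile

    FIXED_DATE = (1980, 1, 1, 0, 0, 0)   # earliest timestamp the zip format allows

    def normalize_jar(src, dst):
        # Rewrite every entry with a constant mtime and in sorted order, so that
        # repacking the same class files yields a byte-identical jar.
        with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
            for name in sorted(zin.namelist()):
                info = zipfile.ZipInfo(name, date_time=FIXED_DATE)
                info.compress_type = zipfile.ZIP_DEFLATED
                info.external_attr = zin.getinfo(name).external_attr   # keep permissions
                zout.writestr(info, zin.read(name))

    normalize_jar("build/libs/app.jar", "build/libs/app-repro.jar")   # hypothetical paths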
There's an interesting pattern I've found in this: Don't ensure that there's no BuildTime/BuildHost/BuildNumber embedded. Ensure that all variables that are part of the build are captured and embedded. That is - It's okay for your build to include the Build Time, but that's an assertion at build time. Include it as a build output. Binaries should include all of the mutable environment used to build them as an embed. As in, their --version output should include them.
# bazel version
Build label: 2.0.0
Build target: bazel-out/darwin-opt/bin/src/main/java/com/google/devtools/build/lib/bazel/BazelServer_deploy.jar
Build time: Thu Dec 19 12:33:30 2019 (1576758810)
Build timestamp: 1576758810
Build timestamp as int: 1576758810
Deterministic build means every time you compile the same source you get the same executable. Reproducible build means that you specify and relay enough information to allow everyone else to reproduce your results for themselves in their own environment.
Completely agree! The article is blinkered to the security aspect. But imagine if compilers didn't create reproducible builds. Debugging would be a nightmare!
Uh? Is that sarcasm?
Compilers don't produce reproducible builds.
If you try to investigate a core dump using a binary recompiled from source instead of the original binary, it's very likely you won't be able to analyze the core.
By default you're not guaranteed the exact same output in two compiled binaries. There are a lot of variable bits[1] that make it into binaries from C and C++. Different languages/compilers have different levels of variable bits.
Yes that would be reproducibility iff the environment is identical. However "identical environment" is a complicated issue.
Differing file paths, timestamps, and host date/time can all easily make their way into a binary through macros in several languages without explicit compiler/linker flags. If compiled artifacts are bundled into a container (like a jar file), their metadata needs to be deterministically set or else the container as an artifact won't be deterministic.
So yes, doing all the work to make the build deterministic enables reproducibility, but it's not free or automatic. Then doing the work to ensure the build environment is deterministic is an additional task that's not free or automatic.
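When two builds that "should" be identical differ, the first job is usually just finding where; a crude Python sketch (a poor man's diffoscope) that reports the first differing byte offsets is often enough to spot an embedded timestamp or path:

    import sys

    def first_differences(path_a, path_b, limit=10):
        # Report the first few offsets where two build outputs diverge.
        with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
            a, b = fa.read(), fb.read()
        if len(a) != len(b):
            print(f"size differs: {len(a)} vs {len(b)} bytes")
        diffs = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
        return diffs[:limit]

    print(first_differences(sys.argv[1], sys.argv[2]))

Real tools like diffoscope understand the file formats and give much better answers, but even a byte-offset list points you at the right section of the binary.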
Since when don't compilers produce reproducible builds? We did that at my last workplace with appropriate MSVC compiler options.
In any case, maybe parent is referring to using centralized debug symbols which can work for anybody in the org because their compilers all generate the same output.
Having stable BuildIDs has been important for me in being able to sanely manage a debug symbol archive where some binaries are periodically rebuilt on different CI worker nodes.
This is exactly right. Having the build process be 'reproducible' i.e. deterministic means it is insulated from mysterious action-at-a-distance mechanisms that break things I want or need to be invariant within my CI/CD process.
>If a binary is lost, it can be rebuilt exactly as it was. You only need to ensure the source is preserved.
Wouldn't this depend upon the environment as well? Unless the build starts off by creating a build environment, but then we are half way to "To first make bread from scratch, create a universe...".
If you run your builds in Docker, this is usually taken care of.
If not, it seems like it'd be good practice to document exactly what build dependencies are used; I've had to track down a regression introduced by inlined code from a dependency when we weren't tracking build dependency versions, and the fact that a bug was introduced with no changes in the relevant section of the code was troublesome to say the least.
Tangential, but I actually switched back to Maven after using Gradle on a fairly large project for about a year. The incremental/cached builds of Gradle were awesome, but I found writing build.gradle files to be a bit too hacky. I can achieve everything I need to with some simple configuration of an existing Maven plugin, whereas with Gradle, it always felt like I was doing something super custom and fragile.
I'd still love to set up a Gradle cache server, sounds so fancy!
Reproducible builds also tend to create isolation. I.e. if I compile one C file and then change another, the compilation of the second one usually doesn't affect the first. This is useful when trying to debug code. E.g. I can add print statements to callers to get more info surrounding a bug, whereas in a non-reproducible state the recompilation itself may have eliminated the bug.
> Q. If a user has chosen to trust a platform where all binaries must be codesigned by the vendor, but doesn’t trust the vendor, then reproducible builds allow them to verify the vendor isn’t malicious.
> I think this is a fantasy threat model. If the user does discover the vendor was malicious, what are they supposed to do?
> The malicious vendor can simply refuse to provide them with signed security updates instead, so this threat model doesn’t work.
For me, this is one of the primary benefits of reproducible builds and the author's dismissal of it as fantasy is unconvincing.
Vendors trade on trust. If a reputable application uses reproducible builds and discovers that a platform vendor is modifying their application on their platform, that information is damaging to that vendor's reputation. That's extremely useful leverage against potentially malicious vendors.
And vendor malice isn't the only reason why the software might be harmful. What if the vendor's toolchain was compromised without their knowledge? Reproducible builds provide a means for third parties to verify that it wasn't.
The holy grail of reproducible builds is achieving the same binary via different compilers. This was, at least when I started looking into reproducible builds, why I wanted to do it and why others wanted to do it. The other benefits are kind of side benefits.
Are you sure you aren't thinking of diverse double-compiling[0]? Which specifically does not require different compilers to produce the same output[1], just different runs of different binaries of the same compiler.
This can't be correct. What would be the point of different compilers then? There's no way that every compiler would produce the exact same instructions for each given input. There would be no point to using an optimizing compiler or one with better intrinsic support.
Perhaps you're comparing a multithreaded version with a single threaded version, or a single-host build vs distributed build.
When adding new features to a compiler, you might want to verify that the old and new versions have the same output given the same input. If your compilers produced non-deterministic output, this exercise would not be possible.
I took the statement "different compilers" to mean completely different projects and codebases. Of course reproducible builds make sense in that situation. But why in the world would you ever expect gcc to 100% always match the output of llvm? That doesn't make sense.
Yes, the argument in the article seems to revolve around the idea that it is harder for a user to verify a build than it is to simply build it themselves. If they build it themselves there is no need to verify it, of course - so what's the point of verification?
You could apply that same argument to lots of things. It's hard to verify things adhere to standards in general - whether it be flammability of children's clothes, or additives in food. By that logic I should be making my own clothes and meals rather than relying on the manufacturers' claims.
Obviously, no one does. The point is someone has made a verifiable statement, in public. There is no need to go to the effort of testing that statement a zillion times. In fact it needn't be done at all in many cases - just the threat of the test being run if something goes wrong suffices. In the case of food where I live, a recent recall of strawberries devastated the industry. No one was testing whether strawberries contained needles until someone ate one, and no one got seriously hurt when they did. Well, nobody except the strawberry industry, which was on its knees for months until once again people started to take them at their word that their product was safe to eat.
Similarly, if you care about security (and I do), if there was a choice between several Signal-like programs that all claimed verifiable builds, and then it was shown one was in fact lying about verifiability, I'd drop them like a hot potato. And I wouldn't be dropping them for just months like the strawberries; it would likely be forever.
In one line: the idea that every user has to actually verify a verifiable build in order for it to be useful is seriously flawed.
We can audit that source code, to ensure no key-leakage occurs.
Next, I install signal on my phone via the app store. How do I know the app I installed matches the source code that was audited? After all, google / Apple could decide / be forced to provide a modified binary.
Reproducible builds work for that.
Alternatively, consider Debian. I ain't got time to compile every package I want from source (otherwise I'd run Gentoo). That involves so much dependency hell that I'd rather avoid it.
With reproducible builds, I can check the source for any package, and so can every other curious person. This way, instead of everyone needing to set up their own build environment for every package, we can all rely on the fact that, once in a while, someone checks whether a build actually comes from trusted source code.
Is it less secure than building myself? YES. Does it make deploying compromised code more likely to be detected than without reproducible builds? Also yes!
Heck, if we change from "rely on the fact that, once in a while, someone checks whether a build actually comes from trusted source code" to a federated system of trusted checkers, it gives pretty nice guarantees, and makes deploying compromised code pretty damn scary.
Essentially: whenever the person compiling / signing the binary is not the person writing the source code, reproducible builds are pretty dang nice.
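A federated checking scheme like that is mechanically simple; here's a hedged Python sketch (the builder names and hashes are made up) of accepting an artifact only when some threshold of independent rebuilders published the same hash:

    import hashlib

    def verify_with_quorum(artifact_path, attestations, threshold):
        # attestations: {builder_name: sha256_hex}, each published independently.
        with open(artifact_path, "rb") as f:
            local = hashlib.sha256(f.read()).hexdigest()
        agreeing = [name for name, digest in attestations.items() if digest == local]
        return len(agreeing) >= threshold, agreeing

    # hypothetical published attestations for a downloaded APK
    ok, who = verify_with_quorum(
        "signal.apk",
        {"builder-a": "9f2c...", "builder-b": "9f2c...", "builder-c": "77aa..."},
        threshold=2,
    )
    print("trusted" if ok else "rejected", who)

An attacker then has to compromise (or collude with) at least that many independent builders rather than a single signing key.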
> Next, I install signal on my phone via the app store. How do I know the app I installed matches the source code that was audited? After all, google / Apple could decide / be forced to provide a modified binary.
Reproducible builds work for that.
At least on iOS it would be very hard to get a hash of the binary running on your phone; not possible without jailbreaking AFAIK? You could check it on your computer of course, but what's stopping $vendor from having a device.id == rocqua's phone check when they deliver the binary to your phone? Further, doesn't (at least) Apple do something with the signatures, causing the downloaded binary to actually be slightly different and mismatch the one uploaded by the developer?
If you don't trust your OS and, by extension (at least on very locked-down platforms such as iOS), your OS vendor, it's basically game over. I mean, they have root on your connected device and could easily change what binary is started, even if you could verify the copy you downloaded onto disk.
App Store-distributed ipas can’t be decrypted without keys burned into iOS hardware, so you can’t decrypt them on a Mac without a jailbroken iOS device. (You can hash the encrypted binary but of course that’s pretty useless for reproducibility.)
This might be about to change since Apple has announced support for running iOS apps directly on Apple Silicon Macs?
> Next, I install signal on my phone via the app store. How do I know the app I installed matches the source code that was audited? After all, google / Apple could decide / be forced to provide a modified binary.
> Reproducible builds work for that.
You read the original post, right? He discusses this at length. Actually, right in the beginning.
In short, if you go through the dance of building the binary yourself to compare it to what's installed, then you could just use the binary that you just built, without ever looking at what the app store has.
If the app store distributes binaries to thousands of people, and only one of them rebuilds from source to check, those thousands of people gain a substantial (but not perfect) level of protection.
Someone, we'll call them Bob, goes to all the trouble to build the binary, finds out that it doesn't match what the app store is distributing, and then just shrugs their shoulders and uses it to connect with other users in a secure messaging network because their own binary is correct?
> Next, I install signal on my phone via the app store. How do I know the app I installed matches the source code that was audited? After all, google / Apple could decide / be forced to provide a modified binary.
Isn't this already happening on both play store and app store? The actual package delivered to end users is actually generated and signed by google/apple, and they can modify the binary inside the package as they see fit (e.g. to optimize for each user's device arch, etc).
A number of large companies are quietly moving towards reproducible build. Sorry if I cannot name the names.
As a side note, reproducible builds implemented in Debian were also useful for spotting various other problems: small differences in build environment that would make debugging more difficult.
Sometimes the same application will have different performance depending on the build due to memory alignment, data ordering, cache friendliness.
Finally, the article is making some claims that are, frankly, incorrect:
> Q. It’s easier to audit source code than binaries, and this will make it harder for vendors to hide malicious code.
> I don’t think this is true, because of “bugdoors”. A bugdoor is simply an intentional security vulnerability that the vendor can "exploit" when they want backdoor access.
Adding a backdoor and compiling a new "custom" binary might take 10 minutes and a lot of people in a company could do it and leave no traces.
Writing a "bugdoor", committing it and passing code reviews is very different. You might have to justify why you are touching a product / component / library that might be completely unrelated to your usual work.
Plus, you leave a very clear record of your action, giving up a lot of deniability.
> Q. It’s easier to tamper with binaries than to write a bugdoor, so reproducible builds do improve security.
> I absolutely disagree, every programmer knows how to write a bug or short circuit some logic. Hiding malicious activity in a binary, with a multi billion dollar malware industry determined to find it is more difficult.
This implies that the malware industry is somehow unable to detect a "bugdoor" or unexpected behavior at runtime, but able to detect a change to the binary...
> In addition, once you’ve produced and signed the malicious backdoor, it is not repudiable - you can’t deny you wrote and provided it.
Most organizations track source code changes in a VCS but don't require employees to sign binaries with keys bound to each individual. If anything, this makes a point in favor of repro builds.
There’s another reason I think reproducible builds could add a lot of value: app stores. Right now, if I install from a normal app store (Apple, Google, Microsoft), there’s no real benefit to using open source apps. Even if I trust the app store, I have no way to confirm that the app binary matches the purported source.
App stores could improve the situation by building apps themselves, but I think that would put them in a position they don’t like. App stores don’t currently build their apps.
With reproducible builds, app stores could do better. An app store could list the hash of the build artifacts along with the purported name and developer, allowing various degrees of assurance that an app is actually a build of the source it supposedly comes from. Without reproducible builds, the app store would have to build the app itself and use its build instead of the submitted build, which seems undesirable.
Tavis’s argument about bugdoors still applies, but IMO it’s largely irrelevant to the major app threat model. Many useful apps don’t have input and output that is susceptible to corrupt data. A lot shouldn’t access the network at all. The common threat is that they include fifteen tracker SDKs, all of which are malicious by design. Including the entire Facebook SDK is going to be tricky as a bugdoor.
On iOS (not sure about macOS apps), when you submit an app, it is submitted as "bitcode" (note, this is not "bytecode" - it's bitcode, not a typo). This allows Apple to build your app as needed from your bitcode (think of the bitcode as LLVM IR).
This is done for a couple of reasons: (a) they can take advantage of improvements in their backend compiler (IR->executable) that happens after you submit your app, (b) they can build your app on platforms that didn't exist when you submitted your app
The net result is that the App Store plays an important part in the build process, which could legitimately generate different binaries (even for the same device, see (a) above) for the same app you submit.
They could also wrap your app in code that does things you didn't intend, like telemetry/metrics, interception of certain functions, etc. (Not that they couldn't do that anyway)
This is such an underrated improvement app stores could make, which would make a large impact for minimal effort. They already require signed binary submissions, just publish the hash so we can verify it!
Edit: it's a large impact for the tiny fraction of the population who's interested in verification, with no degradation of the experience for everyone else
It could be bigger than that. Imagine a little badge for open-source apps that don’t use any closed-source SDKs. These apps could be prioritized in search, and the users would benefit: less garbage in simple apps and higher battery life.
This won’t directly drive income for the app store in question, but it may drive perceived value of the platform as a whole.
I don't know about the 'formal' description of reproducible builds used in the article from a security perspective, but I do know that having the exact same output locally as is being built by a deployment process makes debugging a lot more straightforward because it removes the subtle and hard-to-discover problem of having slightly different library versions in your application making things perform differently.
Right, and I like the ability to re-build a release from 6 months ago and get something that reproduces the behaviour of the previous build - I'm not concerned whether the binary is identical, more that it is functionally identical. A typical way of breaking this is for build scripts to move forward without being kept backward compatible, or without it being easy to use the previous version of the scripts for an old build (for example, a newer compiler getting used rather than the one originally used).
I've worked on enough stuff where releases get held up for various reasons (and so production software can be 6 months behind head) and there's a desire for a fix to the released version and hence changes made to a branch as well as head to apply the fix. This sort of thing is made much easier if you don't have to spend time trying to work out how to get the branch to still build!
The arguments "don't follow." Reproducible builds were indeed more than once used to verify that the published binary does correspond to the published source.
Without having them, that kind of verification is much harder, depending on the build setup used, it can be even too hard to be achieved at all.
So we do have clear advantages of having the build infrastructure which results in reproducible builds, and I don't see anything that can substitute that.
The argument "if you build yourself from the sources, your build is then trusted" is not reflecting the reality. Most of the users are never going to build from the sources. Having reproducible builds, only a few people have to build from the sources to verify the binaries for the vast majority. Not to mention that without reproducibility, you can't even know if your own build environment is misconfigured.
Here's a hypothetical but a realistic scenario that not having reproducible builds is a major issue:
1. I create an app for a client.
2. Client installs the app on their 10,000 enterprise mobile devices and trains the users.
3. A month passes. I've changed everything in the app.
4. Major security or outage event happens. I need to change a line in the app and those 10,000 devices need to be updated ASAP.
5. If I can't check out the old version, change the line, and ship it without wondering "what else will change in this build" - then both me and my client are going to have a bad time.
It will take two weeks for a full qa/audit of the app and there's no time for that.
I've a feeling there are multiple definitions of build reproducibility going on here. I'm guessing you mean that it's not important to have something byte-for-byte identical, but more to ensure that exactly the same build steps were run with the same source code?
For most of us, that's what build reproducibility means, but I guess for a subset of users it means producing an identical binary.
> but I guess for a subset of users it means producing an identical binary.
Whenever I hear people talk about the problems of creating reproducible builds, I often hear stuff about timestamps or other metadata inserted by the compiler that would "break" the reproducibility (under the stricter definition).
Having your own source code versioned and dependencies version-pinned (and pretty high confidence that the dependency package foobar-1.12 stays the same over time) seem just like old fashioned "good practice".
The looser definition would imply that all versioned software without external dependencies (or the source of the dependencies manually included in the repository) is reproducible?
Yeah, true. I was thinking of doing release builds in containers via the CI/CD pipeline, keeps the environment pretty static, but not completely static of course.
But further: All of these things would still not be enough for the strictest definition (exact same binary), at least with normal compiler defaults afaik?
Byte-for-byte identical builds are useful mainly because format-agnostic diffing tools are a lot easier to use. There are quite a lot of cases where something is not byte-for-byte identical merely because of a timestamp that affects nothing.
That said, I think it was (and perhaps still is, to many people) surprising just how many sources of irreproducibility exist. Timestamps and absolute build locations are obvious sources, and to some degree, generally don't have an effect. Iterating over file inode order (i.e., "for each file in directory {}") is usually innocuous, but it can cause link order issues and change ordering of static constructors--which can have drastic effects (both in terms of performance and actual functional changes) on the resulting binary.
But if your build process is going to unexpectedly change the encoding of text files [1], that's actually pretty terrifying. There are also cases where the compiler just seems to randomly choose how to optimize code [2]. Note that randomness isn't coming from an obvious "if rand() % 4" check here, but perhaps from something more subtle such as "we're iterating over a map whose keys are addresses of internal data structures, and we stop optimizing after hitting 1000 entries as the function is too big, and the addresses change because link order or ASLR."
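The inode-order problem in particular has a boring one-line fix that's easy to forget; a sketch of the idea, assuming a hypothetical build step that collects object files before linking:

    import os

    def collect_objects(build_dir):
        # readdir/scandir returns entries in on-disk order, which differs between
        # filesystems, machines, and even runs; sorting makes link order (and
        # therefore static-constructor order) deterministic.
        objs = [e.path for e in os.scandir(build_dir) if e.name.endswith(".o")]
        return sorted(objs)

    link_inputs = collect_objects("build/obj")   # hypothetical directory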
You would still have to do it with reproducible builds. It's only reproducible if you have the same code. If you change something, there is no reason to expect the same result - a bit like a hash: it's deterministic, but that's all.
If you are reading this: I'm one of the members of the TUF [1] and in-toto [2] teams, where we try to solve exactly this kind of problem. While I agree with you that reproducible builds sound a lot simpler than they actually are to achieve (and leaving aside all the practical complexities you mentioned in the blog post), I think they provide value for a certain use case seemingly not mentioned in the blog post.
It is the case where the vendor is a traditional Linux distro, and we have independent reproducible builders to ensure that a compromise of the distro's CI/CD infrastructure is not enough to cause malware to be installed. It is true that the builders can still go off and reproducibly build malicious code, but this can be mitigated by requiring a high enough threshold of (presumably independent) developers to sign off on the code. The problem of malicious source code is infeasible if not impossible to solve cryptographically, but we can make sure that CI/CD systems, which increasingly sit in the cloud, are not blindly trusted.
Could not post on your blog. Let me know what you think. Thanks!
The premise of the article ignores the difference between "trust" and "trust but verify".
And then there's a Q&A which answers criticism with "I absolutely disagree" and "I think this is true, but".
> We know that attackers really do want to compromise build infrastructure, but more often they want to steal proprietary source code, which must pass through build servers.
This has shifted the goalposts so much they're on a different field now.
Yes, exactly. And crucially you don’t need to verify yourself to get the benefits of verification. There is value in crowdsourced verification (i.e. I always have to choose a trusted vendor but it is safer when I know others might try building themselves and raise a fuss if there is a mismatch between their build and my vendor’s build).
I think OP is coming from a different perspective than I (corporate bespoke solution builder) do. When I say "reproducible build" I mean a build that is the same on any machine (i.e. no special magic necessary to build an "official" version of the code). Too often in corporate environments, getting a local build or setting up a new build pipeline involves arcane black magic and/or copy/pasting weird libraries that can't be pulled from any sort of "official" repository. curl/bash libraries that "automatically change versions" based on when upstream decides to change them can wreak havoc when setting up a new build environment. My $0.02: it's not (in the corporate world) so much about validating binaries, but more about how many steps beyond "check out the code" exist, and how easily I can validate that my binary uses the same versions of libraries/dependencies as the one that a local developer tested.
The GNU project used reproducible builds in the early 1990s. Changelogs from 1992 indicate the ongoing effort. [4]
One of the older[5] projects to promote reproducible builds is the Bitcoin project with Gitian. Later, in 2013, the Tor (anonymity network) project started using Gitian for their reproducible builds.[6]
In July 2013, the Debian project started implementing reproducible builds across its entire package archive.[7][8]
By July 2017, more than 90% of the packages in the repository had been proven to build reproducibly.[9]
In November 2018, the Reproducible Builds project joined the Software Freedom Conservancy.[10]
F-Droid uses reproducible builds to provide a guarantee that the distributed APKs use the claimed free source code.[11]
Well, I think that's the crux of the issue: I've been using the term reproducible build to mean something different for the last 25 years. I'm after functional equivalence, not binary equivalence. Now it might be impossible to guarantee one without the other, but that's all I care about.
For this reason I’ve been dockerizing my builds for almost five years. I was late to the Docker party, but when I saw the benefits it brings to build pipelines, I was sold.
It's true that a dockerized build isn’t any simpler than its non-dockerized ancestor, but at least there’s a Dockerfile that lays bare all the black magic and special sauce which goes into each build. And it can be version controlled to watch for drift over time.
This stuff is useful in a corporate setting, but the fetishization of reproducible builds beyond that is just a distraction that can stay where it belongs: open source mailing lists.
But docker builds themselves are generally not reproducible, so I don't really see the gain?
Yes, with some effort they can be made reproducible, but the vast majority of the Dockerfiles that I've encountered do not pin the versions of every dependency.
Some might pin a few key dependencies but nearly all do an apt/rpm/whatever update at some point followed by a bunch of install commands which don't specify versions.
While your Dockerfile helps you know how a project was built at a specific point in time, it's not going to work forever. Even if the file doesn't change over time, the build it produces will. It's mainly because of installing packages using something like "apt-get install $package". It also can change if the files you're adding with ADD or COPY change.
You don’t have to download the internet upon each build.
First, in a corporate environment it’s common to run builds backed by artifact servers that’ll cache just about anything.
Second, it’s easy to place files in a Docker build context (that’s just a $25 way of saying “next to the Dockerfile”) that would otherwise have been downloaded from the internet, but are stored locally instead. This is easier said than done for some formats. Source tarballs? Easy. Anything Java or Debian that requires a pesky server which works a certain way? You’re going to have to use a caching artifact server.
While Docker can be very useful against attempts to "download the internet" (possibly by simulating multiple remote servers on a fake network) and against accidental changes to source files, configurations and tools, there are sources of intentional non-reproducibility (e.g. embedded timestamps, common in Windows executables) that need to be addressed more directly.
These arguments against reproducible builds are very provincial, temporally.
Which is to say, they assume a user now downloading source now to compile and use now.
One of the things I like about reproducibility is that someday, in the future, when the project is closed and the website is down and the author is dead ... we can take the source code and compile it and compare it to our own notes (or to the wayback machine, or whatever) of the checksums, etc., of the binaries and gain some confidence that we have what we think we have.
I'm missing some context. I'm assuming the author is referring to binary distributions of open-source software? Otherwise, "reproducible builds" means something slightly different in a commercial software development environment. (Think of someone building an e-commerce site where the software is proprietary.)
Thus:
> You don’t need reproducible builds
No, not for open source. If it's open source, it's only valuable and useful to have the source code if I can build it. Otherwise, if I can't build it, to me it's no different than closed source.
> You don’t need reproducible builds
In a commercial development environment I need to build the code I work on. What about when a bug exists due to problems in the build environment? If I can't reproduce the build environment, I can't fix the bug.
I can think of one argument against reproducible builds that really makes sense:
You have some legacy system incapable of reproducible builds, and the social effort to convince people not to care is less than the technical effort to fix it.
I think the author hasn’t accurately described the workflow where reproducible builds are used.
Here’s my attempt.
There are three entities:
1. The vendor, who creates and _distributes_ a product, which may include software, to the end user
2. The end user, who receives the product from the vendor, operates the product, and trusts one or more verifiers to correctly certify the product according to some standard that the user desires
3. Verifiers, who receive proprietary access to the vendor’s product (e.g. source code) and check that the product meets a set of standards and assert that fact to the users
Note that verifiers are NOT in the job of distributing products to users. That’s a hard job and it seems understandable to me if verifiers don’t want that job.
Reproducible builds help link a specific version of source code that a verifier certified to a specific product being operated by a user. Without it, the user trusts the vendor much more than with it.
Yes it adds brittleness and delay (see FIPS 140 certification). These are use case specific trade offs that only users can judge.
In the case where some facts about the source code can be formally verified, reproducible builds support that trust relationship much better. I might go so far as to say that reproducible builds are essential for trusting formal verification.
You can also imagine a software supply chain that includes more steps than a simple vendor -> user relationship. Much of the proprietary software used today include libraries from third party vendors. There are integrators that add their own special sauce. The supply chain looks more like: vendor -> vendor -> ... -> vendor -> end user.
Imagine each vendor has their own set of verifiers responsible for certifying that vendor’s output.
It shifts and distributes the burden of trust. There are mountains of scenarios where we shift from "we rely on trusting this one person / entity" to "we rely on trusting at least one of these N people / entities", and that is a huge win every time.
> this just shifts the burden of who to trust to the verifiers
Agreed. A colleague once said that trust is like a balloon; if you reduce trust in one area, you tend to expand trust in another area.
I think the typical response to your statement from a user is that it’s easier to trust a set of verifiers than it is to trust a vendor. The act of a verifier blessing an artifact makes that artifact more trustworthy.
Personally I’m skeptical of these claims, but that’s the underlying assumption of the certification process
Presumably there is benefit in reducing trust in a party with mixed or unknown incentives to increase trust in a party whose incentives align with your own. For a critical piece of software I could imagine a scheme where a user pays one or more independent verifiers to validate the software. That would allow the user to control the incentives of the entities telling them the software is safe.
> Q. It’s easier to tamper with binaries than to write a bugdoor, so reproducible builds do improve security.
>
> I absolutely disagree, every programmer knows how to write a bug or short circuit some logic. Hiding malicious activity in a binary, with a multi billion dollar malware industry determined to find it is more difficult. In addition, once you’ve produced and signed the malicious backdoor, it is not repudiable - you can’t deny you wrote and provided it.
>
> With bugdoors, you don’t need to deny it - you just claim it was an error, and you’re automatically forgiven.
If the binary vendor has the source code, every source exploit is also a binary exploit. If you can make a semi-"bugdoor" that's plausibly deniable, and then augment it with a binary tamper that's deniable as an artifact of non-deterministic compilation, is that not the ultimate coup de grâce?
For me, security is not the only thing that makes reproducible builds interesting. I'm often interested in linking a binary to its source in retrospect. This is at a time when the vendor might long be gone. It helps a lot to be reasonably sure that the source you analyze is the one that was used to build the binary, so you don't waste time studying the wrong source.
The counterargument is, of course, that if you are serious you should look at the binary anyway and never trust the source, and I can't argue with that, except - for better or worse - that this is not the world I live in. For me, an as-short-as-possible time to answer a question about a piece of software with as much confidence as possible is key. I'm always grateful when I can reproduce the build with reasonable effort and then look at the source instead of the binary. Regardless of how much I love to tinker with binaries, looking at the source is usually just faster.
It looks like almost everyone here thinks this is wrong and poorly argued (as do I). A meta question to ponder: why did this get upvoted to the front page? Is it that upvoters tend to not read the articles whereas commenters do? Do people upvote stuff they think is wrong for the purpose of discussion?
As much as we like to pretend to only look at the arguments themselves, I think part of the reason is that Tavis is a pretty well-known hacker. If I or you had published it, it probably would not have made it to the front page.
Even if I disagree with his overall opinion on reproducible builds, I think he does bring forward valid arguments and I personally took something away from reading his blog post (e.g. the word bugdoor).
But to your point, I think up-voting is a sign of approval, while commenting usually is used to criticize. Why would anyone comment on something they 100% agree with?
Because there are good points. Reproducible builds are not everything some people think, but they are still useful. Knowing what is wrong with them is important.
If building the same source produces different binaries, then I'd like to know (a) what is causing the difference and (b) what other differences does it cause?
Being able to produce a consistent result is simply a sign of professionalism.
How about using a different toolchain? (eg: gcc vs clang). Or even different versions of a toolchain? Or a dependency that has to be downloaded? Being able to build consistently requires way more effort than just following professional practices. One method I know is to pin everything that goes into a build - source, dependencies, toolchains, configurations and environment.
The results of a non-consistent build can be as simple as a difference in performance. But it could also be malware injected through the compiler.
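One way to make "pin everything" actionable is to treat the whole environment as a build input and hash it alongside the sources; a hedged Python sketch (the field names and versions are illustrative, not a standard format):

    import hashlib
    import json

    def build_manifest(compiler, compiler_version, dependencies, flags):
        # Capture every input that can change the output; the manifest hash then
        # identifies the build environment the way a commit hash identifies the source.
        manifest = {
            "compiler": compiler,
            "compiler_version": compiler_version,
            "dependencies": dict(sorted(dependencies.items())),   # e.g. {"zlib": "1.3.1"}
            "flags": list(flags),
        }
        blob = json.dumps(manifest, sort_keys=True).encode()
        return manifest, hashlib.sha256(blob).hexdigest()

    manifest, env_id = build_manifest("clang", "17.0.6", {"zlib": "1.3.1"}, ["-O2", "-g"])

Embedding env_id in the binary's --version output (much like the bazel build variables shown earlier in the thread) closes the loop: a customer binary can be traced back to the exact environment that produced it.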
Yes, yes and yes.
The toolchain and the dependencies should be pinned.
Otherwise sooner or later you'll hit the customer's issue that you won't be able to reproduce - until you realize it's some subtle bug in the specific version of your compiler.
I don't think that reproducible builds are perfect or solve every problem. I still think they are worthwhile, though, because they allow for a better audit trail, and every step that is auditable makes it easier to pinpoint at what step something bad happened.
Sure, users aren't likely to verify them. I still think there is value in being able to determine after the fact: clearly this binary came from this source code artifact vs. clearly this binary was substituted, as it doesn't match the source code. It's not perfect, but it helps reduce the places where an attack can take place with nobody noticing.
You know that principle that mathematicians invoke when someone "proves" P=NP? i.e. do they mention any of the classic problems in the space? Well, if you can't find the word "Debian" in an article on reproducible builds you can be reasonably guaranteed it's missed something.
"What isn’t clear is what benefit the reproducibility provides. The only way to verify that the untrusted binary is bit-for-bit identical to the binary that would be produced by building the source code, is to produce your own trusted binary first and then compare it. At that point you already have a trusted binary you can use, so what value did reproducible builds provide?"
Being able to reproduce a build not only validates the source but the entire process of creating the executed artifact. To properly capture this ability and value the CI/CD pipeline should also be reproducible in an infrastructure-as-code manner. Together with the programs' source and short documentation (as it should be coded) on how to put these parts together it means the operation of the company can be recreated in another geography/datacenter and continue to work while team members rotate out/in.
Contrast the situation with inheriting a codebase that only has possibly matching source code and a production environment that everyone is afraid to touch at the other extreme.
Extreme build reproducibility requirement: had a customer in telecom. Their customers (carriers) required downtime no more than 5 (or 10?) minutes per decade — 7 nines I believe.
When they reported a bug, they wanted the fix, and then they diffed the binary to be sure that every code change was related to the patch itself and nothing else. No additional fixes or upgrades. They were willing to pay through the nose to stay on a very old version of the code.
The code was in C (this was mid 90s) — can’t imagine how much harder it might have been with extensive templating and the like.
I think he's right about all the security arguments. It is definitely nice to have properly reproducible builds from a build system point of view though. Things like ccache would work a lot more reliably.
"More often, attackers want signing keys so they can sign their own binaries"
Isn't the idea that you don't have to trust signing keys anymore because you rely on consensus? An attacker would probably have a much harder time compromising 10 vendors instead of one.
You are right about the complexity and issues, and you certainly don't "need" reproducible builds, depending on what you are after, but they can be beneficial.
Also, your arguments often seem to come out of thin air:
"reproducible builds are not for users" Why? Maybe not for the average user (yet).
My understanding of reproducible builds seems to be different. I always thought of it as a way of ensuring people couldn't deploy edited code from their workstation.
You would ask trusted system A build the code at a particular code-reviewed snapshot. It fetches the code itself, builds, and codesigns. Then you hit deploy on system B, which is the only system with credentials to modify prod. That system looks for A's signature. There's no vendor involved, so maybe I'm missing a better understood picture of what a reproducible build means.
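The deploy-side check in that setup is just signature verification against system A's public key; here's a hedged sketch using the pyca/cryptography Ed25519 API (how the public key is distributed to system B is hand-waved):

    from cryptography.exceptions import InvalidSignature
    from cryptography.hazmat.primitives.asymmetric.ed25519 import Ed25519PublicKey

    def deploy_if_signed_by_builder(artifact: bytes, signature: bytes, builder_pubkey: bytes) -> bool:
        # System B refuses to touch prod unless the artifact carries system A's signature.
        try:
            Ed25519PublicKey.from_public_bytes(builder_pubkey).verify(signature, artifact)
        except InvalidSignature:
            return False
        return True

Reproducibility enters the picture if you also want a third party to be able to confirm that what system A signed really corresponds to the reviewed source snapshot.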
Build reproducibility doesn't solve the problem of "how do I know the computer I don't control is actually running the program I think it is?"
The only solutions are related to fully homomorphic encryption, which is a way of running a program on a computer you don't control where the operator of that computer cannot learn information about the work it is performing. Unfortunately, pure software approaches to FHE haven't been proven out.
It is possible to get the benefits of FHE by utilizing a software enclave, but then you have a different trust narrative that involves hardware (and attestation services). The point of remote attestation is that you can verify that a computer you don't control is running a specific binary (verified by the hash and a signing key) at a specific moment in time. The problem is that the chain of trust for the signing key traces back to the manufacturer of the enclave.
The holy grail would be some way of decentralizing the trust of the enclave OR finding a way to do purely software based FHE that was fast. I'm not holding my breath for the latter, but the former might be possible in the next decade.
In short, if you have a reproducible build but you don't have a way of verifying that binary is actually running on a remote server you don't control, the reproducibility of the build is a moot point. I sorta think the only systems where build reproducibility matters today are ones that use enclaves (but I'm willing to be disabused of this notion if people feel otherwise).
For us, the biggest benefit of reproducible builds is debuggability. One year later, if we get a bug report from a customer, we can fully recreate the issue in house, patch it, and provide a new build with confidence that we fixed the problem.
Without this, in a huge interconnected system you go insane trying to control all the variables needed to figure out what exactly went wrong.
> With bugdoors, you don’t need to deny it - you just claim it was an error, and you’re automatically forgiven.
Interesting off topic point about the lack of accountability in the profession. Even as a sysadmin, one accidental iptables flag might let an intruder gain access. What will happen to me? Angry boss maybe.
Interested in reproducibility for caching/performance/debug/rollback reasons, not security. Thanks, Tavis, for clarifying that security gains are not useful. I don't usually consider them but when I have to, I'll look up this article.
If you use a non-mainstream compiler it is a horrible mess. Almost every other time I build my program, the compiler crashes and I have to make a clean build.
Recently they fixed the issue; I updated the compiler, and now it still does not compile. Random type-checking error. After a clean build, the type is accepted and it compiles. I tried to make an MWE, and instead of a type-checking error, the compiler crashes at that line.
At least it always compiles with a clean build. But the resulting program does not start (not accepted load segments). I have not figured that one out, so I uninstalled the update and keep using the crashing compiler version.
Reproducible builds keep honest vendors honest. If I introduce reproducible builds, I make it likelier that I get caught if I tamper with my own binaries. Likewise if I provide checksums for my downloads.
There are a lot of popular freeware and open source projects from small teams or individuals. You can't tell me that no one is at least a little bit tempted to slip in a trojan occasionally.
Forgive me if someone has already mentioned this, but if I have multiple vendors, why can't I compare the hashes of all of them before I ask for the binary? Then any one vendor is being held to account by the others before I install. No source is required, just the hashes of the source and the hashes of the resultant binary.
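That scheme could be sketched as: ask every vendor for the digest first, and only fetch the binary if they all agree. The URLs below are invented for illustration.

    import urllib.request

    HASH_URLS = [
        "https://vendor-a.example/app-1.2.3.sha256",
        "https://vendor-b.example/app-1.2.3.sha256",
        "https://vendor-c.example/app-1.2.3.sha256",
    ]

    def vendors_agree() -> bool:
        digests = set()
        for url in HASH_URLS:
            with urllib.request.urlopen(url) as resp:
                digests.add(resp.read().decode().strip())
        return len(digests) == 1  # any disagreement flags at least one vendor

Of course this only works if the build is reproducible to begin with; otherwise every vendor's digest differs for boring reasons.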
The whole point of the article is that in order to check if the play store version is the same, you have to download the github version and build it from source. At that point, you might as well throw away the play store binary and use the one you just built.
Yes, but it takes just one person to download github source and build it to verify that a binary available in the play store is safe so that the other 1,999,999,999 Android users don't have to.
If you're looking for an actual spicy take, you could say that reproducible builds are impossible once the durability of hardware/state of the universe is accounted for, but that the concept provides yet another black hole of cash and effort that devsec can build a cottage industry around.
Define reproducible. I always archive dependencies so I can rebuild, even if that build has some new timestamps in it. Having a rando Github repo go down when you rebuild is not a failure mode I want.
Well, after spending so much time watching kilometers of rolling text on my screen while `emerge -uavDN @world` compiles everything, I agree I don't need reproducible builds.
Reproducible builds allow people to independently verify that a build came from the source code that the binary vendor claims it did, rather than some compromised source code.
I think we mostly want builds to be as reproducible as possible because a non-reproducible build means there might be new bugs in it that weren't there previously (when building from the same source). For example, a dependency might have been updated and might now be incompatible. It happens. It's not malicious, but it's annoying, and it usually leads to a lot of "works on my machine" discussion, which is a frustrating and unhelpful situation.
Tavis assumes there's one monolithic organization which is either entirely good or entirely bad. But what about people within the organization with restricted or accountable access to sources or binaries?
I think reproducible builds are about malicious parties inside a bigger organization.
Reproducible builds are primarily useful at organizations with strong auditing requirements. The build system is the root of trust and it's possible to achieve a high degree of trust inside an organization.
I think what the author misses is that having multiple vendors or distributions providing signed deterministic/reproducible builds reduces the total cost of maintaining individual trusted build environments.
For example, the author claims that the most straightforward way of producing a trusted build is to do the build from source oneself. That is true, but ignores the cost of millions of individuals spinning CPU cycles to build their own local packages, which has already been deemed too high by most users.
If N independent entities build and sign reproducible packages for a distribution, then (assuming independent failures) the probability that all of them produce an incorrect binary is P(individual_problem)^N. Local package managers can check as many of the N signatures as they like, or trust an aggregator to fetch and compare signatures from all N producers. N can be far smaller than the number of individual users of the packages while still being more trustworthy than a single vendor maintaining their own highly trusted build system. If large organizations participate in this multi-entity process, they can only increase their own certainty that they've produced accurate builds from source, at least for packages that are built publicly.
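As a back-of-the-envelope check of the P^N claim (which leans on the strong assumption that the N builders fail or are compromised independently):

    p_single = 0.01              # assumed chance any one builder ships a bad binary
    for n in (1, 3, 5):
        print(n, p_single ** n)  # roughly 0.01, 1e-06, 1e-10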
Deterministic builds also solve the compiler-backdoor problem (a la "Reflections on Trusting Trust"). Compile each repeatable/deterministic compiler (e.g. GCC, LLVM, TCC, MSVC) with every other compiler and verify that the deterministic builds are identical along any compilation path, e.g. that tcc(gcc(MSVC(llvm(tcc)))) produces output identical to MSVC(gcc(tcc(llvm(tcc)))). The process can be extended to verifying these paths under multiple OSes and hardware architectures. This establishes a practical root of trust: a well-known compiler binary trusted to translate source code into binaries without binary backdoors. See https://news.ycombinator.com/item?id=10181339 for previous discussion.
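A toy sketch of that cross-check, in the spirit of diverse double-compiling; build() is left as a placeholder for invoking the real compilers deterministically, and the compiler list and sources are illustrative.

    import hashlib
    from itertools import permutations

    COMPILER_SOURCES = {"gcc": b"<gcc source>", "llvm": b"<llvm source>", "tcc": b"<tcc source>"}

    def build(compiler_binary: bytes, source: bytes) -> bytes:
        raise NotImplementedError("stand-in for a real deterministic build step")

    def bootstrap(chain, target_source: bytes, seed_compiler: bytes) -> bytes:
        # Compile each compiler in the chain with the previous one, then use
        # the last compiler to build the target source.
        current = seed_compiler
        for name in chain:
            current = build(current, COMPILER_SOURCES[name])
        return build(current, target_source)

    def all_paths_agree(target_source: bytes, seed_compiler: bytes) -> bool:
        digests = {hashlib.sha256(bootstrap(chain, target_source, seed_compiler)).hexdigest()
                   for chain in permutations(COMPILER_SOURCES)}
        return len(digests) == 1  # every bootstrap order must yield identical bits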
Finally, deterministic builds allow verifiable signatures of the form "This container OS with signature A running a software package with signature B with input having signature C produces output with signature D" for arbitrary choices of deterministic source code.
This allows for verifiable computing in general: the ability to trust that running compiled source on a particular input (including other source code) actually produces a particular output. It reduces the cost of establishing a trusted system from building everything from source to building only the root of trust from source and trusting the plurality of signatures attesting that building the rest of the system from this trusted root yields the same publicly available binaries. The process can be extended all the way to formal verification of the root of trust and any other desired components.
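The "signature A/B/C/D" claim can be pictured as a signed record of four hashes. This is only an illustration of the shape of the claim, not any existing format.

    import hashlib, json

    def digest(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def computation_record(os_image: bytes, program: bytes,
                           input_data: bytes, output_data: bytes) -> str:
        record = {
            "os_image": digest(os_image),     # "signature A" in the wording above
            "program":  digest(program),      # "signature B"
            "input":    digest(input_data),   # "signature C"
            "output":   digest(output_data),  # "signature D"
        }
        return json.dumps(record, sort_keys=True)  # what a builder would sign and publish

Anyone who can reproduce the same OS image, program, and input can rerun the computation and check that the output hash matches, which is what makes these signed claims composable.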
Has someone taken over Tavis's blog? Two posts in a row now that seem like he either just doesn't understand some concepts, or is intentionally disseminating nonsense that pushes people in the wrong direction on things that really do bring security benefits.