I never get tired of re-reading this. It goes right to the heart of what is magical and wonderful and also scary about computer programming: the sheer power of abstraction available, and the unintuitive consequences that follow from it.
It also makes me wonder how much we can rely on things generally held to be secure. I certainly have never tried to guarantee that the compilers I get from the distros I use are fully honest, much less disassemble my BIOS, and of course the tools I would try to use even if I did achieve that level of paranoia might not be reliable and I would need other tools to test the tools, and the next thing you know I'm out in the woods sawing at trees to get lumber to carve because only a hand-built analytical engine is a reliable and trustworthy platform in any provable way.
The world needs an opt-in trusted execution environment. If it is mandated from above, people will rebel against it. At the same time, it must not put compliant programs at a disadvantage to noncompliant malware, as most sandboxing schemes do today.
There's a practical (for some value of practical) defense against this attack: "Countering Trusting Trust through Diverse Double Compiling." It's fairly clever. http://arxiv.org/pdf/1004.5548
If I recall correctly, here it is in short:
1) Invent a new virtual machine ISA.
2) Write a trusted compiler that targets that ISA. It is totally OK if this compiler is crap as long as it is correct.
3) Build the original compiler with the new compiler. The result is a cross compiler, running on your new VM-ISA and targeting your host architecture (e.g., x86).
4) Build the original compiler again with the output of step 3. The result is an x86-hosted compiler targeting x86.
The abstract seems to imply the last step:
5) Diff the output of step 4 with the untrusted compiler.
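To make the data flow concrete, here is a rough sketch of those steps as a driver script (Python, just for illustration; the compiler names and paths are made up, running the stage-1 output would really mean running it under the VM interpreter, and the whole thing assumes a fully deterministic, reproducible build):

    import filecmp
    import subprocess

    CC_SOURCE = "cc.c"                 # source of the compiler under suspicion
    TRUSTED_CC = "./tiny-trusted-cc"   # step 2: the small trusted compiler (slow is fine)
    UNTRUSTED_CC = "/usr/bin/cc"       # the binary whose honesty we want to check

    # Step 3: build the suspect compiler's source with the trusted compiler.
    # (In the scheme above this output targets the host but runs on the new VM
    # ISA; treat the VM interpreter as implicit here.)
    subprocess.run([TRUSTED_CC, CC_SOURCE, "-o", "stage1"], check=True)

    # Step 4: rebuild the same source with the stage-1 result, giving a
    # host-hosted, host-targeting compiler built without the untrusted binary.
    subprocess.run(["./stage1", CC_SOURCE, "-o", "stage2"], check=True)

    # Step 5: the untrusted compiler was supposedly built from this same source
    # by compiling itself, so an honest, reproducible build should be
    # bit-for-bit identical to stage2.
    identical = filecmp.cmp("stage2", UNTRUSTED_CC, shallow=False)
    print("compilers match" if identical else "MISMATCH: possible trojaned compiler")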
What about trusted hardware? Without that, your somewhat trusted compiler cannot be trusted, even if it's something like tcc whose code you might realistically be able to formally verify by yourself.
There are at least two approaches, but neither is really satisfactory... first, using an electron microscope and validating the finished design gate by gate. That won't work if the software and hardware toolchain used to generate the gate design is compromised, so you'd have to validate the gates without the help of the software toolchain used for design and layout.
I suppose you could also attempt to validate the silicon (sans microcode) by attaching a clever external apparatus, an external clock source, and a current meter (which you trust?), timing every single instruction that hits the processor, and checking the time each instruction takes to return results and the power it consumes against the design specs. That suffers from the same general problem, though: unless you derive the expected timing and power consumption from basic principles, the tools used to generate those expectations could have been compromised.
For processors with few gates, though, it could work.
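As a toy illustration of what that external check might look like, assuming you already have trusted per-instruction timing and power figures (the instruction names, numbers, and samples below are all made up):

    # Compare observed per-instruction timing and power draw against trusted
    # design specs. Everything here is a hypothetical placeholder.

    # Trusted spec: instruction -> (expected cycles, expected milliwatts).
    SPEC = {
        "ADD":  (1, 120.0),
        "LOAD": (3, 180.0),
        "MUL":  (4, 210.0),
    }
    CYCLE_TOL = 0      # cycles of slack allowed
    POWER_TOL = 10.0   # mW of slack allowed

    def check(instr: str, measured_cycles: int, measured_mw: float) -> bool:
        """Return True if the measurement is within spec for this instruction."""
        exp_cycles, exp_mw = SPEC[instr]
        return (abs(measured_cycles - exp_cycles) <= CYCLE_TOL
                and abs(measured_mw - exp_mw) <= POWER_TOL)

    # A stream of (instruction, cycles, mW) samples from the external apparatus.
    samples = [("ADD", 1, 118.5), ("LOAD", 3, 183.2), ("MUL", 5, 260.0)]
    for instr, cycles, mw in samples:
        if not check(instr, cycles, mw):
            print(f"anomaly: {instr} took {cycles} cycles at {mw} mW")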
The worst situations are where the malware is extremely subtle -- one changed gate or instruction, say, with a specific application in mind that can be subverted through that change -- or where, from an external point of view, the malware is non-deterministic: for instance, a processor that randomly and rarely injects malware into a running system, using a hardware RNG for randomness, on average once every million years of CPU time. Would chip makers be able to detect those sorts of attacks if they tried?
> This way you should be safe, unless whole world conspires against you.
That may be an egocentric way to look at being a victim. Perhaps the target of the malicious hardware is the whole world, and you were just caught in the net.
I had the pleasure of reading this paper during a computer security course. To complement the paper, the prof gave us the task of writing a 2-level polyglot quine (a program in language X that outputs a program in language Y, which outputs the original program). It was my first exposure to quines, let alone polyglot quines, so you can imagine how much I struggled. This paper will always remind me of the frustrations I experienced, and the brilliance of the moment when the "trick" finally clicked for me.
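The polyglot requirement is what makes the assignment hard, but the core self-reproduction trick fits in two lines. A minimal single-language quine in Python, for anyone who hasn't seen one:

    # The program's source is stored as a format string that gets
    # substituted into itself.
    s = 's = %r\nprint(s %% s)'
    print(s % s)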
Does that have more to do with technical issues like error correction and the fact that immune-system-like programs are hard to design, rather than trust? What amazes me even more than modern computers (running multitudes of things through incredibly deep layers of abstraction, with the CPU doing billions of computations without error) working at all is that biological systems work at all. Biological systems are messy, but they work. Mostly. I don't trust my body not to just give up one day and quit for no apparent prior reason. (Lots of interesting cancer cases like that; one I remember: wake up one morning, start showering, suddenly start shaking all over, go to the doctor, body is infested with malignant tumors spreading through the blood system...)
If the worry is more about trust, are you worried about a computer program misinforming your uploaded mind? Ever watched national news? Are you worried about one copy of you being killed by a cracker? Do you trust [some government] to keep your water supply clean?
At least some types of safety-critical software require binary analysis, which is expensive but addresses both this and more banal compiler bugs.
Binary analysis only addresses this if you manage to analyze everything involved in the system: OS kernel, utilities, device drivers, bootloader, ROM BIOS, etc. And then you have to not let any untrusted tool touch any of those components.
After all, analyzing the binary file only protects you if you can guarantee that the same bits get loaded into memory and executed unmodified.
Thompson's technique could easily have been extended so that the code that loads executable pages looks for a binary signature of the login code and modifies it at load time. If you do the same thing for the binary signature of the file loading code, you get the exact same thing Thompson described, but at the level of binary machine code going from disk to memory instead of at the level of code going from source to executable.
However many levels you analyze, in theory the adversary could have gone one level further in their attack, unless you build everything yourself (including the CPU, presumably).
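To make that load-time variant concrete, here's a toy sketch of what a trojaned loader could do (the byte signatures below are just illustrative placeholders):

    # A compromised loader scans each executable page for a known byte
    # signature and swaps in a patched version before handing the page over.
    LOGIN_CHECK_SIG   = bytes.fromhex("554889e54883ec20")  # hypothetical prologue of the password check
    LOGIN_CHECK_PATCH = bytes.fromhex("b801000000c39090")  # hypothetical "always return success" stub

    def load_page(page: bytes) -> bytes:
        """An honest loader would pass the page through; a trojaned one rewrites it."""
        if LOGIN_CHECK_SIG in page:
            page = page.replace(LOGIN_CHECK_SIG, LOGIN_CHECK_PATCH)
        return page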
Unless, of course, your hardware was backdoored when it was created (or do you trust all hardware manufacturers, including those in China under potential pressure from their government...?). See, for example, King et al., "Designing and implementing malicious hardware"; an online version appears to be here: http://www.usenix.org/event/leet08/tech/full_papers/king/kin...
In real life, this is a non-issue. Even if one piece of software is compromised, you cannot compromise all of the parts of the system so they align perfectly.
For example, say someone compromised your C compiler so that it adds a secret login when it compiles UNIX sources, and re-adds the same feature when it compiles the compiler's own code.
You can modify the compiler code to accept a Modified C Language. You can then write a transforming program that converts C into MCL. Then you compile your new compiler code with the compromised compiler. This gives you an MCL compiler. It will still contain the backdoor logic, but that logic is now irrelevant: any pattern matching will be useless against your new language, since the attackers cannot know in advance which transformations you applied to it.
After that you can take the compiler source, apply the MCL transformation, compile the result with your MCL compiler, and you will get a "clean" version of the C compiler.
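As a toy sketch of the C-to-MCL idea (the renaming table is arbitrary, and of course the compiler source would also have to be taught to accept the renamed keywords):

    # Systematically rename tokens so any pattern the compromised compiler
    # matches against the original C source no longer appears. The specific
    # renamings are an arbitrary example; the point is the attacker can't
    # predict them.
    import re

    RENAMES = {
        "while": "whilst",
        "if":    "when",
        "int":   "word32",
        "char":  "byte8",
    }

    def c_to_mcl(source: str) -> str:
        # Whole-word replacement only, so identifiers containing keywords survive.
        pattern = re.compile(r"\b(" + "|".join(RENAMES) + r")\b")
        return pattern.sub(lambda m: RENAMES[m.group(1)], source)

    print(c_to_mcl("int main(void) { if (login()) return 0; }"))
    # -> word32 main(void) { when (login()) return 0; }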
"You can then write a transforming program that converts C into MCL. Then you will compile your new compiler code with the compromised compiler. This will give you MCL compiler. It will still contain the backdoor logic. However, that logic will be deemed irrelevant."
If your transformations preserved behavior, wouldn't the backdoor still be there?
What does it matter which language the compiler's own code is written in, as long as the behavior remains the same?
The original scheme recognized a particular sequence of code; recognizing particular behavior no matter how it's implemented is in theory equivalent to the halting problem, I think.
I love this essay too. When I was looking at the security issues that were going to arise from creating 'executable content' with Java, it reminded me of the sheer magnitude of the problem.
In re-reading it I was struck by this comment:
> The moral is obvious. You can't trust code that you did not totally create yourself. (Especially code from companies that employ people like me.)
Ken was one of the people Google hired who didn't have to get 'slotted' (for obvious reasons), and Bill Coughran held him up as "the kind of engineer Google wants to hire." The juxtaposition is delicious.
Sorry; nominally, when Google hires engineers they offer a job at level 'x', but you're not actually hired at that level. What they do instead is wait 6 months, and then a committee evaluates your performance relative to other engineers in the company and tells you what level you are at Google.
The point of the exercise is to avoid hiring people as 'Senior Staff Engineer' (even if they had that title at a previous company) and then finding they only put out 'Staff engineer' or less in the Google equivalent.
They don't change your salary if they effectively demote you down a slot or two but the pay ranges are different. If the committee decides you are more than a couple of levels below what you were hired at they fire you. If you do 'down slot' it means that if you do get a promotion later you probably won't get a pay raise.
It's supposed to keep things evened out. Generally it means they don't successfully hire senior people from outside the company. It was entirely unclear to me how they handled acquisitions of talent.
Someone correct me if I'm wrong, but isn't there a relatively-easy solution to this?
Store everything in a versioned, secure-hashed repository. Git is too complex; just checksum everything and store it along with the code (this repository is created by the rest of this paragraph). Hand-build a minimal, trivially-human-readable (and thus trustable by anyone interested) compiler to compile the first commit. Compile it, take a secure hash of the resulting binary, and store it.
Read the next commit. Verify there is no "compile('bug 1')" in that step. Compile it with v1, and store the hash. Continue.
You now have a chain of trust established, starting with something you can read easily, and building minimally at each step (just read the diffs). And you've got the signature of the trusted compiler at any stage. Now check the signature of the downloaded compiler against your known-trusted signature. Compile, secure in the knowledge you have no infected compiler - an infected one wouldn't match the signature of the compiler you could create from your most-recent trusted result.
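A minimal sketch of the bookkeeping this would involve, assuming you already trust the hash function and have some way to build each stage (the build step, file contents, and field names here are just illustrative):

    # Each record binds the reviewable diff, the compiler binary it produced,
    # and the previous record's hash, forming the chain of trust.
    import hashlib
    import json

    def sha256(data: bytes) -> str:
        return hashlib.sha256(data).hexdigest()

    def record_stage(prev_record_hash: str, diff: bytes, compiler_binary: bytes) -> dict:
        """One link in the chain: reviewable diff + hash of the compiler it produced."""
        record = {
            "prev": prev_record_hash,
            "diff_sha256": sha256(diff),
            "binary_sha256": sha256(compiler_binary),
        }
        record["record_sha256"] = sha256(json.dumps(record, sort_keys=True).encode())
        return record

    # Verifying a downloaded compiler then reduces to one comparison against
    # the hash recorded at the stage you trust:
    def is_trusted(downloaded_binary: bytes, trusted_record: dict) -> bool:
        return sha256(downloaded_binary) == trusted_record["binary_sha256"]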
This is extendable with digital signatures, the algorithms of which can be understood with a little number theory, and the code shortly thereafter. Choose who you trust, and spread the work out, just like every other trust-based system. Or trust nobody, and do it yourself (edit: yes, this implies you must build your own computer from tools you develop independently).
We've now arrived at the problems around trusting other humans, which are unsolvable. That fact must be accepted to varying degrees or rejected entirely, and rejecting it requires you to live in isolation. Since you're reading this, you're not in isolation, so you accept some level of human trust, and so the previously established chain of trust would solve the issue for you if the people you trusted had approved the chain you're using.
In this setup, all you'd need to trust is others who fill the gaps you don't directly approve yourself (note that this includes the security / validity of the hash algorithm). Which is the best you can do without omniscience, which would solve the trust issue on its own anyway.
You are wrong, because you make a huge leap of faith here:
> Choose who you trust
No-one deliberately chooses to use software written by people who are known to be untrustworthy! And never has! Yet we find ourselves in a less than perfect world...
>The fact of which must be accepted to varying degrees, or rejected entirely, which requires you to live in isolation.
If you trust nobody in anything, then you cannot live with other humans. If you don't remove yourself, others will remove you, and that's pretty fundamental in any society. So some degree of trust can be assumed to exist, and where it is placed is entirely outside anyone else's control. If there is none in anyone else helping on the computational trust chain, then to have any trustable system you must do it yourself or go without. There is no middle road.
I should also clarify that I'm not claiming we have anything quite like this today. Merely that such a system is possible - the compiler chain is bootstrappable with minimal effort, and determining your measure of trust in any given build is straightforward: if there's a gap in the chain, the answer is no until someone you trust fills it in. If not, then yes.
edit: in any case, trust, like the communication it is founded on, is an inherently undecidable problem. At some point, to do anything, some measure of blind trust must exist - for example, my belief that this universe is rational in a way humans can comprehend. Thus, if anything has been done, some trust exists, and the system is bootstrapped.
One of the awesomest moments in science fiction is in Accelerando when one of the characters reveals that it's bootstrapped itself up from an alarm clock controller to escape this attack.