I would really love to see the x86 multi-core architecture spec'd like this. To this day I don't really understand all the details behind x86 concurrency: memory, fences, atomics, concurrent operations ordering, mutex implementations. Such knowledge is crucial to writing fast lock-free structures, and to proving their properties either the whiteboard way or through model checking. Perhaps I should just get through one of these enormous Intel low-level manuals.
They have some interesting material there on building models of processors' memory-concurrency behavior. Unfortunately, reading the vendors' own manuals is not a good way to find out about that, even if you excuse the verbosity. Memory-consistency behavior is often very vaguely or indirectly specified in the architecture manuals, and worse, behavior of the actual parts does not always reliably agree with the manuals. A good paper: http://www.cl.cam.ac.uk/~pes20/weakmemory/cacm.pdf
Thanks for the lwn and cambridge links, they both look like a good read. Considering the big picture I feel like we need another Tony Hoare to show us good ways to reason about shared memory concurrent programs. CSP is cool, but at some point Erlang/Go message passing style is not enough, and one must dive deep into the crazy world of parallel processor architectures. It is my opinion that before somebody sorts this out well we should not expect processor manufacturers to play along - after all hardly anybody understands shared memory concurrency well.