They have some interesting material there on building models of processors' memory-concurrency behavior. Unfortunately, reading the vendors' own manuals is not a good way to find out about that, even if you excuse the verbosity. Memory-consistency behavior is often very vaguely or indirectly specified in the architecture manuals, and worse, behavior of the actual parts does not always reliably agree with the manuals. A good paper: http://www.cl.cam.ac.uk/~pes20/weakmemory/cacm.pdf
Thanks for the lwn and cambridge links, they both look like a good read. Considering the big picture I feel like we need another Tony Hoare to show us good ways to reason about shared memory concurrent programs. CSP is cool, but at some point Erlang/Go message passing style is not enough, and one must dive deep into the crazy world of parallel processor architectures. It is my opinion that before somebody sorts this out well we should not expect processor manufacturers to play along - after all hardly anybody understands shared memory concurrency well.