Biggest insight from the article: clang does not need to generate SSA with phi nodes (which can get complicated and subtle). It can simply generate a sort of "degenerate" SSA which avoids phi nodes by storing and loading results to memory. This way the front end can concentrate on converting the front end C/C++ code to something simple.
Many of these memory loads and stores will be removed later in the pipeline and replaced with registers. That is a job for the optimization pipeline; that's when phi nodes etc. will be inserted and come into play.
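To make that concrete, here is a small sketch of a case where full SSA would need a phi. The function name `pick` and the file name `pick.c` are made up for illustration, and the comments describe what Clang typically emits rather than exact output:

```c
/* Hypothetical example: x receives its value on two different paths,
 * so "real" SSA needs a phi where the paths merge. */
int pick(int cond, int a, int b) {
    int x;              /* -O0: Clang gives x a stack slot (an alloca)  */
    if (cond)
        x = a + 1;      /* -O0: a store into that slot                  */
    else
        x = b * 2;      /* -O0: another store into the same slot        */
    return x;           /* -O0: a load from the slot, so no phi needed  */
}
```

You can inspect the unoptimized IR with `clang -O0 -S -emit-llvm pick.c -o -`. With optimizations on, the slot is normally promoted to a register and the two values are merged with a phi (or a select, for a branch this small).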
Thanks for the summary. This two-pass approach is the same one I used in Go’s golang.org/x/tools/go/ssa package, because I was at that time unaware that there are more efficient algorithms for constructing full SSA (with phis) in one pass, using the optimistic assumption that control flow graphs are reducible, which is nearly always true. The Go compiler’s SSA construction uses the one-pass approach.
Very interesting to see that Clang basically always produces very bad and unoptimized LLVM IR code and leaves it to LLVM to clean it all up. That said, it's not entirely true that Clang avoids doing any optimizations -- it does indeed produce slightly different LLVM IR for -O0 and -O3.
I think this was done because while LLVM was built to be academic, Clang was built to be fast and beat GCC, and so ended up overly low level.
A nicer and newer approach is to use an intermediate bytecode that lowers to LLVM IR; Swift and GHC do this, and I'm sure there are several other examples I haven't thought of.
I'd like to see this for C because I think -O0 is a bad debugging experience, e.g. because system libraries are still optimized, and an interpreter running on a C-like bytecode would be better. Really the only thing -O0 is good for is working around compiler bugs.