
You possibly need some background concepts.

Compilers work by taking a (computer-)linguistic analysis of the source code text, the syntax tree, and converting it into an intermediate representation (IR), which is then progressively lowered to raw machine code by a series of transformations.

Such IRs are generally structured as a connected graph of basic blocks (BBs). A basic block has a formal structure: control enters only at its top and leaves only at its bottom, with no branches into or out of its interior. A perhaps simplistic way to visualize basic blocks is as the polygons on a flowchart.

So the article, which assumes that you know this background, is showing what kinds of basic blocks result in Clang's intermediate representation, before any lowering or optimization, from the syntax tree. It skips over the part where the syntax tree is constructed from the source code shown.

In many compilers, you don't get to see the intermediate representation. In compilers like OpenWatcom C/C++ and GCC you can always go and look at the source code to see what the IR data structures are, but the IR itself lives in memory (or internal temporary files) and is not accessible outwith the compiler programs themselves. Clang's design involves a publicly specified intermediate representation with documented serialization formats, so it can be written to a file and then read by other, potentially third-party, tools.

* https://llvm.org/



It's not as clean as clang/llvm, but you can see the intermediate steps of GCC with the -fdump-tree-all and -fdump-rtl-all options. The GIMPLE IR is not so hard to follow (not as hardcore as the gcc docs...). It can be interesting to see the effect of front- or back-end compiler options on your code and to track down how an optimization works.

You can also write compiler plugins in GCC. Not as fun as clang/llvm, but manageable.


> A connected graph of basic blocks (BBs) is generally how such IRs work

I was wondering: when is this graph actually built? Is it before creating the IR or after? The way I see it, you can generate IR without worrying about the control-flow graph, then build the CFG from the IR in order to perform common optimizations or register allocation. Is that correct?


In LLVM, a function is a linked list of BBs, each of which is a linked list of instructions. The final instruction in each basic block is a special kind of instruction (a terminator) that includes a list of all possible successor BBs.

In effect, the IR is exactly a control-flow graph represented in adjacency list form, so it's not possible to construct the IR without constructing the control flow graph. You could theoretically write the IR in textual form, but that's definitely not the common case, and you generally need to construct the CFG anyways to properly emit the IR (particularly since adding the phi nodes for SSA requires knowing the predecessors of every basic block).
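As a sketch (hand-written, roughly what Clang emits for a simple absolute-value function once mem2reg has run; the names are mine), the textual IR makes the adjacency-list structure visible: every block ends in a terminator naming its successors, and the phi node in the join block names its predecessors:

```llvm
define i32 @abs_val(i32 %x) {
entry:
  %isneg = icmp slt i32 %x, 0
  br i1 %isneg, label %neg, label %done   ; terminator lists successor BBs

neg:
  %negx = sub i32 0, %x
  br label %done                          ; unconditional terminator

done:                                     ; preds = %entry, %neg
  %r = phi i32 [ %x, %entry ], [ %negx, %neg ]
  ret i32 %r
}
```

The phi node is exactly where the predecessor information becomes load-bearing: it must name every incoming edge, so the CFG has to exist before SSA form can be finalized.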



