Bootstrapping Rust

JonathonW · on Dec 11, 2014

How important is breaking the circular dependency for the compiler, really?

To use a more common language and tool as an example: I can't build GCC without already having a working C++ compiler. Granted, GCC will accept any reasonably conformant C++98 compiler, but still... If I'm bootstrapping GCC on Linux (or almost any other major *nix), that compiler's almost always going to be another copy of GCC.

If getting started with Rust on another platform is indeed so difficult, I'd think it would be a better use of their resources to make sure that cross-compilation is functional, rather than messing around with distributing LLVM IR and stuff. If I'm building a C/C++ build environment for a new platform, a cross-compile of my tools is probably how I'd start.

jensnockert · on Dec 11, 2014

Cross-compilation is functional, and it's how you would bootstrap it. But in theory you could also bootstrap from the old compiler written in ocaml, but that would take ages due to needing a gazillion builds of rustc.

kibwen · on Dec 11, 2014

We can determine exactly how many builds you'd need by looking at the master snapshot file: https://github.com/rust-lang/rust/blob/master/src/snapshots....

According to this, there have been 290 snapshots in total. And keep in mind that you would also need to rebuild LLVM quite a few times as well during this process, as Rust has continually upgraded its custom LLVM fork over the years.

pjmlp · on Dec 11, 2014

The P-Code used by some Pascal compilers never was intended to be a fully Pascal VM, rather a porting tool.

Niklaus Wirth idea was to have a portable assembly that he could use to easily bootstrap the compiler.

So porting to a new platform meant:

- setting the output format to P-Code

- compile the compiler

- write a P-Code interpreter without any attention to performance

- use it to run the compiler until the new native backend is done

This was specially important back in the day each OS had as option their own proprietary system programming language or Assembly.

Personally I like this option (using interpreters as porting tool) better than cross-compiling, as you can work directly on the target system, except in embedded platforms case.

aidenn0 · on Dec 11, 2014

"The dependency of the compiler on itself can be broken by distributing the bootstrap compiler as LLVM IR code. Then use LLVM's IR assembler, and other tools, to re-link the stage0 Rust compiler on the target platform."

LLVM IR is target architecture dependent, so it's not portable between machine types.

bostik · on Dec 11, 2014

It's not exactly simple to say either way.

For simple projects and not-too-complex code the IR technically can be cross-platform (although I never tried whether it worked between different endians); we got a number of early prototypes actually working. We compiled IR "object files" on x64 platform and managed to link them on ARM, after a few extra disassembly steps. They even ran. This was slightly more than 3 years ago.

However...

From my experience a more correct statement would be: "LLVM IR is mostly architecture independent." The moment you mix in extremely low-level code, such as atomics, portability of IR breaks down. The operations for atomic types are (or at least were) inlined as build-host assembly.

It was an interesting exercise nonetheless.

sanxiyn · on Dec 11, 2014

I find this surprising. The most common source of IR nonportability is from calling convention. Exactly same C function declaration would compile to different LLVM IR function declaration, and this has nothing to do with extremely low-level code.

For an example, look at http://llvm.org/devmtg/2014-10/Slides/Skip%20the%20FFI.pdf, from page 103 to 107.

Edit: There seems to be a misunderstanding. What I am saying is that "LLVM IR is mostly architecture indepedent" is false. LLVM IR is not even remotely architecture independent and nonportability happens with even the simplest codes.

apparentlymart · on Dec 11, 2014

From my (admittedly probably limited) experience with LLVM IR from writing my own compiler, it seems like if you were writing a program entirely in LLVM IR you could use a subset that could be compiled and run on any fully-supported LLVM target.

Of course that's a different proposition than e.g. compiling a C program to LLVM IR using Clang and then trying to compile that IR on a different target, or trying to interact with non-LLVM-IR functions that conform to the platform ABI.

Of course, the resulting native code may not be the best for the target, since e.g. a native integer on one platform might become a pair of smaller integers on another platform. But it could work.

With all of that said, I expect the proposal was to use the Rust compiler to compile the Rust compiler to IR, and I imagine Rust is complex enough that it must generate at least some target-specific IR. Perhaps one could take the generated IR and "normalize" it, but it's questionable whether it would be worth it.

sanxiyn · on Dec 12, 2014

rustc needs to interact with C ABI functions. To start, rustc needs to be able to call LLVM. As LLVM is not written in Rust, this is done with LLVM C API.

bostik · on Dec 11, 2014

Well, at least with Qt (which declares its own assembly atomics at compile time) this was a real issue. When going through the disassembled bitcode files we saw that all calls to q_atomic types went through inlined platform-dependent assembly.

Of course anything compiled on x64 failed to link on ARM.

Ygg2 · on Dec 11, 2014

That seems contrary to what IR is supposed to do and how LLLVM was designed. Any links or explanations?

Rusky · on Dec 11, 2014

LLVM IR isn't supposed to be target independent in any way. It's an internal representation for a compiler backend- it has to be able to represent target-specific semantics at some point.

rkangel · on Dec 11, 2014

Think of LLVM IR more like a communications protocol between front-end and back-end. A defined abstraction so that the parts can be cleanly separated.

rejschaap · on Dec 11, 2014

I managed to build the rust compiler with a little effort on FreeBSD. I thought I was home free, but when I tried to build Cargo it insisted on downloading and building the rust compiler itself (which fails, obviously). So I was more than a little dissapointed after going through the trouble of getting the compiler working and I gave up on cargo.

I'm hacking away happily with rust now though, I quite like it so far and am curious to see where it will go from here.

Argorak · on Dec 11, 2014

This is an unfortunate situation currently, but better then the alternative. cargo doesn't tie into the rust compiler (just uses it as a program it calls), but it wants to be always available. Because of that, it builds with a known-good version of Rust to make sure `cargo` can be used in the most recent version on all (supported) platforms. This is due to the instability of Rust.

Before that, cargo was always broken, if you didn't have the correct version for building it.

This will certainly change once Rust stable released.

As all things Rust, the whole ecosystem still has a huge label "in construction". The things that are supported and intended to be kept stable are quite stable, though!

kibwen · on Dec 11, 2014

As others have mentioned, Rust does indeed support cross-compilation (and has from day one). Sadly we still need to document it fully, though our spotty docs haven't stopped community members from getting Rust code running on FreeBSD and iOS and all sorts of ARM things. For the moment it may be enlightening to read the recently-implemented Flexible Target Specification RFC (https://github.com/rust-lang/rfcs/blob/master/text/0131-targ...), which you may have seen in action via Rust on the PSP (https://github.com/luqmana/rust-psp-hello).