GCC Rust: GCC Front-End for Rust (github.com/rust-gcc)
312 points by varbhat on Jan 10, 2021 | 173 comments



There was a thread posted on the rust forum a while back that laid out the goals of this project [0]:

> A friend of mine (Luke) has been talking about the need for a Rust frontend for GCC to allow Rust to replace C in more places, such as system software. To allow some types of safety-critical software to be written in Rust, the GCC frontend would need to be an independent implementation of Rust, since the relevant specs require multiple independent implementations (so just attaching GCC as a new backend for rustc wouldn't work).

Luke:

> The goal is for the GNU Compiler Collection to have a peer level front end to the gfortran frontend, gcc frontend, g++ frontend and all other frontends.

> The goal is definitely not to have a compiler written in rust that compiles rust code [edit: unless there is an acceptable bootstrap process, and the final compiler produces GNU assembly output that compiles with GNU gas]

> The goal is definitely not to have a hard critical dependence on LLVM.

> The goal is to have code added, written in c, to the GNU Compiler Collection, which may be found here https://gcc.gnu.org such that developers who are used to gcc may compile rust programs and link them against object files using binutils ld.

> The primary reason why I raised this topic is because of an experiment permitting rust modules to be added to the linux kernel: https://lwn.net/Articles/797828

> What that effectively means, if that takes off, is that the GNU Compiler Collection, which would be incapable of compiling that code, would be relegated to a second-class citizen for the purposes of compiling the largest software project on the planet: the linux kernel.

> Thus it is absolutely critical that GCC be capable not just of having rust capability but of having up to date rust capability.

0: https://users.rust-lang.org/t/call-for-help-implementing-an-...


It would be a grave error to have coded the Rust frontend to Gcc in C (or, as written above, "in c"). Gcc is now a C++ codebase, and new components should be coded in modern C++, for better productivity, performance, safety, and maintainability.

(You might prefer not to consider modern C++ more productive, performant, safe, and maintainable than C (and reflexively downvote), but the statement remains true: Gcc did transition to C++, for reasons. All the improvements that have kept Gcc competitive with Clang have been made since. And Clang and LLVM are also coded in C++, also for reasons.)


I disagree with the idea of using C++, even with some of its more modern features. If I were choosing a compiled language for systems development right now, I would choose C or Rust.

For starters, C++ does not have a stable ABI. Rust plans to have a stable ABI. Right now both fall back to the C ABI, but at least it's a goal of the Rust devs.

C++ is going to be harder to bootstrap than either Rust core or C when trying to build a compiler for a new system/platform. C is probably the easiest bootstrap of the three.

Rust and C code are generally more performant than C++ code. Also, in the realm of low-end microcontrollers, C code tends to have the smallest binary image size.

In C++, if you need to interact with other languages, a lot of the time you're stuck with the C ABI, which forces you to avoid a lot of C++ features/constructs. As for Rust, it's not an object-oriented language, and its unsafe blocks generally make it easier to at least keep Rust code idiomatic.

C++ is also a massive kitchen sink of a language with many ways to hide gotchas. Probably the worst thing in C is its null-terminated strings, but C++ also suffers from that. C is simple, and Rust has, at least so far, made good design choices for the most part.

There are also choices in the C++ language, some even part of the standard library, that seemed like a good idea but in hindsight are not that great an idea. For example, operator overloading.

Although, we are talking about a GCC front end, so I would be surprised to see anything but C or C++.


Where did you ever get the idea that C and Rust are more performant than C++ code? Modern C++ produces the fastest code of any language in wide use, empirically, and I don’t think many people would argue otherwise. People don’t use C++ because they love its elegance — who would? — they use it for its raw power and expressiveness.

C has not been used for applications where performance is a priority for many years now. C lacks the expressiveness to make many software optimizations practical. I’ve written database engines in both C and modern C++ and it is no contest, C++ is much more concise while producing faster code. People primarily still use C for portability.


> If I was choosing a compiled language for system development right now I would choose C or Rust.

I'm not quite sure what systems development has to do with compiler writing? (I assume by system development, you mean something like writing OS kernels etc?)

E.g. there are good reasons to stay away from automatic memory management when you are writing an OS; you want control over that. But those reasons hardly apply when you are writing a simple user-space program like a compiler.


System programs can be in user space, for instance a runtime for another language.

As for a compiler, ideally it's easy to port, as it's kind of the bedrock for getting any other system going. Although a compiler for some architectures or use cases is too large to be self-hosted, or does not make sense to be self-hosted, so it's not always a deal breaker for a compiler.


You seem to be confused about what C++ and C are, and even about what Gcc is and its relationship to programs it compiles.

C++ is a language defined by ISO committee SC22/WG21 via standard ISO/IEC 14882, most recently C++20 (superseding C++17).

C is defined by SC22/WG14, in the same way.

The most widely used implementations of these Standards -- Gcc, Clang, and MSVC -- are in fact the same project in each case (although MS's [until recently implemented] a long-superseded C Standard), and are, in fact, themselves C++ programs.

Portability of compilers is not needed, in general, to "get a system going", because cross-compilation is a mature and long-supported technique. Gcc is very frequently ported anyway, for practical reasons, via cross-compilation. Being coded in C++, and capable of compiling and cross-compiling itself, Gcc remains quite portable, as its notably frequent ports repeatedly demonstrate.

It is hard to imagine what could be considered a "deal-breaker" in this context. By the evidence, portability of C++ is never a dealbreaker as implementation language for a compiler, as in fact the compilers actually used in "get[ting] any other system going" are, with only rare exceptions, compilers in fact coded in C++.

(Exceptions are found not to be about portability, but about the memory footprint of the resulting compiler, which is not a product of its source language, but rather of the power of the compiler's optimizer, which is not always seen as necessary.)

In any case, the implementation language used for the compiler has absolutely no effect on the stability of an ABI for any target language. ABI stability is a choice made by language designers to favor backward link-compatibility over convenient expression of new features or bug fixes. It is hard for me to imagine the level of confusion that would produce this error.


> although MS's implements a long-superseded C Standard).

Not anymore; as of 2020, MSVC supports C11 and C17, with the exception of the C99 features that were dropped in C11, like VLAs.

Previously they were only keeping up with the ISO C features required by ISO C++ compatibility requirements.


> The most widely used implementations of these Standards -- Gcc, Clang, and MSVC -- are in fact the same project in each case (although MS's [until recently implemented] a long-superceded C Standard), and are, in fact, themselves C++ programs.

Yes. Though as far as suitability for 'systems programming' is concerned they might as well be Python programs or bash scripts, as long as they produce suitable output in an adequate amount of time.


> System programs can be in user space for instance a run time for another language.

OK, I can live with that definition.

But what does it have to do with the language you write your compiler in?


gccrs is written in C++, so no worries there.


There is another project for implementing a GCC backend for Rust which instead uses the libgccjit API.

https://github.com/antoyo/rustc_codegen_gcc


Is a great deal of actual Rust behavior not fairly intimately tied to LLVM? As far as I know it leaks various LLVM details, and much of the documentation describes various functions as being little more than thin wrappers around LLVM internals.


IMHO, "great deal" is overemphasizing it. There are some things, yes, but they are mostly smaller, more niche details that are even probably things we'd choose ourselves in many cases.

That being said, there are also cases where Rust does not have LLVM semantics, and that can cause bugs. Some famous examples being the loop optimization miscompilation, and &mut T currently not being marked noalias.
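
To make the noalias point concrete, here is a minimal sketch (function name hypothetical) of the optimization that marking `&mut T` noalias would permit:

    // Rust's borrow rules guarantee `a` and `b` never alias, so the
    // compiler could load *b once and fold this into *a += 2 * *b.
    // Without the noalias marker it must re-load *b after the first
    // store, in case `a` and `b` point to the same i32.
    fn add_twice(a: &mut i32, b: &mut i32) {
        *a += *b;
        *a += *b;
    }

The metadata would be sound for Rust; it is currently disabled because of LLVM bugs in handling it, as discussed below.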


> Some famous examples being the loop optimization miscompilation, and &mut T currently not being marked noalias.

I don't think that's a case where Rust "doesn't have LLVM semantics", since the miscompilation was reproduced in standard C. Rather, Rust is actively and ubiquitously leveraging otherwise rarely-exercised LLVM features, revealing a bunch of bugs (either leftovers or breakages) in them.


The latter, you're right, I did make a mistake here. I was thinking of it as "Rust doesn't just do whatever LLVM does, see, we have semantics and LLVM compiles them wrong, so this is an example of that" but I forgot that actually, we do specify "this follows what LLVM does" currently, and the miscompilation is just a plain bug. Extra embarrassing because IIRC I was the one who made the PR saying that.

The former though is an area of significant divergence between Rust and C++ semantics, and LLVM directly following C++ semantics.


> The former though is an area of significant divergence between Rust and C++ semantics, and LLVM blindly following C++ semantics.

Ah I must have missed that, I thought you were talking about only one thing (since IIRC the noalias miscompilation is due to loop unrolling?)


FWIW, `noalias`, a.k.a. `__restrict__`, generally works fine, but the inliner translates it into `noalias` metadata which is still ambiguous and also not handled properly in some cases. Loop unrolling is one of them, but that will be fixed with https://reviews.llvm.org/D92887; the general problem is the ambiguity itself.

There is a major effort to revamp the `noalias`/`restrict` handling that has been going on for a while now. It is taking quite long because it is hard and complex and we want to get it right.

In case you are interested, here is the new design https://reviews.llvm.org/differential/changeset/?ref=2170825 here the overall code changes currently considered https://reviews.llvm.org/D69542 and here you can find information on our monthly LLVM Alias Analysis call https://docs.google.com/document/d/1ybwEKDVtIbhIhK50qYtwKsL5...


I don't know the root cause of the noalias miscompilation, the thing I was referring to is https://github.com/rust-lang/rust/issues/28728

TL;DR: C++ says that an infinite loop with no side effects is UB, Rust does not. Empty loops in Rust will disappear entirely when they should really loop forever.
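A minimal sketch (function name hypothetical) of the kind of program that issue is about:

    fn wait_forever() {
        loop {} // no side effects, never terminates
    }

    fn main() {
        wait_forever();
        // Under C++-style forward-progress assumptions, LLVM may
        // delete the loop above, and control can fall through into
        // whatever code happens to come next.
        println!("unreachable in a correct Rust compilation");
    }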

(I edited my comment slightly because, re-reading, "blindly" sounds too negative.)


Great point! If anyone is curious about how to write an “empty” loop that is not compiled away, I've seen this common implementation that doesn’t even need the standard library:

    loop {
        core::sync::atomic::compiler_fence(core::sync::atomic::Ordering::SeqCst);
    }


Since https://reviews.llvm.org/D86841, ~ Aug 2020, Clang is able to reason about the forward progress guarantees of the input. For the LLVM-IR handling see https://reviews.llvm.org/D86233. While we are still adding more fine-grained support of different language standards and corner cases, e.g., constant loop condition in C vs. C++, you should be able to tell LLVM a loop is allowed to loop forever w/o side-effect. In fact, that is the new default. That said, LLVM still removes calls to functions without side effect unconditionally. We are working on that, i.a., https://reviews.llvm.org/D94106.


Notice that in C and C++ loops that might not terminate are UB, while in Rust they are just infinite loops.

So there is a significant difference in semantics between Rust and C/C++ here.


C and C++ actually differ in semantics here; C allows infinite loops that are controlled by a constant expression.


Generally, yes. Though, as always, these things are hard to summarize in one sentence and depend on the standard version (pre/post C++11). This post might shed some light on the options we are considering for Clang right now: https://reviews.llvm.org/D94367#2489090


Just what is gained by failing to loop forever? It seems like a really low-value optimization. Nearly always, the optimization would violate the programmer's intent or it would change an ordinary bug into a confusing bug. An infinite loop is not very many bytes on any processor.

The standard may allow the generation of insane code, but a decent-quality compiler does not do so. The standard ought to be fixed.


Infinite loops being UB was added because allowing them prevents some important code motion optimizations and makes it hard to reason about the memory model. You can find the details in the papers leading to the C++11 memory model.


> Infinite loops being UB was added because allowing them prevents some important code motion optimizations

Which ones?

If these optimizations are so important, how come Rust was designed in such a way as to make them impossible? Also, how does this fit with, e.g., the benchmarks game results, which show that Rust is faster than C for all benchmarks considered there?


Store sinking, for example, which is unsafe unless the loop is guaranteed to terminate (see the sketch below).

C does not have this guarantee (at least not in all cases). Also, Rust is compiled with the LLVM backend, so my understanding is that in practice Rust assumes that loops terminate. See:

https://github.com/rust-lang/rust/issues/28728

There are LLVM directives that can be added to prevent the optimization, but they are rejected by the Rust maintainers exactly because they would cause performance regressions.
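
To illustrate, a minimal sketch of the store-sinking transformation itself (function names hypothetical; the slice loop here trivially terminates, but the same rewrite on a loop whose termination is unknown is only legal if the compiler may assume forward progress):

    // Before the optimization: *out is stored on every iteration.
    fn hot_loop(out: &mut i32, xs: &[i32]) {
        for &x in xs {
            *out = x; // repeated store to memory
        }
    }

    // After store sinking: the loop runs in a register and a single
    // store happens afterwards. If the loop could spin forever, the
    // original keeps storing while this version never stores at all,
    // so the rewrite needs a termination guarantee (or UB to lean on).
    fn hot_loop_sunk(out: &mut i32, xs: &[i32]) {
        let mut last = *out;
        for &x in xs {
            last = x;
        }
        *out = last;
    }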


Uh, so what?

Letting the compiler assume that a "switch" without a default will never go there is a great optimization too. Why not put that in the standard?

Actually, this is worse. Letting the loop be UB is like letting a true "if" be UB. Just never mind the code actually written; surely it doesn't mean what it says.


> Letting the compiler assume that a "switch" without a default will never go there is a great optimization too. Why not put that in the standard?

that's effectively the case already for enums in C++: producing an enum value outside its range of representable values is UB, so a switch over an enum with no default and no matching case can be assumed unreachable, and compilers already optimize accordingly. (For a switch over a plain integer, a missing default is well-defined: nothing in the switch body executes.)


Loops that do nothing and don't terminate.

If your loop infinitely writes the number 70 to an atomic int, it's fine.


If by "nothing" you mean "nothing but looping", then yes, you are right. Any loop that "just loops" is doing something by definition: looping.


It depends on your perspective. Normally when I talk about what a loop "does", I am talking purely about the contents of the loop.


Potentially, but there are alternative backends like Cranelift, so it's unlikely anything major is tied to just LLVM anymore.


I'd find that very surprising and interesting - I can't imagine which semantics could leak from LLVM into a language specification. Can you give some examples?


For instance `std::ptr::offset`, which is apparently a trivial wrapper around LLVM's getelementptr instruction: https://llvm.org/docs/LangRef.html#getelementptr-instruction

I'm not sure to what extent GCC has a similar instruction, but I've heard that LLVM's use of signed integers here is apparently nonstandard, and is what led to Rust's decision for vectors to be limited to a certain size: https://doc.rust-lang.org/nomicon/vec-alloc.html
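
For anyone unfamiliar with it, a minimal sketch (values hypothetical) of how the signed-offset design surfaces in the `offset` API:

    fn main() {
        let v = [10i32, 20, 30, 40];
        let p: *const i32 = v.as_ptr();

        unsafe {
            // offset takes an isize, mirroring getelementptr's signed
            // indices; both forward and backward offsets are allowed as
            // long as the result stays in (or one past) the allocation.
            let third = p.offset(2);
            assert_eq!(*third, 30);

            let second = third.offset(-1);
            assert_eq!(*second, 20);
        }
        // Because offsets are signed, no object may be larger than
        // isize::MAX bytes -- one reason Vec caps its allocation size.
    }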


I don’t know why you’re italicizing LLVM, gcc, and Rust, but I (and perhaps others) find it relatively jarring as my brain automatically parses it as emphasis even though it clearly isn’t intended to be.

I only bring it up because it pretty drastically harms readability (for me at least, and to a degree I frankly find surprising). Just thought you may want to know.


This is common nomenclature to enhance readability in fields that have a large number of neologisms and proper nouns, such as some technical domains, medicine, investment banking, and so on. It's particularly useful when names are in the same language as the base language (e.g. in English "Peter" is easily recognized as a name, but "better fish" means something in the language, but can also be used as a proper noun, so using italics for proper nouns serves to disambiguate the intended meaning).


Maybe he is used to writing in MLA style which calls for the titles of software to be italicized just as books, movies, etc are.


There is indeed contention among different style guides over whether software titles should be italicized. Though the position that they uniformly should be is the minority one, it is certainly used by some outlets with considerable weight, and I believe it to be the consistent one if titles of books and films are italicized too.

Some style guides make the even more inconsistent distinction that only titles of video games be italicized, but titles of “application software” not be.

I suppose that on H.N., where such software is frequently mentioned, it does stand out more.


The video games example seems consistent to me; we regularly italicize the names of plays (for example) but not the names of tools and businesses.


Because tools and businesses are not titled works of intellectual property; a piece of software is.

I do not italicize Google the company either, but I do italicize Google the search engine.

The titles of scientific research papers are also typically italicized.


Looking at their commenting history, they have a habit of italicising really often, so it's not specific to this topic. It does indeed make a lot of their comments annoying to read without obviously adding any value.


> It does indeed make a lot of their comments annoying to read without obviously adding any value

I’m going to assume that you mean that the italics don’t add value, and not their comments as a whole; I think your comment could be read either way. Anyway, even so, others have since pointed out reasons why they might be used to doing so.


getelementptr is just some pointer arithmetic - all backends can do that.

And using signed integers is unusual yes, but again all backends (obviously) also support signed integers.


IMHO, "Just some pointer arithmetic" is selling it a bit short. There's a lot of stuff there around provenance that matters, that is, there's a reason this is an intrinsic and not just a subtraction after casting both to an int.

(I agree on the signed integer thing though.)


Glibc malloc also limits the maximum according to ptrdiff_t, which is effectively the same as Rust isize.


I don't think much of the stable functionality exposes LLVM internals. Part of stabilizing is removing the LLVM leaks into the language.


This is very cool; maybe this could push the Rust community to have a formal specification and a stable ABI in the future.


The work on a spec is already going as fast as it reasonably can be.

A stable ABI is unclear.


Glad you're going slow on the stable ABI and resisting the pressure to put out something half-baked. The C++ ABI is horribly fragile and complex. Unless the pitfalls of C++ can be avoided, making no ABI promises is better.


No stable ABI makes distributing shared libraries harder :/


If you want to distribute stable shared libraries, you write to the C ABI, not the C++ ABI.

Likewise, Rust users should write to the C ABI for stable shared libraries. (I believe there are a bunch of mechanisms to do that, though I'm not really a Rust user.)

For example look at what KDE does: https://community.kde.org/Policies/Binary_Compatibility_Issu...

Yes, you have to do a lot of extra work. That's working as intended.

It's sort of an oxymoron to expect to use every C++ or Rust feature in your user-facing ABI and have it be stable. They are incredibly rich languages, with drastically different notions of "function" than C has (let alone other constructs like data layout).
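
For anyone who hasn't done this from Rust, a minimal sketch (names hypothetical) of the "write to the C ABI" approach:

    // Built as a cdylib (crate-type = ["cdylib"] in Cargo.toml), this
    // exports a C-compatible symbol with a C-compatible struct layout.

    #[repr(C)] // lay the struct out exactly as C would
    pub struct Point {
        pub x: f64,
        pub y: f64,
    }

    #[no_mangle] // keep the symbol name unmangled
    pub extern "C" fn point_length(p: Point) -> f64 {
        (p.x * p.x + p.y * p.y).sqrt()
    }

Everything rich about Rust (generics, trait objects, Drop, etc.) stays behind that boundary, which is exactly the "extra work" described above.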


There's also the possibility of an opt-in stable ABI (using the `repr` mechanism). I personally really like the idea of there being an opt-in cross-language ABI at a higher level (or supporting more use cases) than the C ABI (which is quite limiting).


For example, COM/UWP on Windows, Binder on Android.


(There are mechanisms to do this, yes)


I don't understand what your point is. The KDE example you have linked shows that you can in fact guarantee a stable C++ ABI (although it requires a lot of care). The fact that they go through all the trouble is a hint that a C ABI is in fact too restrictive and not expressive enough for a large framework like KDE.

Specifically KDE relies (among other things) on the ABI stability of the layout of virtual tables which is certainly not part of the C ABI.


That's fair, there is a middle ground between C and C++. My point is that you can't expect the compiler to do everything for you in terms of creating a stable ABI for shared libraries. I think a lot of people are under the impression that this is purely a compiler feature.

What I expect to happen is that rather than "Rust gets a stable ABI" it will be "Rust very gradually builds upon the C ABI for selected features".

Most C++ libraries have templates in their signatures these days for efficiency, and Rust has a similar flavor (monomorphization). I think there is a fundamental tradeoff there between stability and performance, and both languages have a heavy emphasis on the latter. I could be missing something as I certainly don't know all the details of the C++ ABI.


Even with templates you can still keep the ABI stable; see for example libstdc++. Which is not surprising, as they compile down to the equivalent C code.


I think your example is better evidence for my view. I remember this issue being mentioned in a CppCon talk: C++11 broke the ABI for the STL (for efficiency, as far as I remember), and the binary interface to libstdc++ changed as a result:

https://gcc.gnu.org/onlinedocs/libstdc++/manual/using_dual_a...

So the ABI can leak implementation details in nontrivial ways that library authors usually don't consider. It's better to have something explicit in the code, e.g. under extern "C".

The point is not that making a stable ABI for every language feature is impossible; just that it's hard, fragile, and maybe not worth the effort. If you really want stability, then use fewer features, more like C. There is probably some middle ground that's richer, but templates are known to cause problems.

Also see the release history here:

https://gcc.gnu.org/onlinedocs/libstdc++/manual/abi.html


The ABI break in libstdc++ has nothing to do with templates and it was not done by mistake.

The layout of std::string changed to conform to the new standard; exactly the same thing would have happened if std::string was a C POD type, there is no way around that.

In fact libstdc++ to this day still has the option to be compiled with the old ABI.

There have been other minor ABI breaks, mostly to fix bugs, which affect only a small number of programs.

Pure C libraries have exactly the same ABI stability issues caused by changes in the layout of structures; to avoid this, C libraries expose only opaque pointer-sized handles to heap-allocated objects. This is similar to C++ libraries only exposing pointers to virtual interfaces, but for high-performance libraries, for example containers, this is not considered acceptable.


The point is that data layout is an implementation detail not part of the C++ API, and exposing that through shared libraries isn't a great idea. People do it, but it's a big mess and not particularly well documented.

You can write conformant STL implementations that have completely different layouts. But the GNU one no longer had a valid layout, so it had to change.

So when you expose libstdc++ as a shared library, you're exposing a bunch of details that aren't part of the C++ standard.

If you write it in a header, then you can expect it to break callers via shared library. And templates must be in headers. This is a fundamental language issue.

I don't use Rust, but the point is "making a stable ABI" will expose it to these sorts of problems, i.e. implementation details leaking on specific architectures, outside of the language.

The projects that care about ABI stability, e.g. sqlite, don't expose layouts. They have only functions and not data in their headers.


This thread is getting confusing.

Your original point was that for binary compatibility you write against the C ABI. I'm claiming that the C ABI has exactly the same fragile layout limitations and you can use the exact same workarounds in C++ (i.e. only expose pointers or make sure that your layout doesn't change).

I'm also claiming that templates are a red herring and have very little to do with ABI.


Yes you should do that now but only because Rust doesn't have a stable ABI. If/when it does then there's nothing wrong with distributing shared libraries using the Rust ABI, just like there isn't really anything wrong with doing it in C++ nowadays as long as you don't use bleeding edge C++ stuff. The C++ ABIs are pretty stable.


> there isn't really anything wrong with doing it in C++ nowadays as long as you don't use bleeding edge C++ stuff

That's the whole point of the KDE doc I linked. It's not a reasonable strategy to use unrestricted C++, just like it's not a reasonable strategy to use unrestricted Rust. You actually have to design an ABI, not just rely on the compiler to do it for you.

I thought there was a GNOME doc too, but I couldn't find it. The point remains: there are lots of things in C++ that people who care about stability don't use at their ABI boundaries.


Everything is tradeoffs. Stable ABI can also lead to other issues, like performance problems. C++ is dealing with some of these right now, and there are some situations (very very micro benchmarks, to be clear) where Rust is faster than C++ due to ABI issues.


That mostly affects GCC and Clang; other compiler vendors are happier to break ABI between major releases, as long as they aren't forbidden by the ISO C++ specification.

I was already sharing templates and classes across DLLs in Windows 3.x compilers, and keep doing it with VC++ to this day.

On Linux with Qt and Gtkmm projects, and on macOS/iOS with their system frameworks.

Which is the biggest reason I cannot put up with cargo's model of compiling every single project from scratch after git clone.

As we have discussed at length, while it might not be a priority right now, it is certainly an adoption blocker among some Ada, Delphi, Swift, Objective-C, C and C++ communities.


Does it not already have them if the programmer elects for it by choosing `repr(C)`, or is there more to it than that?


You are exposing an imprecision in the way that this is usually talked about, yes. If you are willing to use the C ABI, then you can use a combination of that repr and annotations on your functions and produce a shared object in Rust with a stable ABI.

What people usually mean here is that Rust would have its own stable ABI, that you would get “for free,” without needing to do that work. (It’s never actually free of course... but that’s yet another detail that is usually papered over when people talk about this.)


I suspect it's too late, but one lesson that can be learned from D is that having one frontend implementation with multiple backend glue layers is much more convenient than having one in D and one in C++ (I believe Iain Buclaw is in the process of moving the D frontend in gcc to use the proper D one as it's been lagging behind)


There are pros to that approach, but also cons. The major con is that keeping the rustc frontend would make an existing Rust compiler be a requirement, and not having that makes bootstrapping easier, which is a major pro.

This (among other things) was debated a lot.


> The major con

A major con of not sharing is not having a backend at all. It's a lot of work to keep up with a moving target


Yes, that is a theoretical problem, for sure. All choices have pros and cons.


The timeline looks rather ambitious https://github.com/Rust-GCC/gccrs/milestones


It also looks rather incomplete. Are they not planning on implementing borrowck? If chalk/polonius were ready and had a C API the roadmap would begin to make sense.


Makes sense to me. As mrustc[0] mentions, implementing validation in a secondary compiler is much less important, because you can always just run the reference implementation as a glorified linter in the meantime.

[0]: https://github.com/thepowersgang/mrustc


That's what I don't get. I looked through the available documentation briefly and didn't see any mention that this is intended to be a `mrustc`; it really seems to want to be a `rustc`. I'm not aware of other GCC frontends being less than complete compilers for their respective languages, and while I think that "rust without borrowck" is an interesting point in design space (discriminated unions, generics, macros, traits, closures), "rust without borrowck" is not rust.


You can plug in the borrow checker after writing the base compiler, but if you block waiting for the borrow checker you get nothing done.


I'm not sure if what I'm asking makes sense, but since it's written for a new backend, would the authors have to bootstrap it using a different toolchain? I guess what I'm asking is, could they use the LLVM Rust to build the GCC frontend or do they have to start all over with a different base language to get a first working version of a rust compiler?


As far as I can tell, this is written in C++ like the rest of GCC, so they don't need a rust compiler to bootstrap.


Looking forward to the day when gcc gains a C++ frontend written in Rust, to enable bootstrapping gcc from a Rust compiler :)


I see, so it’s written in C++. Would it remain that way, though? AFAIK the LLVM Rust is, itself, written in Rust, right? I imagine that it would be a goal to do the same for gcc.


There is absolutely no reason why the GCC front-end needs to be written in Rust. The reason the LLVM front-end was written in Rust initially was so they could immediately test and use new features in what was at the time also the largest program in Rust. Re-writing the GCC front-end in Rust would just prolong an already rather unfortunate bootstrap problem with the language and it should be strongly discouraged.

As it stands, there are a few options for bootstrapping the official Rust compiler from source with just a C/C++ compiler:

* Compile OCaml (implemented in C), use it to build the original Rust front-end in OCaml and then build each successive version of the language until you hit 1.49. This option is not fun.

* Compile mrustc, which is a C++ implemented compiler that supports Rust 1.29. Use that to build the actual Rust 1.29 and then iterate building your way all the way to 1.49. That is less bad, but still not fun.

* Compile the 1.49 compiler to WASM and run it via a C/C++ implemented runtime in order to compile itself on the target system. This would also mean packaging and distributing the WASM generated code, which some distributions would refuse. I also am not sure if it's even currently feasible, as I don't follow the WASM situation closely.

A compliant, independent C++ implementation that could be built in the ten minutes it takes to build GCC itself would be a very good thing to have and would be more friendly to distribution maintainers.


I don't see why a wasm blob would be any more palatable to maintainers than an executable.


Like I mentioned in my post, it wouldn't be for some, depending on how seriously they take things. I'm well aware many would refuse generated artifacts, even compiled-to-source artifacts. Others, not so strict, may accept it, since the WASM blob would be the same across targets, unlike an executable, so it lessens a burden: maintainers only have to generate it once for their distribution. It was never something I suggested the Rust maintainers provide a blob for, either.

Regardless, it was worth mentioning as a potential option. I am one of the handful of maintainers for an experimental distribution where packages are either compiled or interpreted from tarballs and this would be something we'd consider. I'd MUCH rather have the GCC front-end option, however. So far, we've simply not packaged Rust and have accepted that as dead-ending our Firefox package. This may potentially revive it.


> There is absolutely no reason why the GCC front-end needs to be written in Rust.

How about memory safety and fearless concurrency?


Not a requirement for a GCC front-end, and certainly not worth sacrificing a potentially faster path to bootstrapping the official compiler implementation. You should be worried about the ease with which various systems can bootstrap and adopt the language, which is a mostly solved problem for C/C++ but not a given for Rust itself. Some maintainers will absolutely refuse to bootstrap off of binary artifacts compiled from other systems; others won't even accept 'compiled to C' artifacts.

Keeping a viable C++ implementation as part of GCC would be the smartest decision.


The Bootstrappable builds folks dislike binary artifacts so much they are implementing bootstrapping a full Linux system from only 512 bytes of machine code plus all the source code:

https://bootstrappable.org/ https://bootstrapping.miraheze.org/wiki/Main_Page


Who are these groups who are demanding such easy bootstrapping? OS or distro developers? Programmers working on embedded and/or safety critical systems?

I know OpenBSD avoids rust because of the bootstrapping issue, but they also avoid LLVM because of a licensing issue.


OpenBSD uses clang/llvm on most arches by now for the system compiler.


Uf. Brain fart. I meant that they avoid GCC because of licensing.


Bootstrapping is important, but I believe that GCC already allows non-primary (i.e. optional) language frontends to be written in other languages. The Ada front end is written in Ada, for example.


A compiler front-end is pretty boring software in terms of memory safety. Lots of things that the front-end allocates are simply never freed.

Have you ever seen GCC crash with a SIGSEGV? I rarely did even when I used to be a GCC developer.


Indeed. Notoriously, a SEGV in gcc during compilation used to usually mean "your hardware is flaky": https://tldp.org/FAQ/sig11/html/index.html


That was before the invention of fuzzing.


There are still out-of-bounds concerns and chasing through NULL pointers. (Not arguing against gcc's quality or stability, just listing memory safety concerns that transcend memory deallocation)


I guess you could rewrite it in a language with no implementation at all if you are worried about purity.


No one is saying that rustc should be rewritten in C. They are saying that an alternative compiler front-end in an alternative language is sensible.


No one is saying that someone said rustc should be rewritten.


It could take the same approach as GDC and GNAT, by having the frontend written in the same language (D and Ada respectively), shared with other implementations using another set of backends.


The LLVM-based Rust compiler uses a lot of unstable/nightly-only Rust features internally. So even if this project got to the point where it could compile all stable Rust programs, I think it would take quite a bit more work than that to be able to compile `rustc` itself. (It might be that the unstable stuff is mostly in the standard library and not the compiler itself? Does it make a difference?)


> It might be that the unstable stuff is mostly in the standard library and not the compiler itself? Does it make a difference?

It might. There are at least two major things off the top of my head, regarding libstd:

1. specialization is needed for performance around String and &str

2. const generics are needed to support some trait implementations

We currently allow some stuff like this to leak through, in a sense, when we're sure that we're actually going to be making things stable someday. An alternative compiler could patch out 1, and accept slower code, but 2 would require way more effort.

There has been some discussion about trying to remove unstable features from the compiler itself, specifically to make it easier to contribute to, but it's unlikely that they will be completely removed from the current implementation of libstd for some time.
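
As a concrete sketch of point 2 (trait name hypothetical), this is the kind of impl that const generics enables, and that previously had to be macro-generated for each array length up to 32:

    trait Summable {
        fn total(&self) -> i64;
    }

    // One impl covers arrays of every length N.
    impl<const N: usize> Summable for [i64; N] {
        fn total(&self) -> i64 {
            self.iter().sum()
        }
    }

    fn main() {
        assert_eq!([1i64, 2, 3].total(), 6);
        assert_eq!([0i64; 100].total(), 0); // beyond the old limit of 32
    }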


Doesn't min_const_generics (stable once 1.51 releases on 2021-03-25) cover everything std needs (implementing traits for arbitrary array sizes)?


I believe that it does, yes. We haven't hit that in stable yet, though, so I am speaking purely in the present. You're right that it's looking good for stabilization in a few months, but anything can happen, in theory. I'm more trying to illustrate the point, that significant features can depend on something that's unstable currently. Those impls landed well before it was even marked as being stable in nightly.


Isn't mrustc already able to compile rustc?


Yes.


Oh that's impressive. I didn't know it was that far along.


It managed to do it for the first time on Dec 24, 2017: https://www.reddit.com/r/rust/comments/7lu6di/mrustc_alterna...

> It's managed to build rustc from a source tarball, and use that rustc as stage0 for a full bootstrap pass. Even better, from my two full attempts, the resultant stage3 files have been binary identical to the same source archive built with the downloaded stage0.


If it was written in rust, there's no reason they couldn't use LLVM rust to start development until it became self-hosting. (Same as you can develop a self-hosting C compiler by starting with gcc)


It's not written in rust; gcc is written in c. There is no bootstrapping involved.


GCC is no longer written in C, but in C++. They switched after GCC 4.7.


Of course, much of the code is still C.


Do gcc language frontends - even low-level ones - remain written in C?


The GCC Ada frontend is written in Ada:

https://gcc.gnu.org/wiki/GNAT

There is also a port of the Ada frontend to LLVM backend:

https://github.com/AdaCore/gnat-llvm


Besides the Ada example, GDC shares the D frontend, written in D, with other D implementations.


C++ but yes.


This is excellent; there are embedded targets that LLVM doesn't officially support and GCC does.


GCC is straight up faster in quite a few benchmarks (it's basically neck and neck in most benchmarks, and you should try both if it matters).

GCC is also roughly even in compilation speed now https://www.phoronix.com/scan.php?page=news_item&px=GCC-Fast...

LLVM is much easier to work with internally but GCC is a seriously good compiler even now.


They missed the opportunity to name it grust.


Gust of wind


grust, not gust


I know. What I mean is that gust of wind would be even better.


A bit off topic, but I hope someday GCC's build system gets overhauled. A huge advantage of LLVM is that it is much easier to rebuild the runtime libraries without rebuilding the compiler. With GCC that's a pain, unless one takes the time to re-package GCC very carefully like https://github.com/richfelker/musl-cross-make and https://exherbo.org/.

Maybe getting some new GCC devs in there with projects like this would help with that?


Sorry for the nitpick, but I find the title rather confusing; shouldn't it rather be "Rust front-end for GCC"?


You could make the argument that that would be a frontend written in Rust.


Don't try to compile this with `make -j` - I tried, and my system ran out of RAM and swap and started OOM-killing things. I have 16 threads and 32 GB of RAM.

Running `make -j4` seems safe thus far.


Did you try make -j16, since you have 16 cores? -j with no number means spawn as many parallel jobs as possible, which could be hundreds or thousands depending on the project. I learned this the hard way a while ago myself when I used $(nproc) incorrectly on a project and got the same OOM/swapping death spiral!


I incorrectly assumed that `-j` and `-j$(nproc)` would do the same thing. TIL!


Ecstatic to see this. Once this stabilizes then I can switch my shop to rust and not look back. If I find some spare time I will absolutely try to find a way to contribute.


Do you mean that you will use this, or just have the alt implementation as a checkbox item? Because - I'm just guessing of course - for this to be a mature alternative compiler we might be looking at 5, 10 years in the future, or never. Just being realistic, things take time to grow (and it's also uncertain how they can ever be able to keep pace with the rapidly developing Rust project).

With all this said, I would love for them to succeed, for multiple reasons. Including <3 GPL.


I mean that I'd use it. If it takes a while then maybe I can get by with mrustc for a bit (haven't tried yet), but the net of it is that I need to support systems which LLVM does not, which has kept me from using Rust in production thus far.


Does this mean that a compilation target currently supported by GCC would now allow Rust to target that architecture? Specifically I'm thinking of PPC cores with the VLE extension, which LLVM does not support (as far as I'm aware).


LLVM does support ppc64le cores with the VSX extension, if that's what you mean.

In fact, IBM uses LLVM proper as its own compiler backend for its PPC processors.


The VLE I'm talking about is this: https://www.st.com/resource/en/user_manual/cd00161395-variab...

This may be a vendor-specific extension though?


From the readme:

> The developers of the project are keen “Rustaceans” with a desire to give back to the Rust community and to learn what GCC is capable of when it comes to a modern language.

So what's the answer right now? How does GCC measure "against" Rust?


Given the number of existing frontends for GCC, even if not all are included in the main branch, I'm not sure what they imply by modern.

Ada, D, Go, Modula-3, Modula-2, C++20


RIP gcj.


Yeah, sadly. The project never had many developers, because tackling AOT compilation for Java, alongside its dynamism, is an effort that requires a full-time job, so only commercial offerings like Excelsior JET were competitive.


Front-end–independent optimizations are still slightly better in GCC than in LLVM.


I wouldn't say 'better'. Certainly, they're different. GCC does better in some ways, but worse in others. Somewhat notably, GCC doesn't do as good of a job at register allocation on some RISC architectures.


This is the very first step to getting Rust into the Linux kernel. I like that.


It is not required for getting Rust into Linux.


What I really hope is that the Rust community doesn’t go out of its way to make this easier. Communicate and let value come back, but there’s a significant amount of value in keeping a single backend that everyone relies on. CPython has done the Python community a lot of good by keeping one official compiler/toolchain (despite the great work done by projects like Jython/PyPy).

The only way to do this properly, if desirable, is to make GCC an official backend of the main frontend. That will defocus some progress that happens with LLVM (every feature has to be implemented on both backends) and can make dev lives hard (e.g. “oh, this problem comes up with GCC, so use the LLVM backend”). The value would be if the majority of bugs/features are in the shared frontend.

This project though seems like a parallel implementation of Rust. That’s valuable for the community and inevitable as a part of successful growth. I don’t believe it’s beneficial to the community though if this grows beyond a toy, niche project.


>> The only way to do this properly, if desirable, is to make GCC an official backend of the main frontend. That will defocus some progress that happens with LLVM (every feature has to be implemented on both backends) and can make dev lives hard (eg “oh this problem comes up with GCC so use the LLVM backend “)

No. The correct way is to create a Rust language specification that describes what the correct behavior is.

Then whether LLVM, GCC, or something else is used does not matter. There won't be one implementation with defacto behavior, there will be multiple implementations that follow the spec.

This is the way mature languages work.


It doesn't even have to stop progress. C++ does a new rev of the standard every 3 years. Rust has editions, which is a similar but less rigorous concept, and which could be made more rigorous.


Exactly. The C++ standard (https://isocpp.org/std/the-standard) is the specification that describes what a compiler must do to implement a particular "version" of C++ (for example, C++ 20).

Just as there is a C++ standard and multiple C++ compilers that implement the standard, there should be a Rust standard and multiple implementations.

This is the way that mature languages work.


This is the way that very few languages work. Python is a great example of an extremely popular, mature language that does not work this way. It is unclear that most people think that this sort of process is required for “maturity.”

If this is the benchmark, then among popular languages you basically have C, C++, C#, JavaScript, and... is that it?


The other languages without a spec don't market themselves a C/C++ replacement or as systems languages.

Having a webapp depend on a CPython implementation detail is very different from having a kernel depend on an implementation detail of a language without a spec that was used to implement it.

And languages actually stack on top of each other. Imagine depending on a CPython implementation detail that depends on an implementation detail of a specific C compiler. Those things do happen and they make programmers' lives miserable sometimes, but imagine how often that would happen if C didn't have a spec that all compilers strive to implement.


Linux uses GCC extensions. So the Linux kernel depends on the implementation of a compiler already.


Those are intentional deviations from the standard, and I think there's a reasonable assumption that those stay the same/compatible from version to version. I think the other kind are a bit more problematic, and without a standard it can be difficult to tell whether your assumptions are reasonable.


Sure, and people in that space do care more. But it's clearly not a blocker, even for the stuff you're talking about. It's not a hard requirement to get Rust code into Linux, for example. (Not to mention what my sibling talks about, which is very much already true.)


Java, https://docs.oracle.com/javase/specs/index.html

Ada, Fortran, Cobol as well.

This is especially relevant in the industrial sector with certified compilers.


I wouldn't say it's a benchmark, but surely Pascal, Fortran and COBOL (to name but three) are all standardised?


There are more languages that have standards, sure, but the languages you mention, at this point, and in general, are mostly relegated to historical maintenance and maybe some very specific niches. To me personally, a “mature” language that’s not really being used isn’t the goal. And if developers truly believed standardization was valuable, you’d expect this property to give them a significant leg up against the sorts of languages like Ruby, Python, PHP, Perl, TypeScript, etc.

That being said, I do think, considering it longer, there are languages that I am missing, like SQL, and ones that have a spec, even if it’s not under an ECMA/ISO process, like Java and Go, that I was forgetting.

I still think that using this as a necessary condition for “maturity” is misguided.


Well, I agree that standardisation and "maturity" (whatever that may mean) are not connected, but I think most C and C++ programmers (to name two very widely used languages that I personally have a lot of knowledge of) find the respective standards for those languages very helpful, and somewhat lament the lack of formal standards for other languages they may use.


The C and C++ specs are full of UB though, more like a "minimum common denominator" for compiler devs than a strict spec that users can rely on. It's way too easy to write spec-abiding code that behaves differently with different compilers/architectures/optimizations.

Specs are definitely a good thing to have, but often they're just brandished as a bullet point without looking at the details: how good is the spec, what does it add over the existing tests/CI/RFCs/proofs, etc.


Yes. I do think that standardization is generally good. I am pro Rust getting a specification! I just think it is one pro among many, and not something required for success.


In a past life I worked on a COBOL compiler for IBM.

There is a COBOL specification, but AFAIK nobody actually implements it fully. Implementations pick and choose new features based on customer demand.

Also, COBOL is dominated by large legacy codebases, which means that if there's a discrepancy between an implementation and the specification, the users normally don't want it fixed, because they may have written code that depends on the "incorrect" behaviour and it's a lot of work to audit.

IBM built a new backend for its COBOL compiler using the optimizer from their JVM. IIRC by the time I left it generated code that was ~2x faster than the old compiler, but uptake was still slow because of migration concerns. In particular, we spent a lot of time working on features to help guarantee that code with some forms of undefined behaviour would have the same result as the old version.


Such standardized specifications are typically a response to many divergent implementations existing and a need to standardize a common ground between them.

Few languages find such specification before that time.

Python also lacks it despite some competing implementations, since none actually diverge enough from CPython.


PyPy famously diverges from it a lot (namely for native modules).

Standardization is helpful for tool building and experiments (i.e. here are the invariants we’ll never change). Languages don’t work that way, and seeing how C/C++ have evolved (or really failed to do so at a meaningful pace), I’m under the impression (clearly unpopular, given the downvotes) that standardization and multiple competing toolchains are the cause of a lot of unnecessary complexity (not just within the language but also for users of said language).


Having multiple independent implementations (and a spec) is key to being a serious portable systems language, a claim that Rust already makes. I believe it's too early for that, but sincerely wish them success achieving those goals in the future and having multiple implementations is certainly going to help here.

> eg “oh this problem comes up with GCC so use the LLVM backend “

The point of having multiple implementations is making the language independent of the underlying system. A programming language is an abstraction. The only way to test whether an abstraction is a good one is trying out how well it abstracts away various underlying systems. That's why Rust needs ports to various architectures, OSes and compiler backends. Having GCC as a backend counts double here, because you get a few new target architectures for free with the port not just a new backend.

It's of course much different with Python but I can still clearly see how having an implementation defining a standard hurts the language.


Strongly disagree. It will only benefit the language to have more than one quality implementation. C++ has benefited hugely by the competition between g++ and clang; both compilers have gotten much, much better. To be fair, it will take a while before the GCC Rust front end is competitive, but for some purposes it doesn't have to be, like bootstrapping.

If "progress" means "rapidly add more and more new features in each release", multiple implementations will slow things down. But that problem can be addressed with editions: at some point, if the project is a success, the gcc front end will be a feature-complete version of some Rust edition, plus enough extra features to build an older version of the Rust compiler. At that point, you have a better solution to the bootstrapping problem (how to get a Rust compiler when you only have a C compiler and you want to build everything from source and not trust some binary you download from somewhere).


> It will only benefit the language to have more than one quality implementation.

As long as "quality implementation" means "implements the entire Rust language and not a subset".

Because otherwise, people will start getting requests to avoid using features that the non-standard Rust toolchain doesn't support.


> If "progress" means "rapidly add more and more new features in each release", multiple implementations will slow things down

This has to happen at some point anyway otherwise we'll just get another C++. And I don't think Rust would benefit from that. New languages are designed to fix problems with the old ones, not to replicate them after all.

I just hope the designers will choose that point wisely.


To some extent, given the use of macros and Haskell-like libraries, it is already another C++.

Besides, if Rust doesn't become another C++, it won't fulfil the industry needs that C++ caters for, thus while it might become a success in some domain, it won't replace C++ in the OS and GPGPU SDKs.


Rust will not in any case replace C++, in any core application area.

Rust might move in alongside C++, in some, in time. Or, Rust could still very possibly fizzle. That would be the normal course of events for a new language, barring a miracle as was dispensed to Javascript, Java, C++, and vanishingly few others.

Will Dart survive and thrive? Kotlin? Scala? Clojure? Go? All doubtful, based on prior experience. Having a lot of code and a lot of users does not seem to suffice. Many other languages had those, and faded. Ada even had $billions in backing, and faded.

What we can say confidently about Rust's future is that it is not certain to fade. The miracle has come in less deserving cases.


> Ada even had $billions in backing, and faded.

Yet, NVidia picked it up over Rust, go figure.


> otherwise we'll just get another C++.

I don't see why this is a bad thing. I'm a C++ developer, and I like C++. (I like rust as well)


> C++ has benefited hugely by the competition between g++ and clang; both compilers have gotten much, much better.

Yet Rust has a single implementation and by some metrics does better than both.


Compilation speed isn't one of them though.


Remember that Rust is supposed to be an alternative to C or C++, where the "this problem comes up with GCC so use the LLVM backend" is really rare and generally a sign of a bad codebase (exceptions being things like the Linux kernel that are so huge, optimized and domain-specific that they often end up relying on compiler dialects).

One exception to this is Visual Studio's toolchain but let's not talk about Visual Studio's toolchain on a weekend...


Yeah, when one forgets that there are more C and C++ compilers out there in the industry than just the pair that the FOSS community cares about.


The general sentiment of the community and leadership is that this project is a good thing, so you are unlikely to get your wish.



