Wild – A fast linker for Linux

pzmarzly · 2025-01-24T16:55:32 1737737732

Ever since mold relicensed from AGPL to MIT (as part of mold 2.0 release), the worldwide need for making another fast linker has been greatly reduced, so I wasn't expecting a project like this to appear. And definitely wasn't expecting it to already be 2x faster than mold in some cases. Will keep an eye on this project to see how it evolves, best of luck to the author.

estebank · 2025-01-24T17:57:59 1737741479

Note that Mold has no interest in becoming incremental, so there is a big reason there for another linker to exist. I find it kind of embarrassing that MS' linker has been incremental by default for decades, yet there's no production ready incremental linker on Linux yet.

jcelerier · 2025-01-25T01:02:55 1737766975

OTOH even lld, fast but fairly slower than mold, is already incredibly faster than MS's linker even without the incrmeentality. Like, I'm routinely linking hundreds of megabytes in less than a second anyways, not sure incrementality is that much worth it

cyco130 · 2025-01-25T02:03:31 1737770611

Not a rhetorical question: Could it be that part of the speed difference is due to the file system speed? I was shocked when I saw how much modern(ish) Windows file systems were slower than modern(ish) Linux ones.

p_ing · 2025-01-25T17:08:16 1737824896

NTFS and ReFS are high performance file systems. But yes, it is due to file system filters, which OOTB means Windows Defender, though that can be extended by 3rd parties, including Sysinternals utils such as procmon.

Microsoft built a very extensible I/O stack and prior to Defender/prior to SSDs it really wasn't very noticable... back when it was originally designed well through the 90s and early 00s.

Unfortunately it is now noticable despite being an otherwise smart design. Which means Windows and/or NTFS are blamed as being slow, neither of which has any basis in fact when we look at the overall design of Windows' subsystems/VMM in comparison to macOS/Linux.

It sucks. You've got great plumbing in Windows with a shit shell on top.

rerdavies · 2025-01-25T22:51:05 1737845465

Disabling Defender on directories involved in compiling and linking speeds up compile and link times by a factor of 5x or 6x. It's not a subtle difference.

I agonize over the insanity of removing Defender protection on precisely the files that are most vulnerable on my computer each time I do it. But I do it anyway.

Interestingly, Android Studio offers to turn off Defender protection for development directories for you, and does so each time you load a project, if you haven't already done so. So I no longer feel like I'm alone in my insanity.

leeoniya · 2025-01-25T03:29:59 1737775799

it's usually windows defender and virus scanning that causes these massive slowdowns.

miki123211 · 2025-01-25T08:40:33 1737794433

And as a result of this, if you want to optimize your program to be fast on Windows, you will make very different optimization decisions than on other platforms.

For example, Windows runs virus scans on close(), which makes it very slow. This means that sometimes it makes sense to have one or more background threads exclusively dedicated to closing files.

There's a good talk on this at https://www.youtube.com/watch?v=qbKGw8MQ0i8

leeoniya · 2025-01-25T13:06:34 1737810394

might have been a problem in Bun that was fixed by: https://github.com/oven-sh/bun/pull/16747

berti · 2025-01-25T06:08:50 1737785330

That’s exactly the problem “Dev Drive” is intended to solve I believe. I haven’t tried it myself.

https://learn.microsoft.com/en-us/windows/dev-drive/

p_ing · 2025-01-25T13:39:45 1737812385

It helps. Install your games on it, too.

pjmlp · 2025-01-24T21:31:33 1737754293

Additionally the way precompiled headers are handled in Visual C++ and C++ Builder have always been much better than traditional UNIX compilers, and now we have modules as well.

zik · 2025-01-25T10:48:21 1737802101

The way precompiled headers work in C++ is a bit of an ugly hack. And worse, it's almost as slow as just compiling them all every time anyway.

pjmlp · 2025-01-25T11:07:20 1737803240

Only in traditional UNIX, which I mentioned.

Borland and Microsoft compilers have been dealing with them just fine since at very least 1994, when I started using Turbo C++ 3.1 for Windows.

paulddraper · 2025-01-24T21:51:40 1737755500

It has to be a candidate for the longest biggest gap in build tooling ever.

bogwog · 2025-01-24T19:25:09 1737746709

[flagged]

dang · 2025-01-24T20:22:03 1737750123

"Please respond to the strongest plausible interpretation of what someone says, not a weaker one that's easier to criticize. Assume good faith."

https://news.ycombinator.com/newsguidelines.html

estebank · 2025-01-24T19:52:12 1737748332

Yes, I missed a word. And I believe pretty much everybody else realized what I meant to say.

Feel free to point me in the direction of a production grade incremental compiler that can run on Linux, GNU or otherwise.

Thorrez · 2025-01-24T19:30:09 1737747009

I'm pretty sure that's a typo, and "incremental" was meant to be included in that sentence.

estebank · 2025-01-24T19:49:42 1737748182

Yes, indeed I accidentally forgot about incremental there.

bdhcuidbebe · 2025-01-24T19:51:25 1737748285

Why so hostile? Have a break, go look at the clouds, they are beautiful today!

panzi · 2025-01-25T03:05:21 1737774321

Why does AGPL Vs MIT matter for a linker?

usr1106 · 2025-01-25T09:53:14 1737798794

Hmm, my naive summary of AGPL is "If you run AGPL code in your web backend you are obliged to offer the backend source to everyone using a web client". No wonder it's explicitly forbidden at Google.

What does that mean for a linker? If you ship a binary linked with an AGPL linker you need to offer the source of the linker? Or of the program being linked?

nicoburns · 2025-01-25T09:56:11 1737798971

In practice I think it's pretty much equivalent to the GPL for a linker. But I can understand why people in commercial settings are wary of this license.

account42 · 2025-01-27T12:13:39 1737980019

Instead of spreading FUD you could go read the AGPL.

cies · 2025-01-25T13:26:45 1737811605

iirc the mold author wanted to make money off of it (and I dont blame him).

AGPL is avoided like the plague by big corps: same big corps are known for having money to pay for licenses and sometimes (yes, I look at you Amazon) being good at deriving value from FLOSS without giving back.

iirc AGPL was used so everyone can just use it, big biz is still compelled to buy a license. this has been done before and can be seen as one of the strategies to make money off FLOSS.

dspearson · 2025-01-25T18:22:14 1737829334

Under what circumstances would commercial companies be required to buy a license?! If they provide Linking as a Service?

cies · 2025-01-25T19:00:59 1737831659

They probably wont NEED a license, but --as said-- big corps dont touch AGPL with a ten foot pole because legal. So it's just to shut up legal, most likely.

o11c · 2025-01-25T06:48:47 1737787727

Corps want to be able to release and use tools that take away the freedoms that GPL-family licenses provide. Often this results in duplication of effort.

This is not theoretical; it happens quite frequently. For toolchains, in particular I'm aware of how Apple (not that they're unique in this) has "blah blah open source" downloads, but often they do not actually correspond with the binaries. And not just "not fully reproducible but close" but "entirely new and incompatible features".

The ARM64 saga is a notable example, which went on for at least six months (at least Sept 2013 to March 2014). XCode 5 shipped with a closed-source compiler only for all that time.

oguz-ismail · 2025-01-25T07:11:23 1737789083

So they donate money instead of code? The project somehow benefits from the switch to MIT?

zelcon · 2025-01-25T03:32:50 1737775970

Corps don't want to have to release the source code for their internal forks. They could also potentially be sued for everything they link using it because the linked binaries could be "derivative works" according to a judge who doesn't know anything.

pwdisswordfishz · 2025-01-25T07:39:28 1737790768

They don't have to release source for internal forks.

wyldfire · 2025-01-25T13:18:40 1737811120

They do if they're AGPL licensed and the internal form software is used to provide a user facing service.

amszmidt · 2025-01-27T07:15:03 1737962103

But then it isn’t “internal”…

zelcon · 2025-01-29T05:55:10 1738130110

It’s too hard to determine what pieces of your stack interact with public-facing services, particularly in a monorepo with thousands of developers. The effort involved and the legal risk if you get it wrong makes it an easy nope. Just ban AGPL.

amszmidt · 2025-02-03T09:52:36 1738576356

The effort involved, and legal risk is exactly the same as for any Copyleft license. If you don't know what your stack is doing, that is the problem -- not the license.

saagarjha · 2025-01-25T04:21:10 1737778870

I think you should get new lawyers if this is their understanding of how software licenses work.

mgsloan2 · 2025-01-25T06:40:52 1737787252

See for example https://opensource.google/documentation/reference/using/agpl...

> Code licensed under the GNU Affero General Public License (AGPL) MUST NOT be used at Google.

jenadine · 2025-01-25T06:49:16 1737787756

It’s their loss

Defletter · 2025-01-25T09:34:35 1737797675

Is it? Because open source tools re-licensing themselves to be more permissive would seem to indicate whose loss it really is.

michaelmrose · 2025-01-25T19:15:23 1737832523

This might indicate moreso that they believe they won't lose anything by the transition and users might ultimately benefit

mistercheph · 2025-01-25T10:45:46 1737801946

Embrace, extend, extinguish. it could take about a century, but every software company (hardware maybe next century) is in the process of being swallowed by free software. Thats not to say people can’t carve out a niche and have balling corporate retreats for a while.. until the sleeping giant wakes up and rolls over you.

zelcon · 2025-01-29T05:49:27 1738129767

Free software basically only exists because it’s subsidized by nonfree software. It also has no original ideas. Every piece of good free software is just a copy of something proprietary or some internal tool.

Defletter · 2025-01-25T17:24:17 1737825857

You've just made a pretty outrageous claim without evidence that would require a lot of effort on my part to refute, so I'll just go with: if you say so.

rerdavies · 2025-01-25T23:00:22 1737846022

I'm wondering if you've ever actually asked a real corporate lawyer for an opinion on anything relating to GPL licenses. The results are pretty consistent. I've made the trip on three occasions, and the response each time was: "this was not drafted by a lawyer, it's virtually ininterpretable, and it is wildly unpredictable what the consequences of using this software are."

saagarjha · 2025-01-27T04:46:06 1737953166

Why do some companies engage with it then?

amszmidt · 2025-01-27T07:17:42 1737962262

Eh, all the GNU family of licenses were drafted by lawyers.

Just using any Copyleft software has no legal consequences (copyleft licenses kick in when distributing, not using them).

integricho · 2025-01-25T06:01:26 1737784886

what is the status of Windows support in mold? reading the github issues leads to a circular confusion, the author first planned to support it, then moved Windows support to the sold linker, but then sold got archived recently so in the end there is no Windows support or did I just misunderstand the events?

secondcoming · 2025-01-24T17:03:31 1737738211

Maybe I'm holding it wrong, but mold isn't faster at all if you're using LTO, which you probably should be.

compiler-guy · 2025-01-24T18:41:30 1737744090

Mold will be faster than LLD even using LTO, but all of its benefits will be absolutely swamped by the LTO process, which is, more or less, recompiling the entire program from high-level LLVM-IR. That's extremely expensive and dwarfs any linking advantages.

So the benefit will be barely noticable. As another comment points out, LTO should only be used when you need a binary optimized to within an inch of its life, such as a release copy, or a copy for performance testing.

paulddraper · 2025-01-24T21:52:42 1737755562

Username checks out.

And factual.

saagarjha · 2025-01-25T04:21:54 1737778914

I'm waiting for 'linker-guy to weigh in, personally.

0x457 · 2025-01-24T17:12:04 1737738724

I think we're talking about non-release builds here. In those, you don't want to use LTO, you just want to get that binary as fast as possible.

Arelius · 2025-01-24T19:09:14 1737745754

Yeah, if you're development process requires LTO you may be holding it wrong....

Specifically, if LTO is so important that you need to be using it during development, you likely have a very exceptional case, or you have some big architectural issues that are causing much larger performance regressions then they should be.

IshKebab · 2025-01-25T18:55:03 1737831303

> you're development process requires LTO you may be holding it wrong....

Not necessarily. LTO does a very good job of dead code elimination which is sometimes necessary to fit code in microcontroller memory.

jcalvinowens · 2025-01-24T20:32:59 1737750779

If you're debugging, and your bug only reproduces with LTO enabled, you don't have much of a choice...

paulddraper · 2025-01-24T21:53:26 1737755606

Sure, for that 1% of the time.

thesz · 2025-01-24T22:49:08 1737758948

...which takes these remaining 99% of a development time...

saagarjha · 2025-01-25T04:22:41 1737778961

Surely your LTO bugs are not so easy to fix that they take less time to resolve than linking itself does.

benatkin · 2025-01-24T19:46:56 1737748016

Being able to choose a middle ground between development/debug builds and production builds is becoming increasingly important. This is especially true when developing in the browser, when often something appears to be slow in development mode but is fine in production mode.

WebAssembly and lightweight MicroVMs are enabling FaaS with real time code generation but the build toolchain makes it less appealing, when you don't want it to take half a minute to build or to be slow.

josephg · 2025-01-25T21:21:18 1737840078

> Yeah, if you're development process requires LTO you may be holding it wrong....

I spent a few months doing performance optimisation work. We wanted to see how much performance we could wring out of an algorithm & associated data structures. Each day I’d try and brainstorm new optimisations, implement them, and then A/B test the change to see how it actually affected performance. To get reliable tests, all benchmarks were run in release mode (with all optimisations - including LTO - turned on).

benatkin · 2025-01-24T19:53:59 1737748439

Agreed. Both fast and small are desirable for sandboxed (least authority) isomorphic (client and server) microservices with WebAssembly & related tech.

account42 · 2025-01-27T12:19:54 1737980394

You should be using LTO where incremental build times are a concern, i.e. for development builds.

And for realease builds link time is hardly a concern.

easythrees · 2025-01-24T20:13:41 1737749621

Wait a minute, it’s possible to relicense something from GPL to MIT?

prmoustache · 2025-01-24T21:53:40 1737755620

Yes if you are the only developper and never received nor accepted external contributions or if you managed to get permission from every single person who contributed or replaced their code with your own.

computably · 2025-01-24T22:06:38 1737756398

> or if you managed to get permission from every single person who contributed

This makes it sound more difficult than it actually is (logistically); it's not uncommon for major projects to require contributors to sign a CLA before accepting PRs.

mrighele · 2025-01-24T22:31:33 1737757893

That depends on how old and big is the project. For example Linux is "stuck" on GPL2 and even if they wanted to move to something else it wouldn't be feasible to get permission from all the people involved. Some contributors passed away making it even more difficult.

LeFantome · 2025-01-24T23:16:26 1737760586

Not exactly “stuck” since they very explicitly do not want to move to GPL 3.

CaptainOfCoit · 2025-01-25T12:30:56 1737808256

Even if they wanted to move to another license (which they don't), they wouldn't be able to do. So sounds exactly like they're "stuck", regardless of what they want.

bialpio · 2025-01-25T20:17:58 1737836278

How is the problem of "you signed a CLA without authorization by your employer to do so" solved? I'm mostly asking because I saw the following:

"I will not expose people taping out Hazard3 to the possibility of your employer chasing you for your contribution by harassing them legally. A contribution agreement does not solve this, because you may sign it without the legal capability to do so (...)"

https://github.com/Wren6991/Hazard3/blob/stable/Contributing... (this is I believe the repo with design for riscv cores running on RPi Pico 2)

prmoustache · 2025-01-25T20:21:03 1737836463

These are the ones I refuse to contribute to.

DrillShopper · 2025-01-24T20:18:39 1737749919

Yes. Generally you need permissions from contributors (either asking them directly or requiring a contribution agreement that assigns copyright for contributions to either the author or the org hosting the project), but you can relicense from any license to any other license.

That doesn't extinguish the prior versions under the prior license, but it does allow a project to change its license.

satvikpendem · 2025-01-24T16:52:18 1737737538

I looked at this before, is it ready for production? I thought not based on the readme, so I'm still using mold.

For those on macOS, Apple released a new linker about a year or two ago (which is why the mold author stopped working on their macOS version), and if you're using it with Rust, put this in your config.toml:

    [target.aarch64-apple-darwin]
    rustflags = [ 
        "-C",
        "link-arg=-fuse-ld=/Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin/ld",
        "-C",
        "link-arg=-ld_new",
    ]

dralley · 2025-01-24T17:59:56 1737741596

No, the author is pretty clear that it shouldn't be used for production yet

satvikpendem · 2025-01-24T19:35:33 1737747333

Great, I'll keep a look out but will hold off on using it for now.

brink · 2025-01-24T16:55:04 1737737704

I don't even use mold for production. It's for development.

bla3 · 2025-01-25T01:49:12 1737769752

Isn't the new linked just the default these days? I'm not sure adding that has any effect.

newman314 · 2025-01-24T21:57:14 1737755834

Can you confirm that's still the right location for Sequioa?

I have the command line tools installed and I only have /usr/bin/ld and /usr/bin/ld-classic

satvikpendem · 2025-01-24T22:00:14 1737756014

Then it'd be the /usr/bin/ld as I believe my solution was for before they moved the linker it seems.

saagarjha · 2025-01-25T04:23:35 1737779015

/usr/bin/ld will correctly invoke the right linker, it's a stub to look at your developer dir and reexec.

kryptiskt · 2025-01-24T18:23:56 1737743036

What would be refreshing would be a C/C++ compiler that did away with the intermediate step of linking and built the whole program as a unit. LTO doesn't even have to be a thing if the compiler can see the entire program in the first place. It would still have to save some build products so that incremental builds are possible, but not as object files, the compiler would need metadata to know of the origin and dependencies of all the generated code so it would be able to replace the right things.

External libs are most often linked dynamically these days, so they don't need to be built from source, so eliminating the linker doesn't pose a problem for non-open source dependencies. And if that's not enough letting the compiler also consume object files could provide for legacy use cases or edge cases where you must statically link to a binary.

dapperdrake · 2025-01-24T21:44:10 1737755050

SQLite3 just concatenation everything together into one compilation unit. So, more people have been using this than probably know about it.

https://sqlite.org/amalgamation.html

jdxcode · 2025-01-24T23:47:42 1737762462

I totally see the point of this, but still, you have to admit this is pretty funny:

> Developers sometimes experience trouble debugging the quarter-million line amalgamation source file because some debuggers are only able to handle source code line numbers less than 32,768 [...] To circumvent this limitation, the amalgamation is also available in a split form, consisting of files "sqlite3-1.c", "sqlite3-2.c", and so forth, where each file is less than 32,768 lines in length

yellowapple · 2025-01-25T04:29:23 1737779363

That would imply that such debuggers are storing line numbers as not just 16-bit numbers (which is probably sensible, considering that source files longer than that are uncommon), but as signed 16-bit numbers. I can't fathom a situation where line numbers would ever be negative.

usefulcat · 2025-01-25T06:58:55 1737788335

Cue C or C++ should-I-prefer-signed-or-unsigned-integers debate

eredengrin · 2025-01-25T07:18:58 1737789538

It's not that uncommon of a convention to strictly use signed numbers unless doing bit manipulation, eg the Google C++ Style Guide.

rocqua · 2025-01-25T11:33:10 1737804790

Notably, unsigned integers also have defined behavior for overflow. This means compilers can do less optimization on unsigned integers. For example, they can't assume that. x + 1 > x for unsigned ints, but are free to assume that for standard ints.

That is just another reason to stick with signed ints unless there is a very specific behavior you rely on.

yellowapple · 2025-01-27T02:22:29 1737944549

> For example, they can't assume that. x + 1 > x for unsigned ints, but are free to assume that for standard ints.

No they ain't:

    julia> x = typemax(Int16)
    32767
    
    julia> x + Int16(1) > x
    false

Integers are integers, and can roll over regardless of whether or not they're signed. Avoiding rollover is not a reason to stick with signed integers; indeed, rollover is a very good reason to avoid using signed integers unless you're specifically prepared to handle unexpectedly-negative values.

eredengrin · 2025-01-27T05:18:16 1737955096

It depends on the language. I linked a set of c++ guidelines and for c++, they are correct: it is undefined behaviour to do signed integer overflow. Some languages do specify it, eg rust, and even in c++ it might appear to work, but even then it is still undefined and should be strongly avoided.

yellowapple · 2025-01-27T12:34:41 1737981281

That's what I'm saying, though: rollovers can happen regardless of whether the integer is signed or unsigned. x + 1 > x is never a safe assumption for integers of the same fixed width, no matter if they're i16 or u16. Whether it's specifically acknowledged as defined or undefined behavior doesn't really change that fundamental property of fixed-width integer addition.

(As an aside: I'm personally fond of languages that let you specify what to do if an integer arithmetic result doesn't fit. Zig, for example, has separate operators for rollover v. saturation v. panicking/UB, which is handy. Pretty sure C++ has equivalents in its standard library.)

makapuf · 2025-01-25T06:51:23 1737787883

Maybe somewhere some line offset is stored as i16? (I don't understand why anyway but..)

shakna · 2025-01-25T09:48:28 1737798508

The __LINE__ macro defaults to "int". That then gets handed to the debugger.

qzzi · 2025-01-25T19:08:11 1737832091

The __LINE__ macro, like all other macros, is expanded during the preprocessing of the source code and is not handed to the debugger in any way.

shakna · 2025-01-26T09:35:35 1737884135

Yes... And debuggers that implement line numbers, generally work by taking that information as part of the preprocessing stage. And the #line and __LINE__ macro/directive were implemented _for debuggers_ when originally created. They were made to be handed over to the debugger.

If you simply compile and run, the debugger won't have __LINE__, no. But it also won't have line numbers, at all. So you might have missed a bit of context to this discussion - how are line numbers implemented in a debugger that does so, without access to the source?

qzzi · 2025-01-26T13:38:37 1737898717

No, the debugger does not get involved in preprocessing. When you write "a = __LINE__;", it expands to "a = 10;" (or whatever number) and is compiled, and the debugger has no knowledge of it. Debugging information, including the mapping of positions in the code to positions in the source, is generated by the compiler and embedded directly into the generated binary or an external file, from which the debugger reads it.

The __LINE__ macro is passed to the debugger only if the program itself outputs its value, and the "debugger" is a human reading that output :)

dapperdrake · 2025-01-25T00:09:46 1737763786

*concatenates

Apologies for the typo. And now it is too late to edit the post.

almostgotcaught · 2025-01-24T19:36:16 1737747376

[flagged]

nn3 · 2025-01-24T20:12:45 1737749565

>Secondly, if you think any compiler is meaningfully doing anything optimal >>("whole program analysis") on a TU scale greater than say ~50kloc (ie ~10 files) >relative to compiling individually you're dreaming.

That's wrong. gcc generates summaries of function properties and propagate those up and down the call tree, which for LTO is then build in a distributed way. It does much more than mere inlining, but even advanced analysis like points to analysis.

https://gcc.gnu.org/onlinedocs/gccint/IPA.html https://gcc.gnu.org/onlinedocs/gccint/IPA-passes.html

It scales to millions of lines of code because it's partioned.

jcalvinowens · 2025-01-24T20:53:47 1737752027

> if you think any compiler is meaningfully doing anything optimal ("whole program analysis") on a TU scale greater than say ~50kloc (ie ~10 files) relative to compiling individually you're dreaming.

You can build the Linux kernel with LTO: simply diff the LTO vs non-LTO outputs and it will be obvious you're wrong.

dapperdrake · 2025-01-24T21:45:18 1737755118

SQLite3 may be a counter-example:

https://sqlite.org/amalgamation.html

ComputerGuru · 2025-01-24T16:52:20 1737737540

There’s been a lot of interest in faster linkers spurred by the adoption and popularity of rust.

Even modest statically linked rust binaries can take a couple of minutes in the link stage of compilation in release mode (using mold). It’s not a rust-specific issue but an amalgam of (usually) strictly static linking, advanced link-time optimizations enabled by llvm like LTO and bolt, and a general dissatisfaction with compile times in the rust community. Rust’s (clinically) strong relationship with(read: dependency on) LLVM makes it the most popular language where LLVM link-time magic has been most heavily universally adopted; you could face these issues with C++ but it wouldn’t be chalked up to the language rather than your toolchain.

I’ve been eyeing wild for some time as I’m excited by the promise of an optimizing incremental linker, but to be frank, see zero incentive to even fiddle with it until it can actually, you know, link incrementally.

pjmlp · 2025-01-24T21:35:00 1737754500

C++ can be rather faster to compile than Rust, because some compilers do have incremental compilation, and incremental linking.

Additionally, the acceptance of binary libraries across the C and C++ ecosystem, means that more often than not, you only need to care about compiling you own application, and not the world, every time you clone a repo, or switch development branch.

yosefk · 2025-01-25T15:30:42 1737819042

compiling crates in parallel is fast on a good machine. OTOH managing C++ dependencies without a standard build & packaging system is a nightmare

pjmlp · 2025-01-25T17:09:22 1737824962

Imagine if Linus needed a gaming rig to develop Linux...

And he also did not had cargo at his disposal.

No need to point out it is C instead, as they share common roots, including place of birth.

Or how we used to compile C++ between 1986 and 2000's, mostly in single core machines, developing games, GUIs and distributed computing applications in CORBA and DCOM.

sitkack · 2025-01-24T17:46:40 1737740800

I solved this by using Wasm. Your outer application shell calls into Wasm business logic, only the inner logic needs to get recompiled, the outer app shell doesn't even need to restart.

ComputerGuru · 2025-01-24T17:52:10 1737741130

I don’t think I can use wasm with simd or syscalls, which is the bulk of my work.

sitkack · 2025-01-24T18:53:13 1737744793

I haven't used SIMD in Rust (or Wasm). Syscalls can be passed into the Wasm env.

https://doc.rust-lang.org/core/arch/wasm32/index.html#simd

https://nickb.dev/blog/authoring-a-simd-enhanced-wasm-librar...

Could definitely be more effort than it is worth just to speed up compilation.

SkiFire13 · 2025-01-24T21:04:29 1737752669

How is this different than dynamically linking the business logic library?

sitkack · 2025-01-25T00:04:05 1737763445

Very similar, but Wasm has additional safety properties and affordances. I am trying to get away from dynamic libs as an app extension mechanism. It is especially nice when application extension is open to end users, they won't be able to crash your application shell.

https://wasmtime.dev/ https://github.com/bytecodealliance/wasmtime

ajb · 2025-01-24T20:44:23 1737751463

2008: Gold, a new linker, intended to be faster than Gnu LD

2015(?): Lld a drop in replacement linker, at least 2x as fast as Gold

2021: mold, a new linker, several times faster than lld

2025: wild, a new linker...

o11c · 2025-01-25T06:54:01 1737788041

Rarely mentioned: all of these occur at the cost of not implementing a very large number of useful features used by real-world programs.

account42 · 2025-01-27T12:53:53 1737982433

Like ICF? Wait no, everyone supports that except GNU ld.

einpoklum · 2025-01-25T09:30:55 1737797455

Can you name a few of these features, for those of us who don't know much about linking beyond the fact that it takes compiled object files and makes an executable (and maybe does LTO)?

kibwen · 2025-01-25T15:14:11 1737818051

Presumably they're talking about linker scripts, and IMO if you're one of the vanishingly rare people who absolutely needs a linker script for some reason, then, firstly, my condolences, and secondly, given that 99.999% percent of users never need linker scripts, and given how much complexity and fragility their support adds to linker codebases, I'm perfectly happy to say that the rest of us can happily use fast and simple linkers that don't support linker scripts, and the other poor souls can keep using ld.

ajb · 2025-01-25T16:58:13 1737824293

Anyone wondering why you'd need a linker script: They are essential on bare metal, as you have to tell the linker where the hardware requires you to put stuff. There are lots in the linux kernel repo, u-boot, probably Arduino, etc.

account42 · 2025-01-27T12:56:11 1737982571

They are also very useful if you care a lot about the size of the resulting binary.

mshockwave · 2025-01-25T17:18:53 1737825533

It’s pretty common to roll your own linker script in embedded software development

wolfd · 2025-01-24T21:28:26 1737754106

I’m not sure if you’re intending to leave a negative or positive remark, or just a brief history, but the fact that people are still managing to squeeze better performance into linkers is very encouraging to me.

ajb · 2025-01-24T22:05:18 1737756318

Certainly no intention to be negative. Not having run the numbers, I don't know if the older ones got slower over time due to more features, or the new ones are squeezing out new performance gains. I guess it's also partly that the bigger codebases scaled up so much over this period, so that there are gains to be had that weren't interesting before.

wolfd · 2025-01-24T23:30:55 1737761455

Good question, I always wonder the same thing. https://www.phoronix.com/news/Mold-Linker-2024-Performance seems to show that that the newer linkers still outperform their predecessors, even after maturing. But of course this doesn’t show the full picture.

cbmuser · 2025-01-24T23:05:44 1737759944

Gold is slated for removal from binutils for version 2.44.0, so it's officially dead.

saagarjha · 2025-01-25T04:27:07 1737779227

Where is the effort going now? lld?

dundarious · 2025-01-24T20:51:14 1737751874

For windows, there is also [The RAD Linker](https://github.com/EpicGamesExt/raddebugger?tab=readme-ov-fi...) though quite early days.

fuzztester · 2025-01-24T22:19:22 1737757162

Related, and a good one, though old:

The book Linkers and Loaders by John Levine.

Last book in the list here:

https://www.johnlevine.com/books.phtml

I had read it some years ago, and found it quite interesting.

It's a standard one in the field.

He has also written some other popular computer books (see link above - pun not intended, but noticed).

shmerl · 2025-01-24T19:04:23 1737745463

That looks promising. In Rust to begin with and with the goal of being fast and support incremental linking.

To use it with Rust, this can probbaly also work using gcc as linker driver.

In project's .cargo/config.toml:

    [target.x86_64-unknown-linux-gnu]
    rustflags = ["-C", "link-arg=-fuse-ld=wild"]

Side note, but why does Rust need to plug into gcc or clang for that? Some missing functionality?

davidlattimore · 2025-01-24T21:24:56 1737753896

Unfortunately gcc doesn't accept arbitrary linkers via the `-fuse-ld=` flag. The only linkers it accepts are bfd, gold lld and mold. It is possible to use gcc to invoke wild as the linker, but currently to do that, you need to create a directory containing the wild linker and rename the binary (or a symlink) to "ld", then pass `-B/path/to/directory/containing/wild` to gcc.

As for why Rust uses gcc or clang to invoke the linker rather than invoking the linker directly - it's because the C compiler knows what linker flags are needed on the current platform in order to link against libc and the C runtime. Things like `Scrt1.o`, `crti.o`, `crtbeginS.o`, `crtendS.o` and `crtn.o`.

inkyoto · 2025-01-25T02:30:40 1737772240

> It is possible to use gcc to invoke wild as the linker, but currently to do that, you need to create a directory containing the wild linker and rename the binary (or a symlink) to "ld", then pass `-B/path/to/directory/containing/wild` to gcc.

Instead of renaming and passing -B in, you can also modify the GCC «spec» file's «%linker» section to make it point to a linker of your choice, i.e.

  %linker:
  /scratch/bin/wild %{wild_options}

Linking options can be amended in the «%link_command» section.

It is possible to either modify the default «spec» file («gcc -dumpspecs») or pass your own along via «-specs=my-specs-file». I have found custom «spec» files to be very useful in the past.

The «spec» file format is documented at https://gcc.gnu.org/onlinedocs/gcc/Spec-Files.html

shmerl · 2025-01-24T21:35:59 1737754559

Ah, good to know, thanks!

May be it's worth filing a feature request for gcc to have parity with clang for arbitrary linkers?

sedatk · 2025-01-24T20:09:41 1737749381

Because Rust compiler generates IR bytecode, not machine code.

shmerl · 2025-01-24T20:27:15 1737750435

That's the reason to use llvm as part of Rust compiler toolchain, not to use gcc or clang as linker manager?

sedatk · 2025-01-24T22:12:37 1737756757

You're right, @davidlattimore seems to have answered that.

KerrAvon · 2025-01-24T21:59:33 1737755973

I'm curious: what's the theory behind why this would be faster than mold in the non-incremental case? "Because Rust" is a fine explanation for a bunch of things, but doesn't explain expected performance benefits.

"Because there's low hanging concurrent fruit that Rust can help us get?" would be interesting but that's not explicitly stated or even implied.

davidlattimore · 2025-01-25T00:20:57 1737764457

I'm not actually sure, mostly because I'm not really familiar with the Mold codebase. One clue is that I've heard that Mold gets about a 10% speedup by using a faster allocator (mimalloc). I've tried using mimalloc with Wild and didn't get any measurable speedup. This suggests to me that Mold is probably making heavier use of the allocator than Wild is. With Wild, I've certainly tried to optimise the number of heap allocations.

But in general, I'd guess just different design decisions. As for how this might be related to Rust - I'm certain that were Wild ported from Rust to C or C++, that it would perform very similarly. However, code patterns that are fine in Rust due to the borrow checker, would be footguns in languages like C or C++, so maintaining that code could be tricky. Certainly when I've coded in C++ in the past, I've found myself coding more defensively, even at a small performance cost, whereas with Rust, I'm able to be a lot bolder because I know the compiler has got my back.

menaerus · 2025-01-25T08:46:44 1737794804

> Mold gets about a 10% speedup by using a faster allocator (mimalloc). I've tried using mimalloc with Wild and didn't get any measurable speedup

Perhaps it is worth repeating the experiment with heavy MLoC codebases. jmalloc or mimalloc.

einpoklum · 2025-01-25T10:13:15 1737799995

Rust is a perfectly fine language, and there's no reason you should not be able to implement fast incremental linking using Rust, so - I wish you success in doing that.

... however...

> code patterns that are fine in Rust due to the borrow checker, would be footguns in languages like C or C++,

That "dig" is probably not true. Or rather, your very conflation of C and C++ suggests that you are talking about the kind of code which would not be used in modern C++ of the past decade-or-more. While one _can_ write footguns in C++ easily, one can also very easily choose not to do so - especially when writing a new project.

panstromek · 2025-01-25T10:35:07 1737801307

Tell me you don't have rust experience without telling me you don't have rust experience.

panstromek · 2025-01-25T10:44:16 1737801856

I mean, sorry for the snark but really, there's so many of these things that it's just ridiculous to even attempt to compare. e.g. I wouln't ever use something like string_view or span unless the code is absolutely performance critical. There's a lot of defensive copying in C(++), because all the risks of losing track of pointers are just not worth it. In Rust, you can go really wild with this, there's no comparison.

einpoklum · 2025-01-25T16:24:50 1737822290

> because all the risks of losing track of pointers are just not worth it.

These risks are mostly, and often entirely, gone when you write modern C++. You don't lose track of them, because you don't track them, and you only use them when you don't need to track them. (Except for inside the implementations of a few data structures, which one can think of as the equivalent of unsafe code in Rust). Of course I'm generalizing here, but again, you just don't write C-style code, and you don't have those problems.

(You may have some other problems of course, C++ has many warts.)

panstromek · 2025-01-26T12:01:12 1737892872

I don't see how modern C++ solves any of those problems, and especially without performance implications.

Like, how do you make sure that you don't hold any dangling references to a vector that reallocated? How do you make sure that code that needs synchronization is synchronized? How do you make sure that non-thread safe code is never used from multiple threads? How do you make sure that you don't ever invalidate an iterator? How do you make sure that you don't hold a reference to a data owned by unique pointer that went out of scope? How do you make sure you don't hold a string view for a string that went out of scope?

As far as I know (and how I experienced it), the answer to all of those questions is to either use some special api that you have to know about, or do something non-optimal, like creating a defensive copy, use a shared pointer or adding "just in case" mutex, or "just remember you might cause problem a and be careful."

In Rust all of those problems are a compile error and you have to make an extra effort to trigger them at runtime with unsafe code. That's a very big difference and I don't understand how can modern C++ come even close to it.

einpoklum · 2025-01-26T16:46:51 1737910011

> Like, how do you make sure that you don't hold any dangling references to a vector that reallocated?

So, I'll first nitpick and say that's not a problem with pointer tracking.

To answer the question, though:

When I'm writing a function which receives a reference to a vector, then - either it's a const reference, in which case I don't change it, or it's a non-const reference, in which case I can safely assume I'm allowed to change it - but I can't keep any references or pointers into it, or iterators from it etc. I also expect and rely on functions that I call with a non-const reference to that vector, to act the same.

And when I create a vector, I just rely on the above in functions I call.

This is not some gamble. It's how C++ code is written. Yes, you can write code which breaks that principle if you like, but - I don't, and library authors don't.

> How do you make sure that code that needs synchronization is synchronized?

You mean, synchronization between threads which work on the same data? There's no one answer for that. It depends. If you want to be super-safe, you don't let your threads know about another other than multithread-aware data structures, whose methods ensure synchronization. Like a concurrent queue or map or something. If it's something more performance-critical whether synchronization is too expensive, then you might work out when/where it's safe for the threads to work on the same data, and keep the synchronization to a minimum. Which is kind of like unsafe Rust, I imagine. But it's true that it's pretty easy to ignore synchronization and just "let it rip", and C++ will not warn you about doing that. Still, you won't enter that danger zone unless you've explicitly decided to do multithreaded work.

About the Rust side of things... isn't it Turing-complete to know whether, and when, threads need to synchronize? I'm guessing that safe Rust demands that you not share data which has unsynchronized access, between threads.

> the answer to all of those questions is to either use some special api that you have to know about

C++ language features and standard library facilities are a "special API" that you have to know about. But then, so are raw pointers. A novice C++ programming student might not even be taught about using them until late in their first programming course.

My main point was, that if you talk about "C/C++ progamming", then you will necessarily not use most of those language features and facilities - which are commonly used in modern code and can keep you safe. You would be writing C-like code, and will have to be very careful (or reinvent the wheel, creating such mechanisms yourself).

panstromek · 2025-01-27T19:24:04 1738005844

Most of what you describe, especially in the multithreading part, is already a defensive practice. That's kind of the whole point. I don't deny that some modern C++ constructs help, I've used them, but the level of confidence is just not there. Note that I lump C and C++ together intentionally. For this purpose, they are almost equivalent as Rust tackles the problems they have in common.

I think it'd be better if you first try understand what actually Rust does here, for which I usually recommend this talk for C ++ developers, which describes the most important ideas on snippets of C++ and Rust side by side: https://youtu.be/IPmRDS0OSxM

That's probably my favourite demonstration.

einpoklum · 2025-01-29T11:46:33 1738151193

> I don't deny that some modern C++ constructs help

This thread started because you essentially denied these constructs have any significance, as you lumped the two languages together. You are still overstating your point.

Moreover - Rust has different design goals than any of these two languages. Indeed, neither of them guarantees memory safety at the language level; Rust makes different tradeoffs, paid a certain price, and does guarantee it. I will watch that video though.

account42 · 2025-01-27T13:00:10 1737982810

That you subject yourself to FUD is not an argument for anything.

panstromek · 2025-01-27T20:22:16 1738009336

No, it's just business. Memory corruption bugs are crazy expensive. One of those N cases goes wrong at some point and somebody will have to spend a week in gdb with corrupt stacktraces from production on some issue that's non determinstic and doesn't reproduce on dev machine.

bjourne · 2025-01-24T21:25:38 1737753938

What a coincidence. :) Just an hour ago I compared the performance of wild, mold, and (plain-old) ld on a C project I'm working on. 23 kloc and 172 files. Takes about 23.4 s of user time to compile with gcc+ld, 22.5 s with gcc+mold, and 21.8 s with gcc+wild. Which leads me to believe that link time shouldn't be that much of a problem for well-structured projects.

davidlattimore · 2025-01-24T21:41:33 1737754893

It sounds like you're building from scratch. In that case, the majority of the time will be spent compiling code, not linking. The case for fast linkers is strongest when doing iterative development. i.e. when making small changes to your code then rebuilding and running the result. With a small change, there's generally very little work for the compiler to do, but linking is still done from scratch, so tends to dominate.

commandersaki · 2025-01-25T02:51:07 1737773467

Yep in my case I have 11 * 450MB executables that take about 8 minutes to compile and link. But for small iterative programming cycles using the standard linker with g++, it takes about 30 seconds to link (If I remember correctly). I tried mold and shaved 25% of that time, which didn't seem worth the change overall; attempted wild a year ago but ran into issues, but will revisit at some point.

menaerus · 2025-01-25T08:43:14 1737794594

Exactly. But also even in build-from-scratch use-case when there's a multitude of binaries to be built - think 10s or 100s of (unit, integration, performance) test binaries or utilities that come along with the main release binary etc. Faster linkers giving even a modest 10% speedup per binary will quickly accumulate and will obviously scale much better.

bjourne · 2025-01-25T15:41:51 1737819711

True, I didn't think of that. However, the root cause here perhaps is fat binaries? My preferred development flow consists of many small self-contained dynamically linked libraries that executables link to. Then you only have to relink changed libraries and not executables that depend on them.

iknowstuff · 2025-01-25T17:57:37 1737827857

So is this your preferred flow because of slow linkers?

wolf550e · 2025-01-24T21:42:26 1737754946

The linker time is important when building something like Chrome, not small projects.

searealist · 2025-01-24T21:40:13 1737754813

Fast linkers are mostly useful in incremental compilation scenarios to cut down on the edit cycle.

ndesaulniers · 2025-01-24T21:26:50 1737754010

How about ld.lld?

1vuio0pswjnm7 · 2025-01-24T23:23:26 1737761006

"These benchmark were run on David Lattimore's laptop (2020 model System76 Lemur pro), which has 4 cores (8 threads) and 42 GB of RAM."

https://news.ycombinator.com/item?id=33330499

NB. This is not to suggest wild is bloated. The issue if any is the software being developed with it and the computers of those who might use such software.

1vuio0pswjnm7 · 2025-02-01T15:44:20 1738424660

https://news.ycombinator.com/item?id=42896619

"... I have 16 GB of ram, I can't upgrade it..."

klibertp · 2025-01-25T19:36:54 1737833814

Half in jest, but I'd think anybody coding in Rust already has 32GB of RAM...

(Personally, upgrading my laptop to 64GB at the expense of literally everything else was almost a great decision. Almost, because I really should have splurged on RAM and display instead of going all-in on RAM. The only downside is that cleaning up open tabs once a week became a chore, taking up the whole evening.)

sylware · 2025-01-24T19:47:22 1737748042

The real issue is actually runtime ELF (and PE) which are obsolete on modern hardware architecture.

bmacho · 2025-01-24T20:06:36 1737749196

What do you mean by this?

sylware · 2025-01-24T22:22:14 1737757334

ELF(COFF) should now be only an assembler output format on modern large hardware architecture.

On modern large hardware architecture, for executable files/dynamic libraries, ELF(PE[+]) has overkill complexity.

I am personnally using a executable file format of my own I do wrap into an "ELF capsule" on linux kernel. With position independent code, you kind of only need memory mapped segments (which dynamic libraries are in this very format). I have two very simple partial linkers I wrote in plain and simple C, one for risc-v assembly, one for x86_64 assembly, which allow me to link into such executable file some simple ELF object files (from binutils GAS).

There is no more centralized "ELF loader".

Of course, there are tradeoffs, 1 billion times worth it in regards of the accute simplicity of the format.

(I even have a little vm which allows me to interpret simple risc-v binaries on x86_64).

o11c · 2025-01-25T06:59:50 1737788390

You're giving up a lot if you stop using a format that supports multiple mapping, relro, dynamic relocations, ...

sylware · 2025-01-25T10:12:50 1737799970

This is where the "scam" from those excessively complex formats is: I, pertinently, do not give up a lot since I get the job done... but on the other side, I gain the removal of tons and tons of complexity, and nullify significant developer/vendor lock-in at the same time.

A good analogy to "feel" that, it is a bit like "json vs xml" but for executable binary formats.

But, I keep in mind, those formats (excrutiatingly simple) can work only on modern hardware architectures.

ndesaulniers · 2025-01-24T21:34:06 1737754446

Can it link the Linux kernel yet? Was a useful milestone for LLD.

davidlattimore · 2025-01-25T00:05:59 1737763559

Not yet. The Linux kernel uses linker scripts, which Wild doesn't yet support. I'd like to add support for linker scripts at some point, but it's some way down the priority list.

oguz-ismail · 2025-01-25T07:36:23 1737790583

Does it at least support -Ttext, -Tdata, etc.?

juujian · 2025-01-25T00:15:42 1737764142

Is it too late to ask what a linker is?

nappy-doo · 2025-01-25T00:49:39 1737766179

I'll ELI5:

Compilers take the code the programmer writes, and turns it into things called object files. Object files are close to executable by the target processor, but not completely. There are little places where the code needs to be rewritten to handle access to subroutines, access operating system functionality, and other things.

A linker combines all these object files, does the necessary rewriting, and generates something that the operating system can use.

It's the final step in building an executable.

--

More complicatedly: a linker is a little Turing machine that runs over the object files. Some can do complicated things like rewriting code, or optimizing across function calls. But, fundamentally, they plop all the object files together and follow little scripts (or rewrites) that clean up the places the compiler couldn't properly insert instructions because the compiler doesn't know the final layout of the program.

Mikhail_K · 2025-01-25T13:19:10 1737811150

I just knew it's going to be Rust as soon as I've read the title.

devit · 2025-01-24T16:49:26 1737737366

I think the optimal approach for development would be to not produce a traditional linked executable at all, but instead just place the object files in memory, and then produce a loader executable that hooks page faults in those memory areas and on-demand mmaps the relevant object elsewhere, applies relocations to it, and then moves it in place with mremap.

Symbols would be resolved based on an index where only updated object files are reindexed. It could also eagerly relocate in the background, in order depending on previous usage data.

This would basically make a copyless lazy incremental linker.

95014_refugee · 2025-01-24T18:00:15 1737741615

This makes some very naïve assumptions about the relationships between entities in a program; in particular that you can make arbitrary assertions about the representation of already-allocated datastructures across multiple versions of a component, that the program's compositional structure morphs in understandable ways, and that you can pause a program in a state where a component can actually be replaced.

By the time you have addressed these, you'll find yourself building a microkernel system with a collection of independent servers and well-defined interaction protocols. Which isn't necessarily a terrible way to assemble something, but it's not quite where you're trying to go...

fsfod · 2025-01-24T17:11:27 1737738687

You can sort of do that with some of LLVM's JIT systems https://llvm.org/docs/JITLink.html, I'm surprised that no one has yet made a edit and continue system using it.

all2 · 2025-01-24T17:23:53 1737739433

My parens sense is tingling. This sounds like a lisp-machine, or just standard lisp development environment.

klibertp · 2025-01-25T19:40:43 1737834043

Maybe of interest: https://github.com/clasp-developers/clasp/ (Lisp env. that uses LLVM for compilation; new-ish, actively developed.) However, my impression (I didn't measure it) is that the compilation speed is an order of magnitude slower than in SBCL, never mind CCL.

samatman · 2025-01-24T23:31:11 1737761471

They have! It's called Julia and it's great.

IshKebab · 2025-01-24T16:58:35 1737737915

Sounds like dynamic linking, sort of.

ignoramous · 2025-01-24T17:40:48 1737740448

> Symbols would be resolved based on an index where only updated object files are reindexed. It could also eagerly relocate in the background, in order depending on previous usage data.

Not exactly this, but Google's Propeller fixes up ("relinks") Basic Blocks (hot code as traced from PGO) in native code at runtime (like an optimizing JIT compiler would): https://research.google/pubs/propeller-a-profile-guided-reli...

eseidel · 2025-01-24T17:20:59 1737739259

Sounds like Apple's old ZeroLink from the aughts?

jjmarr · 2025-01-24T18:06:19 1737741979

Isn't this how dynamic linking works? If you really want to reduce build times, you should be making your hot path in the build a shared library, so you don't have to relink so long as you're not changing the interface.

hinkley · 2025-01-24T20:22:42 1737750162

But do rust’s invariants work across dynamic links?

I thought a lot of its proofs were done at compile time not link time.

Nullabillity · 2025-01-25T00:25:35 1737764735

Yesn't.

Rust is perfectly happy to emit/use dynamic links.[0] It's just that the primary C use case (distributing and updating the main app and its libraries separately) ends up being unsafe since Rust's ABI is unstable (so compiler versions, libraries, etc must match exactly).

Avoiding static relinking during development is pretty much the use where it does work. In fact, Bevy recommends this as part of its setup guide![1]

Practice paints a slightly less rosy picture, though; since the feature is exercised quite rarely, not all libraries work well with it in practice.[2]

[0]: https://doc.rust-lang.org/reference/linkage.html#r-link.dyli...

[1]: https://bevyengine.org/learn/quick-start/getting-started/set...

[2]: For example, https://github.com/linebender/bevy_vello/issues/84

pas · 2025-01-24T23:46:27 1737762387

The proof can be done on the whole code (in memory, incremental, etc), and then the modules emitted as dynamically loadable objects.

cbsmith · 2025-01-24T18:40:09 1737744009

That sounds a lot like traditional dynamic language runtimes. You kind of get that for free with Smalltalk/LISP/etc.

checker659 · 2025-01-24T17:36:27 1737740187

Linker overlays?

throwaway106382 · 2025-01-24T16:43:42 1737737022

> Mold is already very fast, however it doesn't do incremental linking and the author has stated that they don't intend to. Wild doesn't do incremental linking yet, but that is the end-goal. By writing Wild in Rust, it's hoped that the complexity of incremental linking will be achievable.

Can someone explain what is so special about Rust for this?

senkora · 2025-01-24T16:51:32 1737737492

I assume that he is referring to "fearless concurrency", the idea that Rust makes it possible to write more complex concurrent programs than other languages because of the safety guarantees:

https://doc.rust-lang.org/book/ch16-00-concurrency.html

So the logic would go:

1. mold doesn't do incremental linking because it is too complex to do it while still being fast (concurrent).

2. Rust makes it possible to write very complex fast (concurrent) programs.

3. A new linker written in Rust can do incremental linking while still being fast (concurrent).

EDIT: I meant this originally, but comments were posted before I added it so I want to be clear that this part is new: (Any of those three could be false; I take no strong position on that. But I believe that this is the motivating logic.)

ComputerGuru · 2025-01-24T16:54:38 1737737678

Actually a lot of the hacks that mold uses to be the fastest linker would be, ironically, harder to reproduce with rust because they’re antithetical to its approach. Eg Mold intentionally eschews used resource collection to speed up execution (it’ll be cleaned up by the os when the process exits) while rust has a strong RAII approach here that would introduce slowdowns.

junon · 2025-01-24T17:40:40 1737740440

You can absolutely introduce free-less allocators and the like, as well as use `ManuallyDrop` or `Box::leak`. Rust just asks that you're explicit about it.

cogman10 · 2025-01-24T17:02:34 1737738154

Depends on how things are approached.

You could, for example, take advantage of bump arena allocator in rust which would allow the linker to have just 1 alloc/dealloc. Mold is still using more traditional allocators under the covers which won't be as fast as a bump allocator. (Nothing would stop mold from doing the same).

cma · 2025-01-24T17:36:10 1737740170

Traditional allocators are fast if you never introduce much fragmentation with free, though you may still get some gaps and have some other overhead and not be quite as fast. But why couldn't you just LD_PRELOAD a malloc for mold that worked as a bump/stack/arena allocator and just ignored free if anything party stuff isn't making that many allocations?

zamalek · 2025-01-24T17:54:08 1737741248

> Traditional allocators are fast

Really it's allocators in general. Allocations are perceived as expensive only because they are mostly dependent on the amortized cost of prior deallocations. As an extreme example, even GCs can be fast if you avoid deallocation because most typically have a long-lived object heap that rarely gets collected - so if you keep things around that can be long-lived (pooling) their cost mostly goes away.

cogman10 · 2025-01-24T19:07:37 1737745657

Slight disagreement here.

Allocation is perceived as slow because it is. Getting memory from the OS is somewhat expensive because a page of memory needs to be allocated and stored off. Getting memory from traditional allocators is expensive because freespace needs to be tracked. When you say "I need 5 bytes" the allocator needs to find 5 free bytes to give back to you.

Bump allocators are fast because the operation of "I need 5 bytes" is incrementing the allocation pointer forward by 5 bytes and maybe doing a new page allocation if that's exhausted.

GC allocators are fast because they are generally bump allocators! The only difference is that when exhaustion happens the GC says "I need to run a GC".

Traditional allocators are a bit slower because they are typically something like an arena with skiplists used to find free space. When you free up memory, that skiplist needs to be updated.

But further, unlike bump and GC allocators another fault of traditional allocators is they have a tendency to scatter memory which has a negative impact on CPU cache performance. With the assumption that related memory tends to be allocated at the same time, GCs and bump allocators will colocate memory. But, because of the skiplist, traditional allocators will scattershot allocations to avoid free memory fragmentation.

All this said, for most apps this doesn't matter a whole lot. However, if you are doing a CPU/memory intense operation then this is stuff to know.

zamalek · 2025-01-25T00:20:59 1737764459

> Getting memory from traditional allocators is expensive because freespace needs to be tracked.

If you've never deallocated then there is no free space to track, hence the cost of allocation is _mostly_ (per my above comment) affected by the amortized cost of deallocations. The cost becomes extremely apparent with GCs because a failed allocation is usually what triggers a collection (and subsequently any deallocations that need to happen).

Still, your comment goes into detail that I probably should have.

dralley · 2025-01-24T18:02:00 1737741720

Nothing about Rust requires the use of the heap or RAII.

Also, if wild is indeed faster than mold even without incrementalism, as the benchmarks show, then it seems quite silly to go around making the argument that it's harder to write a fast linker in Rust. It's apparently not that hard.

Philpax · 2025-01-24T16:57:58 1737737878

I mean, that's pretty easy to do in Rust: https://doc.rust-lang.org/std/mem/struct.ManuallyDrop.html

Also see various arena allocator crates, etc.

ComputerGuru · 2025-01-24T17:03:28 1737738208

Not really. You would have to either wrap any standard library types in newtypes with ManuallyDrop implemented or (for some) use a custom allocator. And if you want to free some things in one go but not others that gets much harder, especially when you look at how easy a language like zig makes it.

And if you intentionally leak everything it is onerous to get the borrow checker to realize that unless you use a leaked box for all declaration/allocations, which introduces both friction and performance regressions (due to memory access patterns) because the use of custom allocators doesn’t factor into lifetime analysis.

(Spoken as a die-hard rust dev that still thinks it’s the better language than zig for most everything.)

Aurornis · 2025-01-24T17:37:42 1737740262

> You would have to either wrap any standard library types in newtypes with ManuallyDrop implemented

ManuallyDrop would presumably be implemented on large data structures where it matters, not on every single type involved in the program.

compiler-guy · 2025-01-24T16:54:12 1737737652

Both mold and lld are already very heavily concurrent. There is no fear at all there.

compiler-guy · 2025-01-24T16:53:08 1737737588

That’s puzzling to me too. Rust is a great language, and probably makes developing Wild faster. But the complexity of incremental linking doesn’t stem from the linker’s implementation language. It stems from all the tracking, reserved spacing, and other issues required to link a previously linked binary (or at least parts of it) a second time.

IshKebab · 2025-01-24T18:44:46 1737744286

Rust allows your to enforce more invariants at compile time, so implementing a complex system where you are likely to make a mistake and violate those invariants is easier.

tialaramex · 2025-01-24T18:09:53 1737742193

I would guess the idea is that in Rust the complexity is cheaper on a "per unit" basis so you can afford more complexity. So yes, it is a more complicated problem than the previous linkers, but, in Rust maybe you can get that done anyway.

IshKebab · 2025-01-24T16:57:50 1737737870

There are two main factors:

1. Rust's well designed type system and borrow checker makes writing code that works just easier. It has the "if it compiles it works" property (not unique to Rust; people say this about e.g. Haskell too).

2. Rust's type system - especially its trait system can be used to enforce safety constraints statically. The obvious one is the Send and Sync traits for thread safety, but there are others, e.g. the Fuchsia network code statically guarantees deadlocks are impossible.

Mold is written in C++ which is extremely error prone in comparison.