
> because eventually the full heap will be scanned anyways, as it is virtually impossible to keep the promotion rate down to zero. It only delays that moment.

But the whole point of tracing collectors is that even then, the cost of scanning is proportional to the size of the working set, not the whole heap, and so the amortized cost goes to zero as RAM increases. Scanning the working set is still cheaper than tracking all references constantly. In fact, throughput-wise it's a winning algorithm -- a doubling of RAM leads to a halving of cost.

I'm not claiming that tracing collectors always lead to optimal memory management, but the ones that are in common use are, in practice, much better than the refcounting collectors in common use except for footprint.

> And also tuning generational GCs is much harder than non generational.

There's no tuning needed on the collectors of the last couple of years (I'm talking about ZGC and generational ZGC).




In languages like C++ and Rust, the majority of objects have trivial lifetimes and don't need reference counting at all. Less than 0.1% of heap objects need to be tracked by Rc/Arc, and even for those, reference updates are usually rare -- typically only at the moment of creating a new thread / task / request, etc.

Hence, even if the per-object cost of RC may be higher than tracing, the fact that there are 1000x fewer objects to manage makes RC a huge win. I guess Swift is also not that stupid and doesn't update the reference counts every time a temporary is created, etc.


> In languages like C++ and Rust the majority of objects have trivial lifetimes and don't need reference counting at all.

Sure, that's why they can get away with an inefficient GC.

> Hence, even if per object the cost of RC may be higher than tracing, the fact that there are 1000x fewer objects to manage makes RC a huge win.

The "win" here is not RC at all but the fact that there's little reliance on GC and, of course, it's not really a "win" but something you buy by giving up on something else (i.e. a tradeoff), in this case abstraction ability (memory management details cannot be hidden as implementation details), and giving that up increases maintenance costs. So to get the footprint savings you have to pay -- either with performance, if you rely heavily on a refcounting GC, or with maintenance cost, if you don't. To get the performance you can choose to pay for RAM or for more development hours, but either way -- you have to pay.

Tracing GCs manage not to do any work for the vast majority of objects, those with trivial lifetimes -- they, too, don't manage the vast majority of objects (the allocator just bumps a pointer and the GC is never even aware of the existence of most objects) -- by not freeing dead objects promptly, which comes at the cost of spending more RAM.

In other words: good footprint, good performance, good abstraction (i.e. lower maintenance costs) -- pick two and pay by giving up on the third. C++/Zig/Rust pick the first two, Java/C#/Go pick the last two, and I guess Python picks the first and last.

There are, of course, other nuances; e.g. most refcounting GCs in common use leak memory as they are too crude to handle cycles, and not all tracing GCs are state-of-the-art.


You don't have to choose in C++ because you can have optional tracked pointers managed by the pause-free GC engine.


That "pause-free" GC engine performs worse than modern tracing collectors, so by using C++ you've already chosen lower footprint, and then you have to choose between worse performance or more work. I think many people are unaware of the advancements in tracing GCs over the past few years, but modern concurrent collectors don't do any collection in stop-the-world pauses.


Not true. Visit the repository and see the benchmark results: https://github.com/frol/completely-unscientific-benchmarks

The SGCL repository contains the source code for this benchmark that uses the tracked pointers: https://github.com/pebal/sgcl/blob/main/examples/treap/treap...

Only the D language comes close to the result; the rest of the GC languages are several times slower.

(clang on Windows)

raw pointers: 149ms

tracked pointers: 181ms


Putting aside the fact that the compilation time is not included in the C++ result but is included in the Java result; that the Java implementation allocates many more objects than the C++ implementation; that the GC used is old; that there seems to be no interesting GC activity (shown by the fact that the results are similar with a 50MB heap); and that the C++ implementation uses only a small number of objects, all of the same size, in single-threaded code -- I have to wonder: you see the result of a self-proclaimed "completely unscientific" benchmark, conclude that a 100-line GC using a decades-old algorithm beats decades of research and five generations of 100KLOC GCs built by the top garbage collection experts in the world, and think, "this must be right"?

All the people working on Java at Sun/Oracle, on Go at Google and on C# at Microsoft (who, BTW, have written most of these languages' runtimes in C++) spent years and years designing elaborate and sophisticated algorithms that are worse than a crude implementation of a 50-year-old GC algorithm because we want to spend tens of millions of dollars and decades of work to give our users worse results? In your mind, what was it that possessed everyone designing high-performance language runtimes for non-memory-constrained environments over the past three decades, at unrelated projects that compete with one another, to all choose variants of a worse algorithm despite all of those variants being several orders of magnitude more costly to develop than what you believe is the better algorithm?

When someone comes up with a more performant GC algorithm than tracing -- we'll all switch to it. Who knows? It may be some sophisticated form of refcounting, but it ain't simple refcounting.


Java needs to generate a lot of garbage, whereas C++ does not. Therefore, a GC for C++ can be simpler and incur lower overhead than a GC for Java.

I'm not interested in the amount of money companies invest in developing garbage collection systems; I'm interested in code performance. Are you aware of any other absolutely zero-pause GCs?


> Java needs to generates a lot of garbage, whereas C++ does not. Therefore, GC for C++ can be simpler and generate lower overhead than GC for Java.

You mean lower footprint overhead; higher performance overhead. But that was my point: Higher-level, high-performance languages that rely on good abstraction -- like Java, C# and Go -- need a better-performing GC, which is why they choose tracing, and pay for it with more footprint. In terms of performance, tracing used in industry beats refcounting as used in industry.

> I'm interested in code performance

Sure, but you have to pay for it: either with a higher footprint or with higher maintenance costs.

> Are you aware of any other absolutely zero pause GCs?

The ones I know of are ZGC and Azul's C4, which preceded ZGC by some years but was not open-sourced. ZGC isn't some obscure technology, BTW. It is currently running most of Netflix's workload, and probably that of other very large companies.

(Also, ZGC does no collection work in pauses; it does pause threads to synchronise a state transition, but for durations similar to those for which the OS also pauses threads.)


Java generates a lot of garbage because its designers thought that the stack is unnecessary and GC is enough for everything. They were wrong.

Both ZGC and Azul C4 pause threads. SGCL never pauses threads, so it can collect garbage more often and more efficiently.


> They were wrong.

As their goal was a language that would be very popular, and given that languages that heavily rely on GC have many times more users than languages that don't, it doesn't seem that they were wrong, at least not about the centrality of GC (although, "value types" and arrays-of-structs are coming to Java).

> SGCL never pause threads, so it can collect garbage more often and more efficiently.

I can't examine the algorithm right now and I couldn't find a paper describing it (so I can't tell you if it's one of the algorithms we've tried), but if you believe you have a more efficient GC than those in the JDK, feel free to plug it in and test. If you're right, you'll have many millions of users and the overall impact of that algorithm on the software industry would be much greater than it would as a C++ GC.


The results seem to be old. Numbers might have changed quite a bit (e.g. .NET core 3.1 -> .NET 8 is a big performance jump).


> Scanning the working set is still cheaper than tracking all references constantly.

That is why you need to be careful not to needlessly update your reference count for temporaries. Managing memory is not free. Generational garbage collection avoids a lot of thought, but it demands more memory.


Yes. As I said repeatedly, the one thing that tracing sacrifices is memory footprint. It effectively turns RAM into performance. If you want a smaller footprint you either have to sacrifice performance or pay with more development effort.



