To add on top of this: This is a tracing GC. It only ever visits the live data, ...

tomp · 2025-09-05T08:08:35 1757059715

A non-moving GC must visit dead objects.

pizlonator · 2025-09-05T13:40:00 1757079600

Not quite.

FUGC uses a SIMD sweep over a bit vector kept on the side, so it doesn’t visit the dead objects at all, in the sense that it never touches their contents. It only "visits" them in the sense that a single instruction processes many dead or live objects at once.
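A minimal scalar sketch of what a side-bit-vector sweep can look like, assuming one "allocated" bit and one "marked" bit per slot. This is not FUGC's code: the names `side_bits` and `sweep` are hypothetical, and 64-bit words stand in for the wider SIMD registers a real implementation would use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical side tables for one page of same-sized slots: bit i of
 * word w describes slot w * 64 + i. The sweep never reads slot memory. */
typedef struct {
    uint64_t *allocated; /* 1 = slot currently holds an object */
    uint64_t *marked;    /* 1 = marker reached the object this cycle */
    size_t nwords;
} side_bits;

/* Sweep: an allocated slot survives only if it was marked. Each AND
 * handles 64 slots, live or dead; a SIMD version does the same thing
 * on 128-, 256- or 512-bit vectors. */
void sweep(side_bits *b) {
    for (size_t w = 0; w < b->nwords; w++) {
        b->allocated[w] &= b->marked[w];
        b->marked[w] = 0; /* reset marks for the next cycle */
    }
}
```

The cost scales with the number of words in the bitmap, not with the number of dead objects or their sizes.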

writebetterc · 2025-09-05T11:13:35 1757070815

I forgot that this GC is non-moving (I'm not used to that assumption, and it was a bit of a quick comment).

I still find the statement dubious, though; do you mind clearing it up for me?

Given a page { void* addr; size_t size; size_t alignment; BitMap used; } where used's size in bits is page.size / page.alignment, surely we only need to visit the used bitmap for marking a memory slot as free?
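Here is a small sketch of that page layout, assuming `BitMap` is a plain array of 64-bit words (a hypothetical representation). Freeing a slot only flips its bit; the object's bytes are never read or written. The helper names `slot_of`, `page_free` and `page_is_used` are illustrative.

```c
#include <stddef.h>
#include <stdint.h>

/* The page from the comment above, with BitMap assumed (hypothetically)
 * to be an array of 64-bit words: bit i covers slot i. */
typedef struct {
    void *addr;       /* start of the slot area */
    size_t size;      /* bytes in the page */
    size_t alignment; /* slot size; the page has size / alignment slots */
    uint64_t *used;   /* size / alignment bits, 1 = slot in use */
} page;

/* Slot index for any address inside the page, start or interior. */
static size_t slot_of(const page *pg, const void *p) {
    return ((uintptr_t)p - (uintptr_t)pg->addr) / pg->alignment;
}

/* Freeing writes one bit; the object's bytes are never touched. */
void page_free(page *pg, const void *p) {
    size_t i = slot_of(pg, p);
    pg->used[i / 64] &= ~(UINT64_C(1) << (i % 64));
}

int page_is_used(const page *pg, const void *p) {
    size_t i = slot_of(pg, p);
    return (int)((pg->used[i / 64] >> (i % 64)) & 1);
}
```

With 16-byte slots, `page_free(&pg, (char *)pg.addr + 16)` clears bit 1 and nothing else.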

kragen · 2025-09-05T12:04:03 1757073843

Yes, I agree. (This thread continued in https://news.ycombinator.com/item?id=45137286.)

tomp · 2025-09-05T13:11:05 1757077865

You’re correct, I forgot about that optimisation!

kragen · 2025-09-05T07:25:07 1757057107

Really? How does a non-moving GC make dead objects available for reallocation without visiting them?

torginus · 2025-09-05T08:58:29 1757062709

Why would it need to visit them? It just marks the address ranges as available in its internal bookkeeping (bitmaps etc).

kragen · 2025-09-05T09:23:37 1757064217

In the general case there are as many newly available address ranges as dead objects, so that counts as visiting them in this context.

torginus · 2025-09-05T09:59:25 1757066365

I don't think that's a definition of 'visit' most people would agree with.

I'm actually working on my own language that has a non-moving GC. It uses size classes (16-byte objects, 32-byte objects, etc.), each of which is allocated in a contiguous slab of memory. Occupancy is determined by a bitmap, 1 bit for each slot in the slab.

The GC constructs a liveness bitmap for the size class, and it is ANDed with the occupancy bitmap, 'freeing' the memory. If you fill the slab with dead objects and then run the GC, it will not walk anywhere on this slab: it creates an all-zero liveness bitmap and frees the memory.
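A minimal sketch of that scheme, assuming a single 64-slot slab per size class and that the occupancy and liveness bitmaps are combined with AND. The names `slab`, `slab_mark` and `slab_collect` are hypothetical, not from the project described.

```c
#include <stddef.h>
#include <stdint.h>

#define SLOTS 64 /* hypothetical: one 64-bit word of slots per slab */

typedef struct {
    char *base;         /* contiguous slab memory */
    size_t slot_size;   /* 16, 32, ... bytes: one size class per slab */
    uint64_t occupancy; /* 1 bit per slot, 1 = allocated */
    uint64_t liveness;  /* rebuilt by each GC's mark phase */
} slab;

/* Mark phase: called only for pointers reached from roots, so only
 * live objects ever set a bit here. */
void slab_mark(slab *s, const void *p) {
    size_t i = (size_t)((const char *)p - s->base) / s->slot_size;
    s->liveness |= UINT64_C(1) << i;
}

/* Sweep: AND the bitmaps. Dead slots drop to 0 without their memory
 * being read; a slab with no marks becomes entirely free. */
void slab_collect(slab *s) {
    s->occupancy &= s->liveness;
    s->liveness = 0;
}
```

If nothing in the slab is reachable, `slab_mark` is never called for it and `slab_collect` zeroes the occupancy word in one operation.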

kragen · 2025-09-05T10:51:34 1757069494

That's an awesome project! Is your GC generational despite being non-moving? What are your main objectives for the project?

The liveness bitmap approach is pretty widespread at this point; jemalloc works the same way IIRC.

Still, I think that counts as "visiting" in the context of this discussion: https://news.ycombinator.com/item?id=45137139

writebetterc · 2025-09-05T11:16:58 1757071018

I don't think it counts as visiting, as you never look at the dirtied bitmap during GC, only during allocation. That means you don't actually know whether a dirty bit represents a different object or not (if a 16-byte size class is allowed to have 32-byte objects in it, for example). To know that, you'd need either strict size classes or object headers marking the start of each object.

I agree that it's easy to add in a visitation pass, where you take the bitmap of live objects after marking and diff it with the currently existing one in order to signal that you might have a leak.
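A sketch of such an optional reporting pass, assuming one bit per slot in both bitmaps. The name `report_dead` is hypothetical. Note that its inner loop does one step per dead object, which is exactly the kind of "visit" being debated in this thread.

```c
#include <stddef.h>
#include <stdint.h>

/* Diff the occupancy bitmap from before the GC with the liveness
 * bitmap from marking. Every set bit in occupied & ~live is an object
 * that just died. Writes up to max slot indices into out and returns
 * the total number of dead slots. */
size_t report_dead(const uint64_t *occupied, const uint64_t *live,
                   size_t nwords, size_t *out, size_t max) {
    size_t n = 0;
    for (size_t w = 0; w < nwords; w++) {
        uint64_t dead = occupied[w] & ~live[w];
        while (dead) {
            unsigned b = 0; /* index of lowest set bit, portable version */
            while (!((dead >> b) & 1))
                b++;
            if (n < max)
                out[n] = w * 64 + b;
            n++;
            dead &= dead - 1; /* clear lowest set bit */
        }
    }
    return n;
}
```

The sweep itself never needs this pass; it only exists if you want to report or count the dead objects, e.g. for leak diagnostics.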

So basically, I think we're like 99% in agreement.

kragen · 2025-09-05T11:40:53 1757072453

It's always nice when the impact of collision of opposing opinions gives rise to the spark of mutual understanding rather than merely inflaming base emotions.

Typically bitmap-based allocators don't actually allow a 16-byte size class to have 32-byte objects in it, but I haven't looked at FUGC to see if that's true of it.

torginus · 2025-09-05T12:26:50 1757075210

I toyed with the idea of allowing this. In bitmaps, it's pretty easy and efficient to find contiguous free areas with bit-twiddling hacks, for example:

    // assume free_map is the bitmap where 1 means free
    uint32_t free_map;
    // bit i of free_map_2 is set iff slots i and i+1 are both free
    uint32_t free_map_2 = free_map & (free_map >> 1); // so on and so forth

I haven't really done anything like this yet. It has certain disadvantages: you do a bit more work during allocation, and resolving interior pointers (if you have those) is a bit more costly. In exchange, you can pack multiple size classes into the same bitmap and need fewer size classes.
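Repeating that shift-and step n - 1 times finds runs of n free slots. A minimal sketch for a single 32-bit word, with 1 meaning free as in the comment; `runs_of` and `first_fit` are hypothetical helper names.

```c
#include <stdint.h>

/* 1 = free. Bit i of the result is set iff slots i .. i+n-1 are all
 * free. This repeats the free_map & (free_map >> 1) step; the helper
 * names are hypothetical. Valid for 1 <= n <= 32. */
uint32_t runs_of(uint32_t free_map, unsigned n) {
    if (n == 0 || n > 32)
        return 0;
    uint32_t r = free_map;
    for (unsigned k = 1; k < n; k++)
        r &= free_map >> k;
    return r;
}

/* Lowest slot index starting n contiguous free slots, or -1. */
int first_fit(uint32_t free_map, unsigned n) {
    uint32_t r = runs_of(free_map, n);
    if (r == 0)
        return -1;
    int i = 0;
    while (!((r >> i) & 1u))
        i++;
    return i;
}
```

For `free_map = 0xF6` (slots 1, 2 and 4 to 7 free), a request for 3 slots lands at slot 4. The allocator would then clear those n bits, which is the extra allocation work mentioned above.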

kragen · 2025-09-05T12:51:09 1757076669

Sure, to find contiguous chunks of 6 slots within a single word, you can do

    t &= t << 1;
    t &= t << 2;
    t &= t << 2;

and that sort of thing is pretty appealing, but you lose the ability to know what size an object is just by looking at an address, and it's still a lot slower than scanning for an open slot in a page of 5× bigger objects.
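The three shifts work by doubling: after `t &= t << 1` bit i means "bits i-1..i were set", after the next step it covers 4 bits, and the last `<< 2` overlaps two 4-runs into a 6-run. A sketch wrapping that as a function, plus a hypothetical generalization `run_n` that uses the same doubling for any length up to 32:

```c
#include <stdint.h>

/* The three lines above: afterwards bit i of t is set iff bits i-5..i
 * of the original word were all set (a run of 6 ending at bit i). */
uint32_t run6(uint32_t t) {
    t &= t << 1; /* runs of 2 */
    t &= t << 2; /* runs of 4 */
    t &= t << 2; /* runs of 6: two overlapping runs of 4 */
    return t;
}

/* Same doubling for any 1 <= n <= 32 (hypothetical helper). Each step
 * shifts by at most the current run length, so runs stay contiguous,
 * and it takes O(log n) shifts instead of n - 1. */
uint32_t run_n(uint32_t t, unsigned n) {
    unsigned len = 1;
    while (len < n) {
        unsigned step = (n - len < len) ? n - len : len;
        t &= t << step;
        len += step;
    }
    return t;
}
```

For n = 6, `run_n` performs exactly the shifts 1, 2, 2 shown above.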

Should I assume from your use of uint32_t that you're targeting embedded ARM microcontrollers?

pizlonator · 2025-09-05T13:42:01 1757079721

FUGC is size-segregated: the 16-byte size class will only have 16-byte objects.

A bunch of other optimizations fall out from doing that.
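One consequence, related to kragen's point about knowing an object's size from its address, can be sketched as follows. This assumes a hypothetical layout (not confirmed as FUGC's): pages aligned to `GC_PAGE_SIZE`, one size class per page, and the slot size stored at the start of the page. Both the size and the start of the object containing an interior pointer then follow from the address alone, with no per-object header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout: every page is GC_PAGE_SIZE-aligned, holds
 * objects of exactly one size, and stores that size in its first
 * bytes; slots start GC_PAGE_HEADER bytes in. */
#define GC_PAGE_SIZE 4096u
#define GC_PAGE_HEADER 16u

/* Round an address down to the start of its page. */
static uintptr_t page_base(const void *p) {
    return (uintptr_t)p & ~(uintptr_t)(GC_PAGE_SIZE - 1);
}

void page_init(void *page, size_t slot_size) {
    memcpy(page, &slot_size, sizeof slot_size);
}

/* Object size from its address alone: mask, then one load. */
size_t size_of(const void *p) {
    size_t s;
    memcpy(&s, (const void *)page_base(p), sizeof s);
    return s;
}

/* Start of the object containing p (p must lie in the slot area).
 * No per-object header is needed because all slots in a page have
 * the same size. */
void *object_start(const void *p) {
    uintptr_t first = page_base(p) + GC_PAGE_HEADER;
    size_t sz = size_of(p);
    uintptr_t off = ((uintptr_t)p - first) / sz * sz;
    return (void *)(first + off);
}
```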

torginus · 2025-09-05T11:57:22 1757073442

It's not generational because, unlike in Java but like in C or C++, programs aren't supposed to generate a lot of ephemeral objects while they run. I also wanted to keep things as simple as possible to have a chance of actually shipping something in my lifetime :D

kragen · 2025-09-05T12:05:00 1757073900

That sounds like a good approach! Is it public?

torginus · 2025-09-05T12:15:31 1757074531

Not yet unfortunately, there are a few thorny issues, and I want to get it into an actually usable state before I dare make any claims about it :)

thomasmg · 2025-09-05T10:11:58 1757067118

> there are as many newly available address ranges as dead objects

Well, when using a bitmap (as they seem to do in the article), consecutive dead objects fall into the same range, because the corresponding consecutive bits in the bitmap are all zero. There is no need to visit each zero bit in the bitmap separately.
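To make the "fewer ranges than dead objects" point concrete, here is a sketch that counts maximal runs of zero bits in one 64-slot bitmap (1 = live). `free_ranges` is a hypothetical name; run starts are found with word-wide operations, so adjacent dead objects coalesce into one range.

```c
#include <stdint.h>

/* Count free address ranges in one 64-slot bitmap (1 = live). A range
 * is a maximal run of 0 bits. Bit i of `starts` is set iff slot i is
 * free and slot i-1 is not (or i == 0). */
unsigned free_ranges(uint64_t used) {
    uint64_t free_bits = ~used;
    uint64_t starts = free_bits & ~(free_bits << 1);
    unsigned n = 0;
    for (; starts; starts &= starts - 1) /* portable popcount */
        n++;
    return n;
}
```

Eight dead neighbours give one range, while alternating live and dead slots give one range per dead object, which is the worst case kragen describes.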

thomasmg · 2025-09-05T09:07:25 1757063245

If you want to use a standard malloc / free implementation (dlmalloc etc.), then dead objects need to be known, yes.

But the malloc library could be fully integrated into the GC mechanism. In that case, there is no need. That's probably much easier, and faster, because malloc can be simplified to bumping a pointer.
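A minimal bump allocator sketch, for reference. The names `bump_region` and `bump_alloc` are hypothetical. Allocation is a bounds check plus an increment, and there is no per-object free; reclaiming space means resetting `top`, which, as the reply below notes, only works if a copying collector has first moved the survivors out.

```c
#include <stddef.h>
#include <stdint.h>

/* Minimal bump allocator over one region. Memory is reclaimed only
 * by resetting top for the whole region, e.g. after a copying GC has
 * evacuated the live objects. */
typedef struct {
    char *top;   /* next free byte */
    char *limit; /* one past the end of the region */
} bump_region;

void *bump_alloc(bump_region *r, size_t n) {
    size_t aligned = (n + 15) & ~(size_t)15; /* keep 16-byte alignment */
    if (aligned < n || (size_t)(r->limit - r->top) < aligned)
        return NULL; /* size overflow or region exhausted: time to collect */
    void *p = r->top;
    r->top += aligned;
    return p;
}
```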

kragen · 2025-09-05T09:24:43 1757064283

That works if you use a copying garbage collector, but not a non-moving collector like FUGC.

thomasmg · 2025-09-05T10:00:08 1757066408

OK, I haven't read the FUGC source code, but the article mentions "FUGC relies on a sweeping algorithm based on bitvector SIMD." So, assuming there is just one bit per block of memory, there is no need to visit the memory of dead objects in the sweep phase: the dead object's bit is zero, so that memory block is considered free and available for reallocation without being visited.
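A sketch of the allocation side under that assumption (one bit per block, 1 = in use). `bitmap_alloc` is a hypothetical name. It finds the lowest zero bit and claims it; the previous occupant's bytes are never read, only overwritten later by the new owner.

```c
#include <stddef.h>
#include <stdint.h>

/* Claim the lowest free slot in a bitmap (1 = in use) and return its
 * index, or -1 if every slot is taken. Fully occupied words are
 * skipped with a single comparison each. */
long bitmap_alloc(uint64_t *bits, size_t nwords) {
    for (size_t w = 0; w < nwords; w++) {
        uint64_t free_bits = ~bits[w];
        if (free_bits == 0)
            continue; /* 64 occupied slots skipped at once */
        uint64_t lowest = free_bits & (~free_bits + 1); /* isolate lowest 1 */
        bits[w] |= lowest;
        long i = 0;
        while (!((lowest >> i) & 1))
            i++;
        return (long)(w * 64) + i;
    }
    return -1;
}
```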

kragen · 2025-09-05T10:47:33 1757069253

It's true that it doesn't dirty up your dcache, but it's "visiting" enough that it wouldn't "need a lot more special support if it wanted to report the dead objects," which is the context of the discussion here.