And that's really too bad, because I think the rest of the comment is spot-on: the fundamental issue is that Rust simply delegates to the system allocator, and macOS's allocator is pretty slow.
As a result, while Rust makes it explicit and relatively easy to remove allocations (compared to C or C++), getting the most performance out of your program also requires doing so, unless you use a non-system allocator that better suits the workload.
That is true, and the article demonstrated it, since its "fix" was to switch to jemalloc. But even a custom allocator will usually be less performant than a high-performance GC in that situation, because the GC's allocator has more insight into the program's requirements and workload.
It might be possible to give a custom allocator that insight through its allocator-specific APIs, but this requires deeper integration between the program and the allocator.
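For readers less familiar with the mechanics: Rust's default global allocator, `std::alloc::System`, forwards to the platform allocator, and a replacement such as jemalloc is installed through the `#[global_allocator]` attribute. The sketch below uses that same hook but only wraps `System` to count allocation calls, so it can compare a loop that allocates a fresh `String` per iteration with one that reuses a single buffer. `CountingAlloc`, `format_fresh`, `format_reused` and `measure` are hypothetical names for illustration; exact counts depend on platform and Rust version, and this is not a benchmark.

```rust
use std::alloc::{GlobalAlloc, Layout, System};
use std::fmt::Write;
use std::sync::atomic::{AtomicUsize, Ordering};

// Hypothetical wrapper: forwards every request to the system allocator
// (Rust's default) and counts allocation calls. A non-system allocator
// would be installed at the same `#[global_allocator]` hook instead.
struct CountingAlloc;

static ALLOCS: AtomicUsize = AtomicUsize::new(0);

unsafe impl GlobalAlloc for CountingAlloc {
    unsafe fn alloc(&self, layout: Layout) -> *mut u8 {
        ALLOCS.fetch_add(1, Ordering::Relaxed);
        unsafe { System.alloc(layout) }
    }

    unsafe fn dealloc(&self, ptr: *mut u8, layout: Layout) {
        unsafe { System.dealloc(ptr, layout) }
    }
}

#[global_allocator]
static GLOBAL: CountingAlloc = CountingAlloc;

fn allocs() -> usize {
    ALLOCS.load(Ordering::Relaxed)
}

// Builds a new String on every iteration: at least one heap allocation each.
fn format_fresh(n: usize) -> usize {
    let mut total = 0;
    for i in 0..n {
        let s = format!("item {}", i);
        total += s.len();
    }
    total
}

// Reuses one buffer: `clear()` empties it but keeps its capacity.
fn format_reused(n: usize) -> usize {
    let mut buf = String::with_capacity(64);
    let mut total = 0;
    for i in 0..n {
        buf.clear();
        write!(buf, "item {}", i).unwrap();
        total += buf.len();
    }
    total
}

// Returns (allocations in the fresh loop, allocations in the reusing loop).
fn measure(n: usize) -> (usize, usize) {
    let before = allocs();
    let a = format_fresh(n);
    let fresh = allocs() - before;

    let before = allocs();
    let b = format_reused(n);
    let reused = allocs() - before;

    assert_eq!(a, b); // both loops produce the same text lengths
    (fresh, reused)
}

fn main() {
    let (fresh, reused) = measure(1000);
    println!("fresh: {} allocations, reused: {} allocations", fresh, reused);
}
```

Removing allocations this way helps regardless of which allocator sits behind the hook, which is why it matters most when that allocator is slow.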
Sure you do. You can even build one from nothing but mmap in pure Go. It just won't be part of the garbage collection, so you get malloc/free or arenas, etc., just like in C, Rust, or whatever.
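As a rough Rust analogue of the arena option (not the mmap-based Go allocator described above), here is a minimal typed-arena sketch. `Arena`, `Id`, `Node`, `sum_via_arena` and `reset` are hypothetical names, and the backing storage is an ordinary `Vec` rather than mmap'd memory. The point it illustrates is the kind of workload insight discussed earlier: the program itself declares that a whole batch of objects dies together, so freeing is one `reset` and the memory is reused for the next batch.

```rust
/// Hypothetical handle into the arena (an index, so the sketch stays in safe Rust).
#[derive(Clone, Copy, Debug, PartialEq)]
struct Id(usize);

/// Example payload: a tiny expression tree whose nodes all live in one arena.
enum Node {
    Leaf(i64),
    Add(Id, Id),
}

/// Minimal typed arena: allocation is a push, and everything is freed at once.
struct Arena<T> {
    items: Vec<T>,
}

impl<T> Arena<T> {
    fn with_capacity(n: usize) -> Self {
        Arena { items: Vec::with_capacity(n) }
    }

    fn alloc(&mut self, value: T) -> Id {
        self.items.push(value);
        Id(self.items.len() - 1)
    }

    fn get(&self, id: Id) -> &T {
        &self.items[id.0]
    }

    /// Drops every value in one go but keeps the buffer for the next batch.
    fn reset(&mut self) {
        self.items.clear();
    }

    fn len(&self) -> usize {
        self.items.len()
    }

    fn capacity(&self) -> usize {
        self.items.capacity()
    }
}

fn eval(arena: &Arena<Node>, id: Id) -> i64 {
    match arena.get(id) {
        Node::Leaf(v) => *v,
        Node::Add(a, b) => eval(arena, *a) + eval(arena, *b),
    }
}

/// Builds (((0 + 1) + 2) + ... + n) in the arena and evaluates it.
fn sum_via_arena(arena: &mut Arena<Node>, n: i64) -> i64 {
    let mut acc = arena.alloc(Node::Leaf(0));
    for i in 1..=n {
        let leaf = arena.alloc(Node::Leaf(i));
        acc = arena.alloc(Node::Add(acc, leaf));
    }
    eval(arena, acc)
}

fn main() {
    let mut arena: Arena<Node> = Arena::with_capacity(256);
    for round in 0..3 {
        let total = sum_via_arena(&mut arena, 100);
        println!("round {}: total = {}, nodes = {}", round, total, arena.len());
        arena.reset(); // the whole batch is released together
    }
}
```

Because the capacity survives `reset`, later rounds reuse the same buffer instead of going back to the general-purpose allocator for each node.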