Isn't it the case that Java does this precisely because you have the memory to spend on it? These days, "handling the mess afterward" generally involves some kind of nursery or young generation, but its size may depend on the use case. If the JVM were tuned for an 8/16 GB environment, presumably the "sawtooth" wouldn't need to be as tall.
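To make the tuning point concrete, here is a minimal sketch (mine, not from any particular setup) that reports the heap ceiling and the collectors a JVM is actually running with. The class name `HeapReport` and its helper methods are made up for illustration; the calls themselves are standard `java.lang.management` APIs. On HotSpot, `-Xmx` caps the heap and `-Xmn` sets the young generation size, so running this with different values shows what the JVM was given to work with.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class for illustration only.
public class HeapReport {

    // Maximum heap the JVM will attempt to use, in MiB.
    // Governed by -Xmx, or by the platform default if the flag is absent.
    static long maxHeapMiB() {
        return Runtime.getRuntime().maxMemory() / (1024 * 1024);
    }

    // Names of the active collectors as reported by the JVM.
    // The exact names depend on the GC in use.
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            names.add(gc.getName());
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println("max heap (MiB): " + maxHeapMiB());
        System.out.println("collectors: " + collectorNames());
    }
}
```

For example, `java -Xmx8g -Xmn1g HeapReport.java` would run it with an 8 GB heap cap and a 1 GB young generation (the sizes are arbitrary here). Whether a smaller nursery actually flattens the sawtooth for a given workload is something you would have to measure.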
Slower by which metric? Latency? Throughput? The behavior may also depend strongly on the GC design and the hardware platform in question. It seems far too difficult to make a blanket statement about what is and isn't achievable in a specific use case.