Going fast is one thing. Making a program that responds consistently is different, and there's a continuum of choices. The last time I read about Go's GC, they were targeting pause times of 500µs, and for tons of applications that's more than sufficient; for some, it's not.
You could start by twiddling some of the GC knobs Go gives you, but you're still working against an SLO rather than a hard guarantee. If you need stronger guarantees, you'll look at languages that eschew GC completely, because Go's GC still has stop-the-world (STW) phases. Climb the ladder further and you're reducing allocations, eventually avoiding any malloc() beyond what it takes to get an arena, and doing your own bookkeeping inside it. I've never been near the top of the ladder, where you have hard real-time constraints, but I've heard it involves paying Wind River for VxWorks licenses ;)
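To make the lower rungs of that ladder concrete, here's a rough sketch in Go. The first part uses the real `runtime/debug` knobs (`SetGCPercent`, and `SetMemoryLimit`, which needs Go 1.19+); the values 200 and 1 GiB are arbitrary picks for illustration, not recommendations. The second part is a toy bump-allocator arena: `Arena`, `NewArena`, `Alloc`, `Reset`, and `tuneGC` are names I made up for this example, not anything from the standard library.

```go
package main

import (
	"fmt"
	"runtime/debug"
)

// tuneGC is a hypothetical helper that applies two real runtime knobs and
// returns the previous settings so the caller can restore them.
func tuneGC(percent int, limitBytes int64) (prevPercent int, prevLimit int64) {
	prevPercent = debug.SetGCPercent(percent)    // same knob as the GOGC env var
	prevLimit = debug.SetMemoryLimit(limitBytes) // same knob as GOMEMLIMIT (Go 1.19+)
	return prevPercent, prevLimit
}

// Arena is a hypothetical bump allocator: one up-front allocation, then all
// bookkeeping is just an offset into that buffer.
type Arena struct {
	buf []byte
	off int
}

// NewArena performs the single allocation the arena will ever make.
func NewArena(size int) *Arena {
	return &Arena{buf: make([]byte, size)}
}

// Alloc hands out n bytes from the arena, or nil if they do not fit.
// The returned slice has its capacity capped at n, so an append on it
// reallocates instead of overwriting a neighbouring allocation.
func (a *Arena) Alloc(n int) []byte {
	if n < 0 || n > len(a.buf)-a.off {
		return nil
	}
	p := a.buf[a.off : a.off+n : a.off+n]
	a.off += n
	return p
}

// Reset zeroes the used region and makes the whole arena available again.
// Slices handed out earlier must not be used after this call.
func (a *Arena) Reset() {
	for i := 0; i < a.off; i++ {
		a.buf[i] = 0
	}
	a.off = 0
}

func main() {
	// Rung 1: trade memory for fewer GC cycles (values are illustrative only).
	prevPct, prevLimit := tuneGC(200, 1<<30)
	defer debug.SetMemoryLimit(prevLimit)
	defer debug.SetGCPercent(prevPct)

	// Rung 2: allocate once, then do your own bookkeeping.
	a := NewArena(1024)
	hdr := a.Alloc(16)
	body := a.Alloc(1000)
	over := a.Alloc(64) // only 8 bytes are left, so this returns nil
	fmt.Println(len(hdr), len(body), over == nil) // 16 1000 true

	a.Reset()
	fmt.Println(len(a.Alloc(1024))) // 1024
}
```

Note that the arena doesn't take the GC out of the picture: the backing buffer is still a heap object the runtime knows about. It just means the steady-state path allocates nothing new, which is the point of that rung.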