You can 100% write services with P999 < 20ms in Go.
Not even trying that hard.
Go is entirely suitable for these kinds of constraints; I dare say that's Go's main target.
P99 < 1ms is when you're going to want to switch it up.
Going fast is one thing; making a program that responds consistently is different, and there's a continuum of choices. The last time I read about Go's GC, they were targeting 500µs pauses, and for tons of applications that's more than sufficient; for some, it's not.
You could start by twiddling some of the GC knobs Go gives you, but you're still working against an SLO rather than a guarantee. If you need stronger guarantees you'll look at languages that eschew GC entirely, because Go's GC still has stop-the-world phases. Climb the ladder further and you're reducing allocations, eventually avoiding any malloc() beyond what it takes to get an arena and doing your own bookkeeping. I've never been near the top of the ladder, where you have hard real-time constraints, but I've heard it involves paying Wind River for VxWorks licenses ;)
Was the double negative intentional? I've used Go for sub-millisecond needs, so 20ms seems like a reasonable choice from where I'm sitting.
It was not intentional, thanks for asking...very unfortunate typo ;)
Go doesn't give you control over inline vs indirect allocation, instead relying on escape analysis, which is notoriously finicky. Seemingly unrelated changes, along with compiler upgrades, can ruin your carefully optimized code.
This is especially heinous because Go also uses a GC, so unnecessary allocations have a disproportionately large impact on your application's performance. Either problem alone wouldn't be nearly as bad.
Time and time again we see reports from organizations/projects written in Go with perfectly fine average latency but horrendous p95+ tail latency - some going as far as straight-up insane optimizations (see Dgraph) or rewriting in other languages.
EDIT: Fixed unfortunate typo