AVB and TSN are relatively new additions to Ethernet, designed specifically for realtime AV use. Traditional Ethernet was never really intended for tight realtime use.
There is audio-distribution realtime, and then there is nuclear-reactor (or avionics, medical, etc.) realtime. I assume the people doing (or certifying) the latter will want better guarantees.
The usual terminology is hard, firm and soft realtime. In hard realtime, missing a deadline is a total failure and to be avoided at all costs, i.e. your reactor melts down, your car runs over someone, stuff like that. Firm realtime means that a missed deadline is not a catastrophic failure, but it makes the result useless. E.g. your printer control system mistimes the "fire ink now" signal, and the ink lands not on the page but somewhere else. Soft realtime means that your result gradually degrades with missed deadlines, but is not totally useless.
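One way to picture the three classes is as the value a result still has when it arrives late. This is only a rough sketch: the names `Deadline` and `result_value`, the linear decay for soft realtime, and the 100 ms decay window are illustrative assumptions, not standard definitions.

```python
from enum import Enum


class Deadline(Enum):
    HARD = "hard"
    FIRM = "firm"
    SOFT = "soft"


def result_value(kind, lateness_ms, soft_decay_ms=100.0):
    """Return how much a result is worth (1.0 = full value, 0.0 = useless).

    lateness_ms <= 0 means the deadline was met.
    soft_decay_ms is a made-up parameter: the lateness at which a soft
    realtime result has lost all of its value.
    """
    if lateness_ms <= 0:
        return 1.0  # on time: full value in every class
    if kind is Deadline.HARD:
        # A missed hard deadline is a system failure, not a "low value" result.
        raise RuntimeError("hard deadline missed: total failure")
    if kind is Deadline.FIRM:
        return 0.0  # late result is useless, but nothing catastrophic happens
    # Soft: value degrades gradually (linear decay is an assumption here).
    return max(0.0, 1.0 - lateness_ms / soft_decay_ms)


print(result_value(Deadline.FIRM, 5))   # late firm result: worthless
print(result_value(Deadline.SOFT, 25))  # late soft result: degraded
```

Note that nothing in this model says how long the deadline itself is, only what happens once it is missed.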
Audio is usually soft realtime; sometimes, e.g. for studio recordings, it is firm realtime.
It should be noted that these distinctions don't correlate with how tight the timing is; you can have a hard realtime system that needs some network packets at 50ms±10ms intervals, and a soft realtime system that needs packets at 500µs±5µs.
Some audio setups are run quite "close to the metal", both because that needs less buffering and because the lower human threshold for noticing latency seems to be around 10ms. Keeping audio from getting out of phase across multiple sources/sinks adds a further constraint on top of that.
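To put rough numbers on the buffering point: the latency a buffer adds is just its length in frames divided by the sample rate. A small sketch, assuming a 48 kHz sample rate and example buffer sizes that are not from this thread; the function names are hypothetical.

```python
def buffer_latency_ms(frames, sample_rate_hz):
    """Latency contributed by one buffer of `frames` sample frames."""
    return 1000.0 * frames / sample_rate_hz


def max_frames_within(budget_ms, sample_rate_hz):
    """Largest buffer (in frames) that still fits in a latency budget."""
    return int(budget_ms * sample_rate_hz / 1000)


SAMPLE_RATE_HZ = 48_000  # assumed; a common pro-audio rate

# Assumed example buffer sizes, from small to large.
for frames in (64, 256, 1024):
    print(frames, "frames ->", round(buffer_latency_ms(frames, SAMPLE_RATE_HZ), 2), "ms")

# Frames available if the whole ~10ms budget went to a single buffer.
print(max_frames_within(10, SAMPLE_RATE_HZ), "frames fit in 10 ms")
```

In practice the budget is shared between several buffers along the path (capture, network, playback), so each individual buffer has to be well below that figure.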
Correct. Imagine a dam that is overflowing: the release valves are a hard realtime system. If the dam overflows for more than a few minutes, damage will occur, so the release valves need to be open within, say, 5 minutes of the overflow starting. That is a generous deadline for any computer, but still a deadline that must be kept at all costs.