That's correct. It should be possible to have an explicit guarantee of TCO while maintaining the ability to write things that aren't trivially reducible to loops...
No, scala's @tailrec annotation only guarantees that calls that can trivially be compiled to loops in fact are. You can't do better on a standard JVM without jumping through hoops (or, rather, trampolines).
Direct mutual recursion is somewhat rare (but useful for encoding state machines, for instance). However, using continuation passing style (and tail calls to a continuation argument) to prevent stack overflow is a common idiom in some functional languages (e.g. when mapping over a binary tree).
Erlang uses mutual TCO everywhere. All actors/processes are state machines with the function they're in being their current state. To switch states, you just tail-call a new function (which doesn't have to be from the same module you were in.)