The problem is that volatile alone never portably guaranteed atomicity or barriers, so such a spinlock simply would not work correctly on many architectures: other memory accesses around it might be reordered in a way that makes the lock useless.
It does kinda sorta work on x86, due to its much-stronger-than-usual ordering guarantees for ordinary loads and stores even in the absence of explicit barriers. And because x86 was so dominant, people could get away with that for a while in "portable" code (which wasn't really portable).
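To make the failure concrete, here is a minimal sketch of that kind of naive volatile "spinlock" (the lock/lock_acquire/lock_release names are purely illustrative, not from any real codebase):

    /* Deliberately broken: the test and the set are two separate,
       non-atomic operations, and there are no barriers. */
    volatile int lock = 0;

    void lock_acquire(void)
    {
        while (lock != 0)   /* spin until the lock looks free...      */
            ;
        lock = 1;           /* ...but another CPU can slip in between */
                            /* the read above and this write.         */
    }

    void lock_release(void)
    {
        lock = 0;           /* no release barrier: on weakly ordered   */
                            /* CPUs, earlier stores may become visible */
                            /* only after the lock already looks free. */
    }

Two CPUs can both observe lock == 0 and both enter the critical section, and nothing orders the protected accesses against the lock accesses.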
TL;DR: The compiler can reorder memory accesses and the CPU can reorder memory accesses. With a few notable exceptions, you usually don't have to worry about the latter on non-SMP systems, and volatile does address the former.
The volatile qualifier makes every read or write of that object a side effect. This means the compiler is not free to reorder or eliminate those accesses with respect to other side effects.
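For example (a hypothetical memory-mapped status register at a made-up address, just to illustrate the rule):

    /* Hypothetical device register; the address is made up. */
    #define STATUS_REG (*(volatile unsigned int *)0x4000A000u)

    void wait_for_ready(void)
    {
        /* volatile forces a fresh read of the register on every
           iteration; without it, the compiler could hoist the read
           out of the loop and spin forever on a stale value. */
        while ((STATUS_REG & 0x1u) == 0)
            ;
    }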
If you have all 3 of:
A) A type that compiles down to a single memory access
B) Within the same MMU mapping (e.g. a process)
C) With a single CPU accessing the memory (e.g. a non-SMP system)
Then volatile accomplishes the goal of making reads and writes to a shared value visible across multiple threads. This is because modern CPUs don't have any hardware concept of threads; a thread switch is just an interrupt that happens to change the PC and stack pointer.
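A sketch of that (A)+(B)+(C) case, using illustrative names (uart_isr, byte_received): a flag shared between an interrupt handler and the main loop on a single-core system.

    static volatile int byte_received = 0;

    void uart_isr(void)          /* runs in interrupt context */
    {
        byte_received = 1;       /* a single aligned store: condition (A) */
    }

    int main(void)
    {
        for (;;) {
            if (byte_received) { /* re-read every pass because of volatile */
                byte_received = 0;
                /* handle the byte */
            }
        }
    }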
If you don't have (A), then even with atomics and barriers you are in trouble, and you need a mutex for proper modifications.
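For instance (a sketch, with made-up names): a uint64_t counter on a 32-bit target compiles to multiple memory accesses, so a reader can observe a half-written value unless every access is serialized, e.g. with a pthread mutex.

    #include <pthread.h>
    #include <stdint.h>

    static uint64_t packet_count;   /* not a single access on a 32-bit CPU */
    static pthread_mutex_t count_lock = PTHREAD_MUTEX_INITIALIZER;

    void count_packet(void)
    {
        pthread_mutex_lock(&count_lock);
        packet_count++;
        pthread_mutex_unlock(&count_lock);
    }

    uint64_t read_packet_count(void)
    {
        pthread_mutex_lock(&count_lock);
        uint64_t n = packet_count;
        pthread_mutex_unlock(&count_lock);
        return n;
    }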
If you don't have (B) then you may need to manage the caches (e.g. ARMv5 has virtually tagged caches, so the same physical address can be in two different cache lines).
If you don't have (C) (e.g. an SMP system), then you need to do something architecture-specific[1]. Prior to C language support for barriers, that usually meant a CPU intrinsic, inline assembly, or just writing your shared accesses in assembly and calling them as functions.
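Since C11, the portable spelling of that "something" is <stdatomic.h>. A sketch of the spinlock done that way (again with illustrative names):

    #include <stdatomic.h>

    static atomic_flag lock = ATOMIC_FLAG_INIT;

    void lock_acquire(void)
    {
        /* atomic read-modify-write plus an acquire barrier, which is
           what a plain volatile flag cannot give you */
        while (atomic_flag_test_and_set_explicit(&lock, memory_order_acquire))
            ;
    }

    void lock_release(void)
    {
        /* release barrier: stores made inside the critical section
           become visible before the lock appears free */
        atomic_flag_clear_explicit(&lock, memory_order_release);
    }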
Something else I think you are referring to: if you have two shared values and only one is volatile, then accesses to the other can be freely reordered by the compiler. This is true. It is also often masked by the fact that shared values are usually globals, and most compilers assume a non-inlined function can write to any global, so a function call accidentally becomes a compiler barrier.
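A sketch of that hazard (data/ready/publish are illustrative names):

    int data;              /* plain: the compiler may reorder it freely */
    volatile int ready;    /* volatile: ordered only against other      */
                           /* volatile accesses                         */

    void publish(void)
    {
        data  = 42;
        ready = 1;         /* the store to data may be moved after this,  */
                           /* so a reader that sees ready == 1 can still  */
                           /* see stale data                              */
    }

If publish called some non-inlined helper between the two stores, most compilers would conservatively keep them in order because the helper might touch data, which is the accidental masking described above.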
1: As you mention, on the x86 that "something" is often "nothing." But most other architectures don't work that way.