std::atomic<something-128-bit-sized> does, or at least did last time I checked, use cmpxchg16b on gcc. There was talk of changing that because of its surprising behavior: as there is no 128-bit atomic load, even loads have to go through the CAS, which makes them significantly slower than 'normal' atomic loads. It also means a const std::atomic<128bit> is unusable in read-only pages, since the CAS needs write access to the memory, and read-only data is an expected use case for atomics.
I do not know whether that has actually been 'fixed' though.
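To make the 'load via CAS' point concrete, here is a minimal sketch of the idea. It is not gcc's actual implementation, and load_via_cas is a made-up name. I use a 64-bit atomic so it builds without libatomic or -mcx16, but the same trick is what a cmpxchg16b-based 128-bit load boils down to.

```cpp
#include <atomic>
#include <cstdint>

// Hypothetical helper (not a standard or gcc function): emulate an atomic
// load using only compare-and-swap, the way a cmpxchg16b-based 128-bit
// load has to work when no plain atomic load is available.
template <typename T>
T load_via_cas(std::atomic<T>& a) {  // note: cannot take a const reference
    T expected{};  // arbitrary guess at the current value
    // If the guess is right, the CAS stores the same value back.
    // If it is wrong, the CAS fails and copies the current value into
    // 'expected'. Either way 'expected' ends up holding the current value.
    a.compare_exchange_strong(expected, expected);
    return expected;
}
```

Calling load_via_cas on a std::atomic<std::uint64_t> holding 42 returns 42 and leaves the value unchanged. The parameter has to be non-const because compare_exchange_strong is a non-const member, which is the same reason the real thing needs a writable page. If you want to see what your own toolchain does for a 16-byte type, std::atomic<T>::is_always_lock_free (C++17) is the thing to query.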