Are you kidding? There are 30 claims; it's an hour's work to make complete sense of how they work together and what they do or do not cover. I've filed my own patents, so I've read through enough prior art, and I'm not doing it for a pointless internet argument.
IANAL. I looked through the patent, not just the claims, though I certainly didn't read all of it. While it leaves open many possible variations, it's a patent for sequence transduction, and it's quite explicit everywhere that the system comprises an encoder and a decoder (see Claim 1, the vaguest). Nowhere did I see any hint that you could leave out one or the other, or that you could leave out the encoder-decoder attention submodule (the "degenerate use-case" you suggested). The patent is only about sequence transduction (e.g. translation).
Now, an encoder+decoder is very similar to a decoder-only transformer, but making that modification is certainly an inventive step, and I'm pretty sure the patent doesn't contain it. The patent does describe all the other pieces of a decoder-only or encoder-only transformer, even though such a model isn't covered by any of the claims, and I have no idea what a court would think about that, since IANAL.