I don't know, it doesn't sound wild at all to me. Human languages are very imprecise, vague and error-tolerant, which is the opposite of an output format like JSON. So it's quite an intuitive conclusion that a model can't do these two things well at the same time.
The wild part is that a model trained on so much human-language text can still output mostly compilable code.