To be clear, this is applying the classic observation that if you keep the first and last letter correct, humans are really good at unjumbling the center.
Wonder if you can choose a sentence such that humans see it one way (homophones, jumbles), perhaps via context clues, but the closest match (edit distance?) for the individual words gives a different sentence?
Certainly seems doable.
Along those lines would be something like: "if I have a coin and I, err,trun tit, which face is showing?" but it's not a good example. Here "err, trun tit" gets corrected to "return", but the reader should end up at "err, turn it" instead, making the face showing "the opposite".
Hopefully you get the idea, bet there are some really good phrases that would fit this scheme.
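To make the mismatch concrete, here is a rough Python sketch of one word where the two readings diverge. The two-word vocabulary and the function names are made up for illustration, and plain Levenshtein distance stands in for whatever a real spell checker would use.

```python
def levenshtein(a, b):
    """Classic edit distance: insertions, deletions, substitutions cost 1 each."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute (free if equal)
        prev = cur
    return prev[-1]


def jumble_key(word):
    """First letter, last letter, and the sorted middle letters."""
    return (word[0], word[-1], "".join(sorted(word[1:-1])))


# Hypothetical toy vocabulary, only for this example.
VOCAB = ["turn", "true"]


def nearest_by_edit_distance(word, vocab):
    return min(vocab, key=lambda v: levenshtein(word, v))


def jumble_matches(word, vocab):
    return [v for v in vocab if jumble_key(v) == jumble_key(word)]


print(nearest_by_edit_distance("trun", VOCAB))  # the edit-distance reading
print(jumble_matches("trun", VOCAB))            # the first/last-letter reading
```

With this vocabulary, "trun" is one substitution away from "true" but two from "turn", so the edit-distance pick and the jumble pick disagree. A good adversarial sentence would need that to happen on a word that changes the meaning.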
A better way to say what I was getting at is that fairly straightforward language statistics go a long way towards unjumbling letters. A spell checker could also include quite a few human perceptual quirks as scoring rules without crossing the line into what I would think of as training an AI.
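As a rough sketch of what I mean, assuming a tiny made-up word-count table (a real one would come from a corpus) and one hand-written "perceptual" rule that penalizes letters that moved. The counts, the penalty weight, and all the names here are illustrative assumptions, not anything measured.

```python
import math
from collections import defaultdict

# Hypothetical word counts, standing in for real corpus statistics.
WORD_COUNTS = {"trial": 120, "trail": 40, "the": 1000, "went": 60, "to": 900}


def signature(word):
    """Key that ignores the order of the middle letters."""
    if len(word) < 3:
        return (word, "", "")
    return (word[0], word[-1], "".join(sorted(word[1:-1])))


# Index the vocabulary by signature so lookups are one dict access.
INDEX = defaultdict(list)
for w in WORD_COUNTS:
    INDEX[signature(w)].append(w)


def score(jumbled, candidate, penalty=0.5):
    """Log frequency minus a penalty per position whose letter differs.

    The penalty is an assumed stand-in for a perceptual quirk: words whose
    letters moved less are taken to be easier to recognize.
    """
    moved = sum(a != b for a, b in zip(jumbled, candidate))
    return math.log(WORD_COUNTS[candidate]) - penalty * moved


def unjumble(word):
    """Return the best-scoring vocabulary word, or the input if none match."""
    candidates = INDEX.get(signature(word), [])
    if not candidates:
        return word
    return max(candidates, key=lambda c: score(word, c))


print(" ".join(unjumble(w) for w in "the tiarl wnet to".split()))
```

Here "tiarl" matches both "trial" and "trail", and the frequency term is what breaks the tie. Nothing in it is learned beyond counting words, which is the point.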