I precommitted to taking exactly ten samples, and GPT-4 gave a correct answer eight times. I then precommitted to taking ten more, and it nailed every one, bringing the overall success rate to 90% (18 of 20). Each of the two failures contained a single six-letter word but was otherwise correct.
Skepticism is fine, but being skeptical out of mere ignorance of what these things can do is not.
These were separate experiments and thus I reported their results separately. Honestly, if anything, I was expecting more failures the second time around.
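For anyone who wants to check the arithmetic, here is a minimal sketch in Python. The run labels and variable names are illustrative assumptions; only the counts (8 of 10, then 10 of 10) come from the runs described above.

```python
from fractions import Fraction

# (correct, total) for each precommitted run, as reported above.
# The labels are illustrative, not part of the original experiments.
runs = {"first run": (8, 10), "second run": (10, 10)}

# Success rate of each run on its own, kept exact with Fraction.
per_run = {name: Fraction(c, n) for name, (c, n) in runs.items()}

# Pooled rate: sum the counts first, then divide.
pooled_correct = sum(c for c, _ in runs.values())
pooled_total = sum(n for _, n in runs.values())
pooled = Fraction(pooled_correct, pooled_total)

for name, rate in per_run.items():
    print(f"{name}: {float(rate):.0%}")
print(f"pooled: {pooled_correct}/{pooled_total} = {float(pooled):.0%}")
```

Because both runs have the same size, the pooled figure is simply the average of the two per-run rates; with unequal run sizes the counts would need to be summed as shown rather than averaging percentages.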