I mean, yes, if you keep asking it in different ways until you get the right ans...

jonahx · on May 20, 2023

The difference is GPT-4. Unfortunately, these were run on GPT-3.5.

I asked GPT-4 the question verbatim, just once, and like the grandparent got:

"Every night Linda reads short books about space."
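The word lengths in that sentence are easy to check mechanically rather than by eye. A minimal Python sketch: the helper name `word_lengths` is made up for illustration, and it strips surrounding punctuation before counting, which is an assumption about how a "letter count" should treat the trailing period.

```python
import string

def word_lengths(sentence: str) -> list[int]:
    """Letter count of each whitespace-separated word, ignoring surrounding punctuation."""
    words = (w.strip(string.punctuation) for w in sentence.split())
    return [len(w) for w in words if w]

sentence = "Every night Linda reads short books about space."
lengths = word_lengths(sentence)
print(lengths)                       # one length per word
print(all(n == 5 for n in lengths))  # True if every word has five letters
```

The same check applied to a sentence containing a six-letter word would return False on the last line.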

Sharlin · on May 20, 2023

I precommitted to taking exactly ten samples and GPT-4 gave a correct answer eight times. I then precommitted to taking ten more, and it nailed every one, bringing the success rate to 90%. The two failures had a single six-letter word but were otherwise correct.

Skepticism is fine, but being skeptical out of mere ignorance of what these things can do is not.
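For reference, the 90% figure is the two batches pooled: 8 of 10 plus 10 of 10. A small sketch of that arithmetic in Python (the variable names are illustrative only, and the batch counts are the ones reported above):

```python
from fractions import Fraction

# (successes, trials) for each precommitted batch, as reported in the comment
batches = [(8, 10), (10, 10)]

successes = sum(s for s, _ in batches)
trials = sum(n for _, n in batches)
pooled = Fraction(successes, trials)

print(f"{successes}/{trials} = {float(pooled):.0%}")
```

Pooling treats all twenty samples as one experiment. Reporting the batches separately (80% and 100%) is equally valid, and with samples this small the two batch rates are not surprising to see side by side.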

morelisp · on May 20, 2023

GPT counts letters as well as you precommit to taking exactly ten samples!

Sharlin · on May 20, 2023

These were separate experiments and thus I reported their results separately. Honestly, if anything, I was expecting more failures the second time around.