
I mean, yes, if you keep asking it in different ways until you get the right answer and then stop, Clever Hans can count.
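To put a number on the Clever Hans point: if each attempt succeeds independently with probability p, and you stop at the first success, the chance of eventually seeing a "right" answer within k tries is 1 - (1 - p)^k. The sketch below assumes independent attempts with a fixed p. The 20% figure is purely hypothetical and is not a measured GPT rate.

```python
def p_at_least_one_success(p: float, k: int) -> float:
    """Probability that at least one of k independent attempts succeeds,
    assuming each attempt succeeds with the same probability p."""
    return 1 - (1 - p) ** k

# Hypothetical: a model that is right only 20% of the time per attempt
# still "gets it" within 10 retries about 89% of the time.
print(round(p_at_least_one_success(0.2, 10), 2))  # 0.89
```

This is why stopping at the first success is uninformative. You need to fix the number of samples in advance and report every outcome.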



The difference is GPT-4. Unfortunately, these were run on GPT-3.5.

I asked GPT-4 the question verbatim, just once, and, like the grandparent, got:

"Every night Linda reads short books about space."


I precommitted to taking exactly ten samples, and GPT-4 gave a correct answer eight times. I then precommitted to taking ten more, and it nailed every one, bringing the overall success rate to 90% (18 of 20). The two failures each contained a single six-letter word but were otherwise correct.
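Scoring a fixed, precommitted batch like this is easy to automate. The following is a minimal sketch that assumes the prompt asked for a sentence in which every word has exactly five letters. That task is inferred from the example answer and is not stated in the thread. The helper names are hypothetical, and the regex splits on apostrophes, so a word like "Linda's" would be counted as two words.

```python
import re

def is_all_five_letter(sentence: str) -> bool:
    """Hypothetical checker: True if every alphabetic word has exactly 5 letters.
    Punctuation is ignored; apostrophes split words (an assumption)."""
    words = re.findall(r"[A-Za-z]+", sentence)
    return bool(words) and all(len(w) == 5 for w in words)

def success_rate(results: list[bool]) -> float:
    """Fraction of samples judged correct."""
    return sum(results) / len(results)

# The two precommitted batches reported above: 8/10, then 10/10.
first_batch = [True] * 8 + [False] * 2
second_batch = [True] * 10

print(is_all_five_letter("Every night Linda reads short books about space."))  # True
print(f"{success_rate(first_batch + second_batch):.0%}")  # 90%
```

Because the batch size is fixed before sampling, every failure is counted. Under retry-until-success, by contrast, failures are silently discarded.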

Skepticism is fine, but being skeptical out of mere ignorance of what these things can do is not.


GPT counts letters as well as you precommit to taking exactly ten samples!


These were separate experiments and thus I reported their results separately. Honestly, if anything, I was expecting more failures the second time around.



