
It can spell the word (writing each letter in uppercase followed by a space, which should turn each letter, with its space, into a separate token). It also has reasoning tokens to use as scratch space, and previous models have demonstrated knowledge of the fact that spelling a word out is a useful step in counting its letters.
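A quick way to sanity-check the tokenization claim (this assumes tiktoken is installed; cl100k_base is just an illustrative encoding, not necessarily the one any given model uses):

    import tiktoken

    enc = tiktoken.get_encoding("cl100k_base")
    # the bare word usually tokenizes into just a couple of chunks
    print(enc.encode("blueberry"))
    # spaced-out uppercase letters usually come out as roughly one token per letter
    print(enc.encode("B L U E B E R R Y"))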

Tokenization makes the problem difficult, but failing to solve it is still a reasoning/intelligence issue.



Here's an example of what gpt-oss-20b (at the default mxfp4 precision) does with this question:

> How many "s"es are in the word "Mississippi"?

The "thinking portion" is:

> Count letters: M i s s i s s i p p i -> s appears 4 times? Actually Mississippi has s's: positions 3,4,6,7 = 4.

The answer is:

> The word “Mississippi” contains four letter “s” s.

They can indeed do some simple pattern matching on the query, separate the letters out into separate tokens, and count them without having to do something like run code in a sandbox and ask it the answer.

The issue here is just that this workaround/strategy is only trained into the "thinking" models, afaict.


That proves nothing. The fact that Mississippi has 4 "s"s is far more likely to be in the training data than the fact that blueberry has 2 "b"s.

And now that fact is going to be in the data for the next round of training. We'll need to try some other words on the next model.


It does the same thing with a bunch of different words like "committee", "disestablishmentarianism", "dog", "Anaxagoras", and a string I typed by mashing the keyboard, "jwfekduadasjeudapu". It seems fairly general and to perform pretty reliably.

(Sometimes the trace is noisier, especially in quants other than the original.)

This task is pretty simple and I think can be solved easily with the same kind of statistical pattern matching these models use to write other text.


I'll be impressed when you can reliably give them a random four-word phrase for this test, because I don't think anyone is going to try to teach them all those facts. Even if they're trained to know letter counts for every English word (as the other comment cites as a possibility), they'd then have to actually count and add, rather than present a known answer plus a rationalization that looks like counting and adding (which is easy to come up with once an answer has already been decided).

(Yes, I'm sure an agentic + "reasoning" model can already deduce the strategy of writing and executing a .count() call in Python or whatever. That's missing the point.)
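For reference, that .count() check is itself a one-liner; the phrase here is arbitrary:

    phrase = "correct horse battery staple"  # any random four-word phrase
    print(phrase.count("e"))                 # ground truth to compare against the model's answer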


5 "b"s, not counting the parenthetical at the end.

https://claude.ai/share/943961ae-58a8-40f6-8519-af883855650e

Amusingly, it struggled a bit to understand what I wanted from the Python script to confirm the answer.

I really don't get why people think this is some huge unfixable blind spot...


I don't think the salience of this problem is that it's a supposedly unfixable blind spot. It's an illustrative failure in that it breaks the illusory intuition that something that can speak and write to us (sometimes very impressively!) also thinks like us.

Nobody who could give answers as good as ChatGPT often does would struggle so much with this task. The fact that an LLM works differently from a whole-ass human brain isn't actually surprising when we consider it intellectually, but that habit of always intuiting a mind behind language whenever we see language is subconscious and reflexive. Examples of LLM failures which challenge that intuition naturally stand out.


That indeed looks pretty good. But then why are we still seeing the issue described in OP?


You can already do it with arbitrary strings that aren't in the dictionary. But I wonder if the pattern matching will break once strings are much longer than any word in the dictionary, even if there's plenty of room left in context and all that.
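One easy way to build such test cases is to generate a long random string and keep the true count to compare against (the length and target letter here are arbitrary):

    import random
    import string

    # far longer than any dictionary word
    s = "".join(random.choice(string.ascii_lowercase) for _ in range(200))
    print(s)
    print(s.count("s"))  # ground truth to check the model's answer against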


> It also has reasoning tokens to use as scratch space

For GPT 5, it would seem this depends on which model your prompt was routed to.

And GPT 5 Thinking gets it right.


You can even ask it to go letter-by-letter and it'll get the answer right. The information needed to get it right is definitely in there somewhere; it just doesn't use it by default.



