
The way the authors talk about LLMs really rubs me the wrong way. They spend more of the paper talking up the 'claims' about LLMs that they are going to debunk than actually doing any interesting study.

They came into this with the assumption that LLMs are just a cheap trick. As a result, they deliberately searched for an example of failure, rather than trying to do an honest assessment of generalization capabilities.



What the hype crowd doesn't get is that for most people, "a tool that randomly breaks" is not useful.


The fact that a tool can break, or that the company manufacturing it overstates its abilities, is annoying but does not imply that the tool is useless.

I experience LLM "reasoning" failures several times a day, yet I still find LLMs useful.


>They came into this with the assumption that LLMs are just a cheap trick. As a result, they deliberately searched for an example of failure, rather than trying to do an honest assessment of generalization capabilities.

And lo and behold, they still found a glaring failure. You can't fault them for not buying into the hype.


But it is still dishonest to declare reasoning LLMs a scam simply because you searched for a failure mode.

If given a few hundred tries, I bet I could find an example where you reason poorly too. Wikipedia has a whole list of common failure modes of human reasoning: https://en.wikipedia.org/wiki/List_of_fallacies


Well, given that the success rate is no more than 90% even in the best cases, you could probably find a failure in about 10 tries. The only exception is o1-preview. And this is just a simple substitution of parameters.
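
Rough back-of-the-envelope sketch (assuming the ~90% best-case success rate mentioned above and independent attempts, which is my simplification): with a 0.1 per-attempt failure probability, the expected wait until the first failure is 10 tries, and 10 tries already give roughly a 65% chance of seeing at least one failure.

    # Sketch only: assumes independent attempts and the ~90% best-case
    # success rate cited above, i.e. each try fails with probability 0.1.
    p_fail = 0.10
    expected_tries_to_failure = 1 / p_fail          # 10.0, mean of a geometric distribution
    p_failure_within_10 = 1 - (1 - p_fail) ** 10    # ~0.65
    print(expected_tries_to_failure, round(p_failure_within_10, 2))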



