Part of me thinks that’s exactly why this department has been sidelined - having such a department is necessary to create hype (“we are creating things so powerful we have to explore how to contain them”), but it doesn’t need to thrive either.



The real answer is that they have realized internally that AGI is simply not possible. Even GPT-4o doesn't understand logic at all.


That's simply not true. We now have LLMs that perform comparably to or better than average humans on general reasoning and logic benchmarks. So if you say they don't understand logic at all, then by that definition most humans don't either (which can be debated, but that's a different topic).


GPT-4o still can't reason. It is super fancy autocomplete.

https://i.imgur.com/z83umbk.jpeg

Here I changed a widely known riddle so that the answer is the opposite, and managed to get it to state both as the answer.
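
If anyone wants to try this kind of probe themselves, here is a rough sketch assuming the OpenAI Python client and an API key. The modified riddle below is only an illustration of the idea (flip the premise so the stock answer no longer fits), not the exact wording from my screenshot.

  # Probe sketch: a well-known riddle with the premise flipped, so the usual
  # "the surgeon is his mother" answer would contradict the setup.
  from openai import OpenAI

  client = OpenAI()  # reads OPENAI_API_KEY from the environment

  modified_riddle = (
      "A boy and his mother are in a car accident and the mother dies. "
      "At the hospital the surgeon says: 'I can't operate on this boy, "
      "he is my son.' How is the surgeon related to the boy?"
  )

  resp = client.chat.completions.create(
      model="gpt-4o",
      messages=[{"role": "user", "content": modified_riddle}],
  )
  print(resp.choices[0].message.content)  # check whether the memorized answer gets pasted in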


"Super fancy autocomplete" may not be that different to us, or at least some substantial part of us. Do you think when you speak colloquially with a friend or a colleague, you are engaging in a deep reasoning exercise? When you speak, the next set of words you utter feels like a 'fancy autocomplete' because you don't think through every word, or even the underlying idea or question that was presented - you just know how to respond and with what set of words and sentences.


> Do you think when you speak colloquially with a friend or a colleague, you are engaging in a deep reasoning exercise?

When I speak colloquially, I have an underlying idea, rooted in a world model, that I'm expressing. I don't spit out one word at a time based on the previous words I already said.


It's a bit more than fancy autocomplete.

It feels very much like it's geared towards figuring out the gist of the search you may be trying to do, and putting an answer together by reading a few links.


'Fancy autocomplete' is a thought-terminating cliche.

You can stump a person with a riddle or a logic puzzle or an optical illusion.


"How do you know we're not just next token predictors" is the thought terminating cliche . We know that's what LLMs are. It was certainly eye opening to see how far that gets you. But any deeper claims about intelligence or reasoning need real evidence or at least a proposed line of reasoning. "We don't know how intelligence works so it might be that" doesn't count.


When did I say "How do you know we're not just next token predictors"?

The fact is that people who say 'it is just fancy autocomplete' are using a thought-terminating cliche, and an 'I stumped an LLM' example proves nothing.


It’s no more thought-terminating than “intelligence,” which is extremely loaded and causes people to make assumptions about these models that work backwards from the “intelligent” label rather than forwards from the tech itself.


So? Stop using thought-terminating cliches.


Also, even if this oft-repeated trump card is true, why are you so sure this is different from how our own brains functionally work?


Because if this were how our brains really worked, then ChatGPT wouldn't be beating most humans at standardized tests and then failing this absurdly easy question, which even an elementary school kid could answer.

It simply regurgitates this phrase, without even noticing that it is stating the exact opposite of the answer it just gave, because most answers to this riddle on the internet say this at the end.

> This riddle plays on the assumption that a surgeon is typically male, but in this case, the surgeon is the boy's mother.

So from this one failure, you can see that it is a copy-and-paste machine, and that it doesn't even understand it is contradicting itself.


Your counterexample doesn’t prove that this isn’t how our minds functionally work, except better. No amount of counterexamples could.


> which can be debated, but it's a different topic

No, it "can't be debated," it is clearly false! You said "by definition," but you used an irrational and bigoted definition of "general reasoning and logic" which conflates such things with performance on a standardized test. Humans aren't innately good at stupid logic puzzles that LLMs might get a 71st percentile in. Our brains are not actually designed to solve decontextualized riddles. That's a specialized skill which can be practiced. It's depressing enough when people claim IQ tests are actually good measures of human intelligence, despite overwhelming evidence to the contrary. But now, by even worse reasoning, we have people saying a computer is smarter than "average humans." (MTurk average humans? Undergrads? Who cares!) The complete lack of skepticism and scientific thinking on display by many AI developers/evangelists is just plain depressing.

Let me add that a truly humiliating number of those """general reasoning""" LLM benchmarks are fucking multiple-choice questions! Not all of them, but a lot. ML critics have been complaining since ~2018 (BERT) that LLMs pick up on spurious statistical correlations in benchmarks but fail badly on real-world examples that use slightly different language. Using a multiple-choice test is simply dishonest, like a middle finger to scientific criticism.


I don't know where you are getting that, but if you give it a simple problem which hasn't been written about on the internet, it gets it wrong. It is cool tech, don't get me wrong, but it is also dumb tech. I do not think we will ever get AGI without understanding consciousness first.



