> which can be debated, but it's a different topic
No, it "can't be debated," it is clearly false! You said "by definition," but you used an irrational and bigoted definition of "general reasoning and logic" which conflates such things with performance on a standardized test. Humans aren't innately good at stupid logic puzzles that LLMs might get a 71st percentile in. Our brains are not actually designed to solve decontextualized riddles. That's a specialized skill which can be practiced. It's depressing enough when people claim IQ tests are actually good measures of human intelligence, despite overwhelming evidence to the contrary. But now, by even worse reasoning, we have people saying a computer is smarter than "average humans." (MTurk average humans? Undergrads? Who cares!) The complete lack of skepticism and scientific thinking on display by many AI developers/evangelists is just plain depressing.
Let me add that a truly humiliating number of those """general reasoning""" LLM benchmarks are fucking multiple-choice questions! Not all of them, but a lot. ML critics have been pointing out since ~2018 (the BERT era) that language models pick up on spurious statistical correlations in benchmarks and then fail badly on real-world examples that use slightly different wording. Using a multiple-choice test is simply dishonest, like a middle finger to scientific criticism.
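To make the complaint concrete, here's a minimal toy sketch (all data synthetic and made up, not any real benchmark): if annotators write the gold option even slightly more verbosely than the distractors, a "classifier" that never reads the question already beats chance by a wide margin.

```python
# Toy demonstration of a question-blind baseline exploiting a length
# artifact in a synthetic multiple-choice dataset. Purely illustrative.
import random

random.seed(0)

FILLER = ["the", "model", "reasoning", "because", "therefore", "evidence"]

def make_option(n_words):
    # Random filler text; content is irrelevant, only length matters here.
    return " ".join(random.choice(FILLER) for _ in range(n_words))

def make_item():
    """One 4-way item where the gold option is slightly longer on average."""
    options = [make_option(random.randint(4, 8)) for _ in range(4)]
    gold = random.randrange(4)
    # The injected artifact: gold answers are written a bit more verbosely.
    options[gold] = make_option(random.randint(6, 10))
    return options, gold

items = [make_item() for _ in range(10_000)]

def longest_option(options):
    # Never looks at the question at all: just pick the longest option.
    return max(range(len(options)), key=lambda i: len(options[i].split()))

hits = sum(longest_option(opts) == gold for opts, gold in items)
print(f"question-blind accuracy: {hits / len(items):.1%} (chance = 25.0%)")
```

This is the same family of artifact as the hypothesis-only NLI baselines published around 2018: a model can post a strong score while learning nothing that deserves the name "reasoning."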
No, it "can't be debated," it is clearly false! You said "by definition," but you used an irrational and bigoted definition of "general reasoning and logic" which conflates such things with performance on a standardized test. Humans aren't innately good at stupid logic puzzles that LLMs might get a 71st percentile in. Our brains are not actually designed to solve decontextualized riddles. That's a specialized skill which can be practiced. It's depressing enough when people claim IQ tests are actually good measures of human intelligence, despite overwhelming evidence to the contrary. But now, by even worse reasoning, we have people saying a computer is smarter than "average humans." (MTurk average humans? Undergrads? Who cares!) The complete lack of skepticism and scientific thinking on display by many AI developers/evangelists is just plain depressing.
Let me add that a truly humiliating number of those """general reasoning""" LLM benchmarks are fucking multiple choice questions! Not all of them, but a lot. ML critics have been complaining since ~2017 (BERT) that LLMs pick up on spurious statistical correlations in benchmarks but fail badly in real-world examples that use slightly different language. Using a multiple choice test is simply dishonest, like a middle finger to scientific criticism.