
But I think that's the whole point of the exercise? That GPT-4 is leaning on stringing tokens together in a reply rather than reasoning through the problem itself, which, I would think, would be "required" for AGI (though we may end up finding out that well-trained language models in specific domains eliminate the need for generalized cognition).

In any case, it's an interesting exercise regardless of your opinion/stance on the matter!



But the human (in the comment chain) here made exactly the same mistake!

In that sense, this test doesn't seem to be a good fit for probing reasoning capabilities, since it's also easy for humans to get wrong (and humans don't always reason about everything from first principles, especially if they already have similar answers cached in memory).

It seems you would need novel puzzles that aren't common (even in kind) and don't sound similar to existing puzzles to get a handle on its reasoning capabilities.


The human recognized that they made the mistake and fixed it. As mentioned in the original comment, GPT failed to recognize the mistake even after being told. That's the key here that indicates it can't "reason."

There are open questions about whether or not it really needs to reason given sufficient training, but that seems to be the gap here between the human and the machine.


Rewrite the problem in a way that doesn't bias it toward common priors and it reasons it out just fine:

"Suppose I have an object of type C, an object of type G and an object of type L, and I need to get them across a river. I have a boat that can only carry myself and a single other object. I am not allowed to leave the objects of type C and L alone together, and I am not allowed to leave the objects of type L and G alone together. How can I safely get all three across?"

"This is a classic puzzle known as the "river crossing problem". Here's how you can safely get all three objects across the river:

First, take the object of type L across the river. Return alone to the original side of the river. Take the object of type C across the river. Return with the object of type L to the original side of the river. Take the object of type G across the river. Return alone to the original side of the river. Finally, take the object of type L across the river again. By following these steps, you can safely transport all three objects across the river without violating the restrictions given in the problem. Note that you have to start with the object of type L because it cannot be left alone with either of the other two objects. Once L is transported to the other side, you can use it to ensure that C and G are never left alone together."
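For anyone who wants to sanity-check that answer: below is a minimal Python sketch (mine, not from the thread) that brute-forces the rewritten puzzle with a breadth-first search over boat trips, using only the C/G/L names and the two "alone together" constraints from the prompt.

    # Minimal sketch: BFS over boat trips for the rewritten puzzle.
    # C/G/L and the two forbidden pairs come from the prompt above;
    # everything else here is illustrative.
    from collections import deque

    ITEMS = frozenset("CGL")
    FORBIDDEN = ({"C", "L"}, {"L", "G"})  # pairs that can't be left unattended

    def safe(bank):
        # A bank without the farmer must not contain a forbidden pair.
        return not any(pair <= bank for pair in FORBIDDEN)

    def solve():
        # State: (items on the start bank, farmer's bank: 0 = start, 1 = far)
        start, goal = (ITEMS, 0), (frozenset(), 1)
        queue, seen = deque([(start, [])]), {start}
        while queue:
            (left, bank), path = queue.popleft()
            if (left, bank) == goal:
                return path
            here = left if bank == 0 else ITEMS - left
            for cargo in [None, *here]:  # cross alone, or carry one item
                moved = frozenset() if cargo is None else frozenset([cargo])
                new_left = left - moved if bank == 0 else left | moved
                # The bank the farmer just left must be safe.
                unattended = new_left if bank == 0 else ITEMS - new_left
                state = (new_left, 1 - bank)
                if safe(unattended) and state not in seen:
                    seen.add(state)
                    step = f"cross with {cargo}" if cargo else "cross alone"
                    queue.append((state, path + [step]))

    for i, step in enumerate(solve(), 1):
        print(i, step)

The state space is tiny (eight subsets of items times two farmer positions), so BFS immediately finds a shortest plan: seven trips, matching the answer above (the C and G trips are interchangeable).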

Or with Bing, you don't even need to tell it what it assumed wrong. I just told it that it's not quite the same as the classic puzzle, and it responded by correctly identifying the difference and asking me if that's what I meant, though it forgot that the lion still eats the goat. When I pointed that out, it solved the puzzle correctly.


Bing/GPT-4 gets the answer right if you rewrite the problem in a way that doesn't bias it toward common priors

Or just tell it it's making a wrong assumption.


Again, this is not about being able to write the prompt in a way that allows GPT to find the answer. I’m not doubting its ability to do so. It’s that a human can reason through why the answer should be different, despite any common priors, and arrive at the correct judgment.

It indicates that there’s still something a human does that the machine doesn’t, even if we’re not able to place what it is. This is neither an argument for nor against progress towards AGI, just an observation. It’s interesting regardless (to me).


It can do that, though? That's kind of the point of the Bing example. I told it it was making a wrong assumption in its original answer (without telling it what was wrong) and it figured it out.


Then again, Bing is structured to have an inner monologue...



