And while AI gets better and better, and we remain as touchy as ever about the abstract concepts that make us oh so human, how about we just say it can't be understanding unless a human does it, eh?
How about this: understanding is the ability to generalize knowledge and apply it to novel scenarios.
This describes something that humans, and animals for that matter, do every day, in both small and large ways. And it is something that current language models aren't very good at.
I taught it Firefly, which is an undocumented programming language I'm working on, through conversation.
I find it's a lot quicker than any human at picking up the syntax and semantics, both in wall-clock time and in number of messages, and it makes pretty good attempts at writing code in it, about as good as you could expect from a human programmer.
That is, until you run out of context - is this what you mean?
The key complication in the classic three-switches, three-bulbs puzzle is "once you've opened the door, you may no longer touch a switch." It gets this; there are many examples of it written out on the web. But when I give it a variation and say "you can open the door to look at the bulbs and use the switches all you want," it is absolutely unable to understand this. To a human it's simple: look at the bulbs and flick the switches. It kept giving me answers about using a special lens to examine the bulbs, or using something to detect heat. I explained it in many ways and tried several times. I was paying for GPT-4 at the time as well.
I would not consider this thinking. It's unable to make this simple abstraction from its training data. I think GPT-4 looks better than GPT-3 simply because it has more data, but we're reaching diminishing returns on that, as others have stated.
GPT-4 on platform.openai.com says this on the first try:
Switch on the first switch and leave it on for a few minutes. Then, switch it off and switch on the second switch. Leave the third switch off. Now, walk into the room.
The bulb that is on corresponds to the second switch. The bulb that is off and still warm corresponds to the first switch because it had time to heat up. The bulb that is off and cool corresponds to the third switch, the one you never turned on.
GPT-4-0314:
1. Turn on the first switch and leave it on for about 5 minutes.
2. After 5 minutes, turn off the first switch and turn on the second switch.
3. Open the door and enter the room.
Now observe the lights:
- The bulb that is on is connected to the second switch (which is currently on).
- The bulb that is off but warm to the touch is connected to the first switch (it was on long enough to heat up the bulb).
- The bulb that is off and cool to the touch is connected to the third switch (it was never turned on).
----
But: it's also trained on the internet. The GPT-4 paper 'Sparks of AGI' included a logical puzzle it most likely never encountered in the training data, and it could solve it.
Also: I encourage you to pose these kinds of logical puzzles to random people on the street. They're not easy to solve.
My question to you would be: What would convince you that it actually can 'think' logically?
I think your comment misunderstands the comment you're responding to.
The point is that LLMs can solve the puzzle when the constraints are unchanged (as you said, there are loads of examples of people asking and answering variations of this puzzle on the internet), but when you change the constraints slightly ("you can open the door to look at the bulbs and use the switches all you want"), they are unable to break out of the mold and keep giving complicated answers, while a human would understand that under the new constraints you could simply flip each switch and observe the changes in turn.
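To make the contrast concrete, here is a minimal sketch in Python of what the relaxed constraints allow: flip each switch in turn and watch which bulb changes state. The names (Room, observe_bulbs, identify_wiring) and the toy wiring model are made up for this illustration; nothing about warm bulbs or timing is needed.

    # Toy model of the relaxed puzzle: three switches, three bulbs,
    # and you may flip switches and look at the bulbs as often as you like.
    class Room:
        def __init__(self, wiring):
            self.wiring = wiring                  # hidden map: switch -> bulb
            self.switches = [False, False, False]

        def flip(self, switch):
            self.switches[switch] = not self.switches[switch]

        def observe_bulbs(self):
            # A bulb is lit iff the switch wired to it is on.
            bulbs = [False, False, False]
            for switch, bulb in self.wiring.items():
                bulbs[bulb] = self.switches[switch]
            return bulbs

    def identify_wiring(room):
        # Flip each switch in turn and see which bulb changes state.
        mapping = {}
        for switch in range(3):
            before = room.observe_bulbs()
            room.flip(switch)
            after = room.observe_bulbs()
            changed = [i for i, (b, a) in enumerate(zip(before, after)) if b != a]
            mapping[switch] = changed[0]
        return mapping

    room = Room({0: 2, 1: 0, 2: 1})
    print(identify_wiring(room))  # -> {0: 2, 1: 0, 2: 1}

The procedure is pure trial and observation, which is exactly the step the model refused to take once the familiar constraint was lifted.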
A similar example that language models used to get stuck on is this: "Which is heavier, a pound of feathers or two pounds of bricks?"
There are plenty of results supporting my assertion, but the tests must be carefully designed. Of course, LLMs are not databases that store exact answers, so it's not enough to ask a model something it hasn't seen if it has seen something similar (as is likely the case with your programming language).
One benchmark that I track closely is ConceptARC, which aims to test generalization and abstraction capabilities.
Here is a very recent result that uses the benchmark: https://arxiv.org/abs/2311.09247. Humans correctly solved 91% of the problems, GPT-4 solved 33%, and GPT-4V did much worse than GPT-4.
Someone sufficiently fast and skilled at googling can explain, and use in context, a lot of things that they don't really understand.
So unless you're saying that the composite of the googler and Google understands something that neither does individually, your definition has some holes.
I would say there is a stronger consensus that a human being can reasonably be described as a single entity than that a human being plus a reference resource can.
A more apt comparison, to my mind, would be asking whether a human being can be described as personally exerting the strong nuclear force just because their subatomic particles do, to which I would happily answer "no."
ChatGPT routinely does both.