As a simple example, if you ask a question and part of the answer is directly quoted from a book from memory, that text is not computed/reasoned by the AI and so doesn't have an "explanation".
But I also suspect that any AGI would necessarily produce answers it can't explain. That's called intuition.
It wouldn't be a reference; "explanation" for an LLM means it tells you which of its neurons were used to create the answer, ie what internal computations it did and which parts of the input it read. Their architecture isn't capable of referencing things.
What you'd get is an explanation saying "it quoted this verbatim", or possibly "the top neuron is used to output the word 'State' after the word 'Empire'".
Of course the AI could incorporate web search, but then what if the explanation is just "it did a web search and that was the first result"? It seems pretty difficult to recursively make every external tool also explainable…
Then you should have a stronger notion of "explanation". Why were these specific neurons activated?
Simplest example: OCR. A network identifying digits can often be explained as recognizing lines, curves, numbers of segments etc.. That is an explanation, not "computer says it looks like an 8"
But can humans do that? If you show someone a picture of a cat, can they "explain" why is it a cat and not a dog or a pumpkin?
And is that explanation the way how they obtained the "cat-nes" of the picture, or do they just see that it is a cat immediately and obviously and when you ask them for an explanation they come up with some explaining noises until you are satisfied?
Wild cat, house cat, lynx,...? Sure, they can. They will tell you about proportions, shape of the ears, size as compared to other objects in the picture etc.
For cat vs pumpkin they will think you are making fun of them, but it very much is explainable. Though now I am picturing a puzzle about finding orange cats in a picture of a pumpkin field.
> They will tell you about proportions, shape of the ears, size as compared to other objects in the picture etc.
But is that how they know that the image is a cat, or is that some after the fact tacked on explaining?
Let me tell you an example to better explain what I mean. There are these “botanical identifying” books. You take a speciment unknown to you and and it asks questions like “what shape the leaves are?” “Is the stem woody or not?” “How many petals on the flower?” And it leads you through a process and at the end gives you ideally the specific latin name of the species. (Or at least narrows it down.)
Vs the act of looking at a rose and knowing without having to expend any further energy that it is a rose. And then if someone is questioning you you can spend some energy on counting petals, and describing leaf shapes and find the thorns and point them out and etc.
It sounds like most people who want “explainable AI” want the first kind of thing. The blind and amnesiac botanist with the plant identifying book. Vs what humans are actually doing which is more like a classification model with a tacked on bulshit generator to reason about the classification model’s outputs into which it doesn’t actually have any in-depth insight.
And it gets worse the deeper you ask them. How do you know that is an ear? How do you know its shape? How do you know the animal is furry?
Shown a picture of a cloud,
why it looks like a cat does sometimes need an explanation until others can see the cat, and it's not just "explaining noises".
When people talk about explainability I immediately think of Prolog.
A Prolog query is explainable precisely because, by construction, it itself is the explanation. And you can go step by step and understand how you got a particular result, inspecting each variable binding and predicate call site in the process.
Despite all the billions being thrown at modern ML, no one has managed to create a model that does something like what Prolog does with its simple recursive backtracking.
So the moral of the story is that you can 100% trust the result of a Prolog query, but you can't ever trust the output of an LLM. Given that, which technology would you rather use to build software on which lives depend on?
And which of the two methods is more "artificially intelligent"?
Neural networks can encode any computable function.
KANs have no advantage in terms of computability. Why are they a promising pathway?
Also, the splines in KANs are no more "explainable" than the matrix weights. Sure, we can assign importance to a node, but so what? It has no more meaning than anything else.