These models infer semantic categories that correlate with categories in the human mind, to the extent that they can solve natural language understanding tasks.
No one is saying they are biological neurons, or that they model semantics exactly as the human mind would. It is mechanical pattern recognition that approximates our understanding.
You can browse those artificial neurons online and view their associations.
You're just saying words without ever explaining why. What am I supposed to do about that? There's nothing to argue with if you're just repeating nonsensical claims without even trying to support them.
For example:
>> It is mechanical pattern recognition that approximates our understanding.
That's just a claim and you're not even saying why you make it, what makes you think so, etc.
> That's just a claim and you're not even saying why you make it, what makes you think so, etc.
Mechanical - it is an algorithm, not a living being.
Pattern recognition - a branch of machine learning that focuses on the detection and identification of regularities and patterns in data. It involves classifying or categorizing input data into identifiable classes based on extracted features. The patterns recognized could be in various forms, such as visual patterns, speech patterns, or patterns in text data.
Approximates our understanding - meaning the model's 'understanding' is not exactly the same as human understanding.
When I say 'mechanical pattern recognition that approximates our understanding,' what I mean is that large language models (LLMs) like GPT-4 learn patterns from the vast amounts of text data they're trained on. These patterns correspond to various aspects of language and meaning.
For example, the models learn that the word 'cat' often appears in contexts related to animals, pets, and felines, and they learn that it's often associated with words like 'meow' or 'fur'. In this sense, the model 'understands' the concept of a cat to the extent that it can accurately predict and generate text about cats based on the patterns it has learned.
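To make that concrete with a toy illustration (a made-up mini-corpus of my own, nothing like the real training pipeline): even naive co-occurrence counting surfaces associations like 'meow' and 'fur' for 'cat', and that is the raw material the models exploit at vastly greater scale.

    # Toy sketch: count which words appear near "cat" in a tiny invented corpus.
    # Real models learn far richer statistics, but the signal is the same kind
    # of co-occurrence regularity.
    from collections import Counter

    corpus = [
        "the cat sat on the mat",
        "my cat has soft fur",
        "the cat said meow",
        "a dog has fur too",
    ]

    window = 3  # how many words on either side count as context
    neighbours = Counter()
    for sentence in corpus:
        words = sentence.split()
        for i, w in enumerate(words):
            if w != "cat":
                continue
            lo, hi = max(0, i - window), min(len(words), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    neighbours[words[j]] += 1

    print(neighbours.most_common())
    # 'meow', 'fur', 'sat', etc. show up among cat's neighbours; with enough
    # text these counts become stable, informative associations.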
This isn't the same as human understanding, of course. Humans understand cats as living creatures with certain behaviors and physical characteristics, and we have personal experiences and emotions associated with cats. A language model doesn't have any of this - its 'understanding' is purely statistical and based on text patterns.
The evidence for these claims comes from the performance of these models on various tasks. They can generate coherent, contextually appropriate text, and they can answer questions, translate languages, and perform other language-related tasks with a high degree of accuracy. All of this suggests that they have learned meaningful patterns from their training data.
That is not "evidence" of anything. It's just assumptions. You keep saying what you think is going on without ever saying how or why. You are not describing any mechanisms and you are not explaining any observations.
I have a suggestion: try to convince yourself that you are wrong; not right. Science gives you the tools to know when you're wrong. If you're certain you're right about something then you're probably wrong and you should keep searching until you find where and how.
For example, try to trace in your mind the mechanisms and functionality of language models, and see where your assumptions about their abilities come from.
Your suggestion of trying to convince oneself of being wrong is a valuable one and reflects the scientific method. I agree that it's important to continually challenge and scrutinize our own beliefs and assumptions.
Let's delve deeper into the mechanics of language models. Large language models like GPT-4 use an architecture called transformers. This architecture is composed of layers of self-attention mechanisms, which allow the model to weigh the importance of each word in the input when predicting the next word.
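As a bare-bones sketch of that mechanism (with random matrices standing in for learned weights, not anything taken from a real model), a single self-attention head amounts to this:

    # One self-attention head in NumPy. Each position scores every other
    # position, the scores are softmaxed into weights, and the output is a
    # weighted mix of the value vectors. The matrices are random placeholders
    # for what training would learn.
    import numpy as np

    rng = np.random.default_rng(0)
    seq_len, d_model, d_head = 4, 8, 8          # 4 tokens, toy dimensions

    x = rng.normal(size=(seq_len, d_model))     # stand-in token embeddings
    W_q = rng.normal(size=(d_model, d_head))    # learned in a real transformer
    W_k = rng.normal(size=(d_model, d_head))
    W_v = rng.normal(size=(d_model, d_head))

    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(d_head)          # how relevant each token is to each other token

    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # softmax: each row sums to 1

    attended = weights @ V                      # each token becomes a weighted mix of the values
    print(weights.round(2))                     # the per-token attention weights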
When the model is trained, it adjusts the weights in its network to minimize the difference between its predictions and the actual words in its training data. This process is guided by a loss function and an optimization algorithm.
Through this training process, the model learns to represent words and phrases as high-dimensional vectors, also known as embeddings. These embeddings capture many aspects of the words' meanings, including their syntactic roles and their semantic similarities to other words.
When the model generates text, it uses these embeddings to choose the most likely next word given the previous words. This process is based on the patterns and regularities that the model has learned from its training data.
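Here is a deliberately tiny sketch of that loop (a toy bigram model over a six-word vocabulary, not a transformer and nothing like GPT-4's scale; the point is only the loss minimisation during training and the pick-the-most-likely-next-word step during generation):

    # Toy next-word model in PyTorch: learn embeddings and a prediction head by
    # minimising cross-entropy between predictions and the actual next words,
    # then generate greedily. At this scale it just memorises the one sentence.
    import torch
    import torch.nn as nn

    vocab = ["the", "cat", "sat", "on", "mat", "meow"]
    stoi = {w: i for i, w in enumerate(vocab)}

    class TinyLM(nn.Module):
        def __init__(self, vocab_size, dim=16):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, dim)   # learned word embeddings
            self.head = nn.Linear(dim, vocab_size)     # scores for the next word
        def forward(self, idx):
            return self.head(self.emb(idx))

    model = TinyLM(len(vocab))
    opt = torch.optim.Adam(model.parameters(), lr=1e-2)
    loss_fn = nn.CrossEntropyLoss()                    # the loss function

    # training pairs from "the cat sat on the mat": current word -> next word
    text = ["the", "cat", "sat", "on", "the", "mat"]
    xs = torch.tensor([stoi[w] for w in text[:-1]])
    ys = torch.tensor([stoi[w] for w in text[1:]])

    for step in range(200):                            # minimise prediction error
        opt.zero_grad()
        loss = loss_fn(model(xs), ys)
        loss.backward()                                # backpropagation
        opt.step()

    # greedy generation: pick the most likely next word given the previous one
    word = "the"
    for _ in range(4):
        logits = model(torch.tensor([stoi[word]]))
        word = vocab[int(logits.argmax())]
        print(word, end=" ")

Real models condition on long contexts through attention rather than just the previous word, but the training objective and the generation step are the same in kind.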
Of course, this is a high-level description and the actual process involves a lot of complex mathematics and computation. But I hope it gives you a better sense of the mechanisms behind these models.
As for evidence, there are numerous studies that have evaluated these models on a wide range of tasks, including text generation, question answering, translation, and more. These studies consistently show that these models perform well on these tasks, often achieving state-of-the-art results. This is empirical evidence that supports the claim that these models have learned meaningful patterns from their training data.
I agree that we should always remain skeptical and open to new evidence and alternative explanations. I welcome any specific criticisms or alternative hypotheses you might have about these models and their capabilities.
>> Of course, this is a high-level description and the actual process involves a lot of complex mathematics and computation. But I hope it gives you a better sense of the mechanisms behind these models.
For the record, I just polished off a PhD in AI (symbolic machine learning) after a Master's where I studied neural nets and NLP, including plenty of language generation. You're teaching your grandma to suck eggs.
And I'm really very tired of this kind of conversation that never teaches me anything new. Your comment is still "what"s all the way down. You never explain why or how word embeddings capture aspects of meaning, you're just repeating the claims of Mikolov or whoever. Look, here:
>> Through this training process, the model learns to represent words and phrases as high-dimensional vectors, also known as embeddings. These embeddings capture many aspects of the words' meanings, including their syntactic roles and their semantic similarities to other words.
That's just a claim, made long ago, challenged at the time, the challenge ignored, and it keeps being bandied about as some kind of scientific truth just because the critics got tired or bored of having their criticisms consistently ignored and gave up trying.
This is what I point out above: connectionists never stop to consider criticism of the limitations of their systems until someone rubs their face in it - like Minsky and Papert did in 1969, for which they were forever reviled and accused of causing an AI winter, when what they really caused was some connectionists getting off their butts and trying to improve their work, a process without which we wouldn't today have backpropagation applied to NNs, and the potent image classifiers, good models of text, etc. that it enabled.
As to the "evidence" you profer, mainly preprints on arxiv, and mainly consisting of budding researchers uploading papers consisting of little more than leaderboards (those little tables with the systems on one side, the datasets on the other side, and your system's results in bold, or no paper) those are useless. 99% of research output in deep learning and neural nets is completely useless and never goes anywhere- because it lacks novelty, it is completely devoid of any theoretical results, and it is unreproducible even when the code is available.
For example, you mention studies on "question answering". Ca. 2018 Google published a paper where they reported that their BERT language model scored near-human performance on some question answering dataset without ever even having been trained on question answering. A scientific miracle! Some boffins who clearly don't believe in miracles wondered why that would even be possible and dug a bit, and found that BERT was overfitting to surface statistical regularities of its dataset. They created a new test dataset devoid of such statistical regularities and BERT's performance went down the drain, until it hit rock bottom (a.k.a. "no better than chance"). So much for "semantic similarity" measured over word embeddings modelling meaning.
But this is exactly the kind of work that I say connectionists consistently ignore: nowhere will you find that subsequent language models were tested in the same way. You will instead find plenty of tests "demonstrating" the ability of language models to represent semantics, meaning, etc. It's all bullshit, self-delusion at best, conscious fabrications otherwise.
This is the paper (I'm not affiliated with it in any way):
Probing Neural Network Comprehension of Natural Language Arguments
But this kind of work is thankless for the academics who undertake it, and most of us have more important things to do. So the criticism eventually dwindles and what remains is the bullshit, and the fabrications, and the fantasies, seeping into mainstream discourse and being repeated uncritically - by yourself, for example. I can't even summon the compassion to not blame you anymore. For all I know you're exactly one of those connectionists who don't even understand their work is not science anymore, but spectacle.
P.S. I am not blind to the change of tone in your recent comments and I'm really sorry to be so cranky in response, when I should be cordial in reciprocity, but I've really had enough of all this. Unscientific bullshit has permeated everything and oozed everywhere. Perhaps it's time for me to take a break from HN, because it really doesn't look like I can have an original, curious conversation on here anymore.
I understand that this discussion can become frustrating, especially when you see repetitive patterns in the discourse or feel like the nuances are not being sufficiently addressed. However, there are a few points I would like to clarify:
Semantics in word embeddings: While I agree that word embeddings cannot fully capture human-like semantic understanding, they do provide a mathematical representation that has proven useful in many NLP tasks. It's not that word embeddings "understand" semantics in the human sense, but they do capture certain aspects of meaning that are statistically derived from their use in the training corpus. This is not an unsubstantiated claim. It is empirically demonstrated in numerous tasks where semantic understanding is beneficial, like semantic similarity, word analogy, and other downstream tasks such as translation, sentiment analysis, text classification, etc.
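To pin down what those benchmark tasks actually measure (with hand-picked toy vectors of my own; real embeddings are learned and have hundreds of dimensions), 'semantic similarity' and 'word analogy' come down to geometry over vectors:

    # Hand-made 4-d vectors, dimensions loosely standing for
    # (royalty, maleness, femaleness, food-ness). They only illustrate what
    # "semantic similarity" and the analogy test mean operationally.
    import numpy as np

    vec = {
        "king":   np.array([0.9, 0.9, 0.1, 0.0]),
        "queen":  np.array([0.9, 0.1, 0.9, 0.0]),
        "man":    np.array([0.1, 0.9, 0.1, 0.0]),
        "woman":  np.array([0.1, 0.1, 0.9, 0.0]),
        "banana": np.array([0.0, 0.0, 0.0, 0.9]),
    }

    def cosine(a, b):
        return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

    print(cosine(vec["king"], vec["queen"]))   # about 0.61: related
    print(cosine(vec["king"], vec["banana"]))  # 0.0: unrelated

    # the classic analogy test: king - man + woman should land nearest to queen
    target = vec["king"] - vec["man"] + vec["woman"]
    candidates = [w for w in vec if w not in ("king", "man", "woman")]
    print(max(candidates, key=lambda w: cosine(vec[w], target)))  # "queen"

With learned embeddings the same arithmetic tends to recover many such analogies from the data alone, which is the empirical basis for the claim, not a proof that the vectors 'understand' anything.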
Your point about BERT overfitting to statistical regularities of the dataset is well taken. Indeed, it exposes the limitations of the model and the need for careful design and evaluation of benchmarks. However, it's worth noting that a failure in one specific test doesn't invalidate the successes in other tasks. It simply highlights an area that needs improvement.
It's true that there's a flood of papers and not all of them have substantial novelty or impact. This is not a problem exclusive to deep learning or AI, but a broader issue in academia and scientific publishing. However, amidst the noise, there's also a lot of valuable work being done, with genuine advancements and novel approaches.
You mentioned that connectionists only improve their systems when someone rubs their face in it. This is essentially how scientific progress happens - through skepticism, criticism, and the relentless pursuit of truth. I would argue that the current era of deep learning research is no different. It's a messy, iterative process, with steps forward, backward, and sideways.
Furthermore, I believe it's crucial to remember that there's room for both connectionist and symbolic approaches in AI. It's not necessarily a matter of one being 'right' and the other 'wrong.' Rather, they offer different perspectives and techniques that can be valuable in different contexts. Connectionist models, like the neural networks we've been discussing, are incredibly effective at tasks like pattern recognition and prediction, especially when dealing with large, high-dimensional datasets. On the other hand, symbolic models are excellent at representing explicit knowledge and reasoning logically, making them useful for tasks that require a high degree of interpretability or strict adherence to predefined rules. The future of AI likely involves finding ways to integrate these two approaches, leveraging the strengths of each to overcome their respective limitations. The field is vast and diverse, and there's plenty of room for different methods and viewpoints.
PS: I understand where you're coming from. Sometimes I need a break from this too. Remember there is no malicious intent here when people are just sharing their views.
Your disagreement seems to be a philosophical one. It is not a technical argument. It seems that you won't accept that semantics can be modelled by an unconscious mechanical system. I am talking about mathematical concepts of semantics, not "true" human semantics that are the product of human insight and consciousness. https://en.wikipedia.org/wiki/Semantic_similarity
While AI doesn't have an innate understanding of the world as humans do, the semantic representations it learns from vast amounts of text data can be surprisingly rich and detailed. It can capture associations and nuances that are not immediately apparent from a purely syntactic analysis of the text.
Oh come on. "Semantic similarity" is just heuristic bullshit. It's not a scientific term, or even a mathematical concept. Don't try to pull rank on me without even knowing who I am or what I do just because you can read wikipedia.
And note you're still not saying "why" or "how", only repeating the "what" of someone else's claim.
I understand your skepticism, and I acknowledge that the concept of semantic similarity is indeed an approximation. However, it is an approximation that has proven highly useful in a wide range of practical applications.
Semantic similarity methods are based on the idea that the meaning of a word can be inferred from its context, which is a concept known as distributional semantics. In essence, words that occur in similar contexts tend to have similar meanings. This is not just a heuristic, it's a well-established principle in linguistics, known as the distributional hypothesis.
In the case of large language models, they are trained on vast amounts of text data and learn to predict the next word in a sentence given the previous words. Through this process, they learn to represent words as high-dimensional vectors (word embeddings) that capture many aspects of their meaning, including their semantic similarity to other words.
These models can generate coherent text, answer questions, translate languages, and perform other language-related tasks with a high degree of accuracy. These capabilities wouldn't be possible if the models were only capturing syntax and not semantics.
The 'why' is because these models learn from the statistical regularities in their training data, which encode both syntactic and semantic information. The 'how' is through the use of deep learning algorithms and architectures like transformers, which allow the models to capture complex patterns and relationships in the data.
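If it helps to see the distributional hypothesis in miniature (toy corpus and window size of my own choosing): words that occur in similar contexts end up with similar count vectors, and that is the raw regularity which learned embeddings compress and refine.

    # Crude sketch of the distributional hypothesis: represent each word by the
    # counts of the words around it. In this toy corpus "cat" and "dog" share
    # contexts (chased, likes, my, the), so their count vectors point in
    # similar directions, while "car" does not.
    from collections import Counter, defaultdict
    import math

    corpus = ("the cat chased the mouse . the dog chased the ball . "
              "my cat likes fish . my dog likes walks . "
              "the car needs fuel . my car needs repairs .").split()

    window = 2
    contexts = defaultdict(Counter)
    for i, w in enumerate(corpus):
        for j in range(max(0, i - window), min(len(corpus), i + window + 1)):
            if j != i:
                contexts[w][corpus[j]] += 1

    def cosine(c1, c2):
        dot = sum(c1[w] * c2[w] for w in set(c1) & set(c2))
        norm = lambda c: math.sqrt(sum(v * v for v in c.values()))
        return dot / (norm(c1) * norm(c2))

    print(cosine(contexts["cat"], contexts["dog"]))  # higher: shared contexts
    print(cosine(contexts["cat"], contexts["car"]))  # lower

Here 'cat' comes out closer to 'dog' than to 'car' purely because of shared contexts, with no hand-coded meaning anywhere.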
I hope this provides a more detailed explanation of my argument. I'm not trying to 'pull rank', but simply explaining the basis for my claims. I understand this is a complex topic, and I appreciate your challenging questions as they help clarify and deepen the discussion.