> First, we evaluate, for each voxel, subject and narrative independently, whether the fMRI responses can be predicted from a linear combination of GPT-2’s activations (Fig. 1A). We summarize the precision of this mapping with a brain score M: i.e. the correlation between the true fMRI responses and the fMRI responses linearly predicted, with cross-validation, from GPT-2’s responses to the same narratives (cf. Methods).
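For anyone wondering what that "brain score" boils down to operationally, here is a minimal sketch in Python of the cross-validated mapping described in the quote. The use of scikit-learn, ridge regression, and the specific fold count are my assumptions for illustration, not details taken from the paper.

```python
import numpy as np
from sklearn.linear_model import Ridge
from sklearn.model_selection import KFold

def brain_scores(X, Y, n_splits=5, alpha=1.0):
    """Cross-validated brain score M per voxel.

    X : (n_samples, n_features) model activations (e.g. GPT-2 hidden states,
        assumed already aligned to the fMRI sampling grid).
    Y : (n_samples, n_voxels) fMRI responses for one subject and one narrative.
    Returns an (n_voxels,) array: the Pearson correlation between the true
    responses and the cross-validated, linearly predicted responses.
    """
    Y_pred = np.zeros_like(Y, dtype=float)
    for train, test in KFold(n_splits=n_splits).split(X):
        # Fit a linear (ridge) mapping on the training folds,
        # then predict the held-out folds.
        model = Ridge(alpha=alpha).fit(X[train], Y[train])
        Y_pred[test] = model.predict(X[test])
    # Pearson correlation per voxel between true and predicted time courses.
    Yc = Y - Y.mean(axis=0)
    Pc = Y_pred - Y_pred.mean(axis=0)
    return (Yc * Pc).sum(axis=0) / (
        np.linalg.norm(Yc, axis=0) * np.linalg.norm(Pc, axis=0)
    )
```

In this sketch the score is computed independently for each voxel, subject, and narrative, which matches the quoted description; everything else (regularization, fold structure, feature alignment) is a stand-in.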
Was this cross-checked against arbitrary inputs to GPT-2? I gather that, with 1.5 billion parameters, you can find a representative linear combination for everything.
They assume linearity. They map their choice of GPT-properties to their choice of (brain blood flow!) properties. They then claim there are correlations, with a few fMRI datasets.
If something serious were on the line with this type of analysis, you'd be fired.
Reading this, it feels like we might as well give up on there being any science any more, tbh. For this to appear in Nature -- it feels like the Rubicon has been crossed.
How can we expect the public not to be "anti-vax" (etc.), or to be otherwise competent in the basic tenets of modern science (experiment, refutation, peer review), if Nature isn't?
It's not Nature, it's Scientific Reports. The bar to publication in the two couldn't be more different. Nature is one of the premier high-impact journals; Sci. Rep. is a pretty middle-of-the-road, relatively new open-access journal.
> Scientific Reports is an online peer-reviewed open access scientific mega journal published by Nature Portfolio, covering all areas of the natural sciences. The journal was launched in 2011.[1] The journal has announced that their aim is to assess solely the scientific validity of a submitted paper, rather than its perceived importance, significance or impact.[2]
On that last line: this paper is quite literally the opposite. The only grounds to accept this paper is how on-trend its topic is. The "scientific validity" of correlating floating-point averages over historical text documents with brain blood flow is c. 0%.
This is just a crystallization of all the pseudoscience trends of the last decade and more: associative statistical analysis; assuming linearity; reification fallacy; failure to construct relevant hypotheses to test; no counterfactual analysis; no serious attempt at falsification; trivial sample sizes; profound failure to provide a plausible mechanism; profound failure to understand the basic theory in the relevant domains; "AI"; "Neural"; "fMRI"; etc. The paper participates in a system of financial incentives largely benefitting industrial companies with investments in the relevant tech, and it is designed to be a press release for those companies.
If I were to design and teach a lecture series on contemporary pseudoscience, I'd be half-inclined to spend it all on this paper alone. It's a spectacular confluence of these trends.
I work in neuroscience and pharmacology. My impression of my own field is far different from what you state here. You made a statement about all scientific exploration, but you seem to only read about a few limited areas.
I happen to be BS-facing, it must be said. I ought to calm myself with the vast amount of "normal science".
But likewise, we're in an era when "the man on the street" feels at ease appealing to "the latest paper" delivered to him via an aside in a newspaper.
And at the same time, the "scientific" industry which produces these papers seems to have not merely taken the on-trend funding, but sacrificed its own methods to capture it.
In other words, "the man on the street" seems to have become the target demographic for a vast amount of science. From pop-psych to this, all designed to dazzle the lay reader.
Once found only on pop-sci bookshelves; now, everywhere in Nature!
> To this end, we analyze 101 subjects recorded with functional Magnetic Resonance Imaging while listening to 70 min of short stories. We then fit a linear mapping model to predict brain activity from GPT-2’s activations. Finally, we show that this mapping reliably correlates (R = 0.50, p < 10⁻¹⁵) with subjects’ comprehension scores as assessed for each story.
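To make the second step of that chain concrete: once a per-story (or per subject-story) brain score exists, relating it to comprehension is just a correlation. Below is a toy sketch with synthetic stand-in data generated on the spot; the pairing structure, the story count, and the use of scipy are my assumptions, and only the reported R = 0.50, p < 10⁻¹⁵ comes from the abstract.

```python
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(0)

# Synthetic stand-ins: one mapping (brain) score and one comprehension score
# per (subject, story) pair. In the paper these would come from the fMRI
# mapping and from post-listening comprehension assessments, respectively.
n_pairs = 101 * 7  # hypothetical count: 101 subjects x 7 stories
comprehension = rng.uniform(0.0, 1.0, n_pairs)
brain_score = 0.5 * comprehension + rng.normal(0.0, 0.4, n_pairs)  # link built in for illustration

r, p = pearsonr(brain_score, comprehension)
print(f"R = {r:.2f}, p = {p:.2g}")
```

Whether the correlation is computed across stories, subjects, or subject-story pairs, and how brain scores are aggregated over voxels, are exactly the kinds of details the abstract leaves to the Methods.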
Note that this is exactly the wrong way to form and attempt to refute a scientific hypothesis. The authors don't start with some new observations that require explanation; they start with a hypothesis already fully formed ("...these models encode information that relates to human comprehension...") and then go out and collect observations to confirm it.
I'm sure that if asked, the authors would say that they are simply trying to answer a scientific question, but it's obvious that they already have the answer they want and they're just trying to find data to support it. The problem of course is that if one is already convinced of the answer, one can always find evidence to "prove" it. It's a kind of confirmation bias.
Maybe I'm missing something, but I don't see how this correlation shows that the mapping is semantic and not, say, grammatical, syntactic, or structural in some way.
This is so bizarre, I wouldn't even know how to inspect, verify, and critique the claims made. Also, there is no "subject" in such simple algorithms; very relaxed use of big words.
101 subjects seems like not that much data to establish a correlation between GPT activations and brain activations, a correlation between brain activations and reading comprehension, and then chain them together to get an overall correlation.
As a manager I could hook it up to my corporate performance system and use ability to comprehend boring meeting content as a performance metric. It’ll be awesome, you’ll see.
Don’t get too far ahead of the results, but yeah, this makes things extremely interesting.
From the paper:
”However, whether these models encode, retrieve and pay attention to information that specifically relates to behavior in general, and to comprehension in particular remains controversial”
edit: Mapping from a statistical model directly to one specific individual’s brain may turn out to be intractable for things like brain implants.
But, the chances are pretty good that there will be strides made in unexpected areas.
Ah, this is truly beautiful - neuropsychology and artificial neural networks making connections.
From the paper: ”These advances raise a major question: do these algorithms process language like the human brain? Recent studies suggest that they partially do: the hidden representations of various deep neural networks have been shown to linearly predict single-sample fMRI, MEG, and intracranial responses to spoken and written texts.”
Not accurate. There are journals that try to evaluate impact and there are journals that focus purely on scientific validity. (PLOS One is a journal that has become famous for doing the latter.) The passage you quote is just the journal signaling that it's in the second camp. It doesn't mean the peer review is any lighter than at impact-focused journals.
It's a good thing for science that not all journals are impact chasers. Scientists are by definition not perfectly reliable evaluators of impact, because science is about exploring the unknown. Publishing work that's only passed a technically focused peer review allows for unexpected impact.
Maybe, but we should treat them like an arXiv.org-type e-print archive. People are posting them to HN and thinking that because it's a Nature.com site, it's solid science.
I've published two papers in Scientific Reports and have had several more rejected—definitely not my experience. Peer review was arduous and on par with any other major journal with similar impact factor.
From the paper: ”Specifically, we show that GPT-2’s mapping correlates with comprehension up to R=0.50. This result is both promising and limited: on the one hand, we reveal that the similarity between deep nets and the brain non-trivially relates to a high-level cognitive process. On the other hand, half of the comprehension variability remains unexplained by this algorithm.”
Here, we show that the representations of GPT-2 not only map onto the brain responses to spoken stories, but they also predict the extent to which subjects understand the corresponding narratives. To this end, we analyze 101 subjects recorded with functional Magnetic Resonance Imaging while listening to 70 min of short stories. We then fit a linear mapping model to predict brain activity from GPT-2’s activations. Finally, we show that this mapping reliably correlates (R = 0.50, p < 10⁻¹⁵) with subjects’ comprehension scores as assessed for each story.
I don't understand your reply or why you downvoted. I said this as genuine grammar feedback that I assume most people would accept kindheartedly and fix.
When I originally read the comment I had no idea if it was a genuine comment, some kind of summary, or a legit quote from the article. Quotation marks exist for a reason and would have prevented this (and still can: it is possible to edit a comment). I was trying to be helpful, not negative in any way.