
Well, first of all, the stated purpose of RLHF isn't to "improve model accuracy" in the first place (and what we mean by "accuracy" here is fraught on its own, since it could mean at least three different things). They initially pitched it as a "safety" measure, and if it wasn't immediately obvious how nonsensical that claim is, it should at least be apparent now that the company has shed nearly the entire subset of its members who claimed to care about "AI safety" that safety is not a priority.

The idea of RLHF as a mechanism for tuning models on the principle that humans have some hard-to-capture insight that can steer them independently of the way they're normally trained is the very best steelman for its value I could come up with. That aim is directly subverted by using another language model to influence the human rater, so from my perspective it really brings us back to square one on what the fuck RLHF is supposed to be doing.
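
To make that steelman concrete: the core of RLHF is a reward model trained on human preference labels, so the pipeline only works if the rater's judgment is the actual signal. Here's a minimal sketch, assuming the Bradley-Terry pairwise loss from the InstructGPT paper; `reward_model` and the argument names are hypothetical:

    import torch
    import torch.nn.functional as F

    def preference_loss(reward_model, chosen_ids, rejected_ids):
        # Raters picked `chosen` over `rejected`; train the reward
        # model to score the chosen completion higher.
        r_chosen = reward_model(chosen_ids)      # scalar score per sequence
        r_rejected = reward_model(rejected_ids)
        # Bradley-Terry objective: -log sigmoid(r_chosen - r_rejected)
        return -F.logsigmoid(r_chosen - r_rejected).mean()

If another model is drafting or nudging the rater's choices, the labels feeding that loss are no longer an independent human signal, which is exactly the subversion above.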

Really, a lot of this comes down to what these models do versus how they're advertised. A generative language model produces plausible prose that follows from the prompt it receives. From that, the claim that it should write working code is actually quite a bit stronger than the claim that it should write true facts: plausible autocompletion will learn to mimic syntactic constraints, but plausibility has very little to do with whether something is true, or with whatever proxy or heuristic we apply in place of "true" when assessing information (supported by evidence, perhaps; logically sound, perhaps; the distinction between "plausible" and "true" is in many ways the whole point of every human epistemology).

Like, if you ask something trained on all human writing whether the Axis or the Allies won WWII, the answer will depend on whether you phrased the question in a way that sounds like Philip K. Dick would write it. That isn't even incorrect behavior by the standards of the model, but people want to use these things as some kind of oracle, or to replace Google search, or whatever. That's a misconception about what the thing does, and one that's very profitable for the people selling it.
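
To see what "plausible, not true" means mechanically, here's a toy sketch using the standard Hugging Face transformers API (the model choice and prompt are purely illustrative): candidate continuations are ranked by probability given the prompt, and nothing in the computation consults facts.

    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tok = AutoTokenizer.from_pretrained("gpt2")
    model = AutoModelForCausalLM.from_pretrained("gpt2")

    prompt = "The second world war ended when"
    ids = tok(prompt, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(ids).logits[0, -1]    # scores for the next token
    probs = torch.softmax(logits, dim=-1)
    # Candidates are ranked by P(token | prompt) alone; rephrasing the
    # prompt shifts the distribution, but truth never enters into it.
    top = probs.topk(5)
    for p, i in zip(top.values, top.indices):
        print(f"{tok.decode(int(i))!r:>12}  {p.item():.3f}")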


