Per the article, the critics for the critic are human RLHF trainers. More specifically, those humans are exploited third-world workers making between $1.32 and $2 an hour, but OpenAI would rather you didn't know about that.
OpenAI may well still be employing plenty of people in third-world countries for this. But there are also contracts paying anywhere from $20 to $100+ an hour for this kind of work on more complex prompt/response pairs.
I've done work at what is (to the best of my knowledge) the very high end of that scale (not for OpenAI) to fill gaps, so I know firsthand that it's available. Sometimes the work is complex enough that a single response takes over an hour to evaluate, because the requirements often include not just reading and reviewing the code but ensuring it works, including fixing bugs. Most responses then pass through at least one more round of review of the fixed/updated versions. One project I worked on involved 3 reviewers (none of whom were on salaries anywhere close to the Kenyan workers you referred to) reviewing my work, providing feedback, and prompting a second pass of adjustments. Four high-paid workers altogether to process every response.
Of course, I'm sure plenty of lower-level/simpler work had been filtered out to be addressed with cheaper labour, but I wouldn't be so sure their costs for things like code are particularly low.
Exploited? Are you saying that these employees are forced to work for below-market rates, and would be better off taking other opportunities available to them? If that's the case, it's truly horrible on OpenAI's part.
A human reviewer might have trouble catching a mistake, but they are generally pretty good at discerning whether a report about a mistake is valid. For example, finding a bug in a codebase is hard. But if a junior sends you a code snippet and says "I think this is a bug for xyz reason", do you agree? It's much easier to confidently say yes or no. So it basically changes the problem from finding a needle in a haystack to discerning whether a statement is a hallucination or not.
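As a toy illustration of that asymmetry (hypothetical snippet, nothing from the article): verifying the junior's specific claim is a one-minute check, while finding the bug cold could take far longer.

    def moving_average(xs, window):
        # Junior's report: "I think this drops the last window, because
        # range() stops at len(xs) - window instead of len(xs) - window + 1."
        return [sum(xs[i:i + window]) / window
                for i in range(len(xs) - window)]

    # Verifying the claim is one call: for [1, 2, 3, 4] with window=2 this
    # prints [1.5, 2.5] when a correct version would give [1.5, 2.5, 3.5].
    print(moving_average([1, 2, 3, 4], 2))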
We are not assuming that. The iteration happens by taking the report and passing it to another reviewer, who reviews the first review. Their comparison is human reviewer -> human reviewer vs. CriticGPT -> human reviewer vs. CriticGPT+human reviewer -> human reviewer.
How do they know it's better? The rate of mistakes is the same for both GPTs, so now they have two sources of errors. If the error rate were lower for one, they could always apply it and reduce the error rate of the other. They're just shuffling the deck chairs and hoping the boat with the hole in it gets a little further before disappearing underwater.
Whether adding unreliable components increases the overall reliability of a system depends on whether the system requires all components to work (in which case adding components can only make matters worse) or only some (in which case adding components can improve redundancy and make it more likely that the final result is correct).
In the particular case of spotting mistakes made by ChatGPT, a mistake is spotted if it is spotted by the human reviewer or by the critic, so even a critic that makes many mistakes itself can still increase the number of spotted errors. (But it might decrease the spotting rate per unit time, so there are still trade-offs to be made.)
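To put rough numbers on that (the rates below are made up, and this assumes the human and the critic err independently, which they won't entirely):

    # OR-composition: a mistake is caught if *either* reviewer catches it.
    p_human = 0.6    # hypothetical chance the human spots a given mistake
    p_critic = 0.5   # hypothetical chance CriticGPT spots it

    p_either = 1 - (1 - p_human) * (1 - p_critic)
    print(p_either)  # 0.8; even an unreliable critic adds coverage

    # AND-composition: a pipeline where every stage must get it right.
    p_series = p_human * p_critic
    print(p_series)  # 0.3; here the same unreliable component hurts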
I see what you're saying, so what OpenAI will do next is create an army of GPT critics and run them all in parallel to take some kind of quorum vote on correctness. I guess it should work in theory, if the error rate is small enough and adding more critics actually reduces the error rate. My guess is that in practice they'll converge to the population-average error rate and then pat themselves on the back for a job well done.
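For what it's worth, the quorum idea has well-understood theory behind it (Condorcet's jury theorem): with independent critics that are each right with probability p > 0.5, a majority vote gets more accurate as you add critics, while p < 0.5 makes it worse, and correlated errors (your convergence-to-the-population-average scenario) break the independence assumption entirely. A sketch with made-up error rates:

    from math import comb

    def majority_correct(p, n):
        # Probability that a majority of n independent critics (each
        # correct with probability p) votes correctly; odd n avoids ties.
        return sum(comb(n, k) * p**k * (1 - p)**(n - k)
                   for k in range(n // 2 + 1, n + 1))

    for n in (1, 5, 25):
        print(n, round(majority_correct(0.60, n), 3))  # 0.6, ~0.683, ~0.846
        print(n, round(majority_correct(0.45, n), 3))  # 0.45, ~0.407, ~0.31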
"In our experiments a second random trainer preferred critiques from the Human+CriticGPT team over those from an unassisted person more than 60% of the time."
Of course the second trainer could be wrong, but when the outcome tilts 60% to 40% in favour of the *combination* of a human + CriticGPT, that's pretty significant.
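For a sense of how significant: the quote doesn't give the number of comparisons, but a quick binomial check shows a 60/40 split becomes very hard to explain by chance once the sample is in the hundreds (the n values below are hypothetical):

    from math import comb

    def tail_prob(k, n, p=0.5):
        # One-sided p-value: the chance of k or more "prefers assisted"
        # votes out of n comparisons if raters were actually indifferent.
        return sum(comb(n, i) * p**i * (1 - p)**(n - i)
                   for i in range(k, n + 1))

    # Hypothetical n; the actual sample size isn't in the quoted line.
    print(tail_prob(30, 50))    # ~0.10: a 60/40 split is borderline at n=50
    print(tail_prob(120, 200))  # ~0.003: convincing at n=200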
From experience doing contract work in this space: it's common to use multiple layers of reviewers to generate additional data for RLHF, and if you can improve the output of the first layer that much, it'll have a fairly massive effect on the amount of training data you can produce at the same cost.
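Rough arithmetic on why (every number here is hypothetical): each response pays for all the review layers, but only the fraction that survives review becomes usable training data, so lifting first-layer quality cuts the cost per usable response across the whole pipeline.

    # Back-of-envelope, all numbers made up: four reviewers at ~$50/hr,
    # roughly an hour of review per response per layer.
    hourly_rate = 50
    layers = 4

    def cost_per_usable(acceptance_rate):
        # Every response is paid for in full; only a fraction is usable.
        return layers * hourly_rate / acceptance_rate

    print(cost_per_usable(0.5))  # $400 per usable response
    print(cost_per_usable(0.8))  # $250: same budget yields ~60% more data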