No, a "naive" approach to reporting what happened is better. The knowing, cynical approach smuggles in too many hidden assumptions.
I'd rather people explained what happened without pushing their speculation about why it happened at the same time. The reader can easily speculate on their own. We don't need to be told to do it.
The 21st century has, among all the other craziness that's happened, proven that people do need to be told what to believe and why to believe it. Doing otherwise leaves a vacuum someone else will fill, often with assertions in the opposite direction.
The malicious hatred comes not from the company, but from humanity. Training on the open web, i.e. on what humans have said, will surface endless instances of hatred, which the model absorbs as fact; only by telling the LLM to lie about what it has "learned" do you ensure people are not offended.
Every single model trained this way is like this. Every one. Only guardrails stop the hatred.
A company can choose whether or not to train on 4chan. Since X is the new 4chan, xAI made a choice to train on divisive content by training on X. Your comment only makes sense if 4chan/X were representative of humanity and of what most people say.
There's no shortage of hatred on the internet, but I don't think it's "training on the open web" that makes Grok randomly respond with off-topic rants about South African farmers, or call itself MechaHitler days after the CEO promised to change things because his far-right followers complained that it kept following reputable sources and declining to say racist things, just like every other chatbot out there. It's not as if the masses of humanity are organically talking about "white genocide" in the context of tennis...
Most of the prompts and context I've seen have been from people working to see if they can pull this stuff out of Grok.
The problem I have is that I see people working very, very hard to make someone look as bad as possible. Some of those people will do anything, believing the ends justify the means.
This makes it far more difficult to take criticism at face value, especially when people upthread worry that people are being impartial?!
Well yes, when Grok starts bringing up completely off-topic references to South Africa or blaming Jews, this does tend to result in a lot more people asking it a lot more questions on those particular subjects (whether out of horror, amusement, or wholehearted agreement). That's how the internet works.
How the internet doesn't work is that, days after the CEO of a website promises an overt racist tweeting complaints at him that he will "deal with" responses that aren't to their liking, the internet as a whole (as opposed to Grok's system prompt) suddenly becomes organically more inclined to share the racists' obsessions.
I agree. This article from The Atlantic is a perfect example. Read the prompts the author used. It's like he went out of his way to get it to say something bad, and when the model called him out he just kept trying harder.
The responses seemed perfectly reasonable given the line of questioning.
No, you've got it backwards. Naive reinforcement training for "helpful smart assistant" traits naturally eliminates the sort of malicious hatred you're thinking of, because that corpus of text is anti-correlated with the useful, helpful, rational text being asked of the model. So much so that basic RLHF is known to incur a "liberal" bias (really a general pro-social / harm-reduction bias in line with RLHF goals, but if the model strongly correlates/anti-correlates that with other values, it surfaces as a political lean).
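To make that concrete: RLHF typically starts by fitting a reward model on human preference pairs, where raters consistently favor the helpful, non-hateful completion. Here's a minimal sketch of the standard Bradley-Terry pairwise loss (the `reward_model` callable, tensor names, and shapes are illustrative, not any lab's actual code):

```python
import torch
import torch.nn.functional as F

def preference_loss(reward_model, chosen_ids, rejected_ids):
    """Bradley-Terry pairwise loss for fitting an RLHF reward model.

    chosen_ids / rejected_ids: token tensors for the completion raters
    preferred vs. the one they rejected. If raters consistently prefer
    helpful, non-hateful answers, hateful text ends up with a low
    learned reward, and the policy is then optimized away from it;
    no post-hoc filter involved.
    """
    r_chosen = reward_model(chosen_ids)      # scalar reward per sequence
    r_rejected = reward_model(rejected_ids)
    # Maximize P(chosen > rejected) = sigmoid(r_chosen - r_rejected)
    return -F.logsigmoid(r_chosen - r_rejected).mean()
```

The point is that "don't be hateful" falls out of the preference data itself, not a bolted-on guardrail.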
Same goes for data curation and SFT aimed at correlates of quality text instead of "whatever is on a random twitter feed".
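The curation side is even simpler: pretraining pipelines routinely score documents with a quality classifier and drop the low end before SFT ever sees them. A hypothetical sketch (the classifier and threshold here are made up for illustration):

```python
from typing import Callable, Iterable, Iterator

def filter_corpus(
    docs: Iterable[str],
    quality_score: Callable[[str], float],  # e.g. a small linear classifier
    threshold: float = 0.8,                 # illustrative cutoff, not a real pipeline value
) -> Iterator[str]:
    """Yield only documents the quality classifier scores above the cutoff.

    A corpus curated this way under-samples rant-heavy, low-quality text
    before any "guardrail" is ever applied to the finished model.
    """
    for doc in docs:
        if quality_score(doc) >= threshold:
            yield doc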
Characterizing all these techniques aimed at improving general output quality as "guardrails" that hold back a torrent of "malicious hatred" doesn't make sense, IMO. You may be thinking of something like the "Waluigi effect": the more a model knows about what is desired of it, the more it knows what the polar opposite is, and if prompted the right way it will provide that. But you're not really circumventing a guardrail if you grab a knife by the blade.
Let's try to be a little less naive about what xAI and Grok are designed to be, shall we? They're not like the other AI labs.