
Most of the prompts and context I've seen have been people working to see if they can pull this stuff out of Grok.

The problem I have is that I see people working very, very hard to make someone look as bad as possible. Some of those people will do anything, believing the ends justify the means.

This makes it far more difficult to take criticism at face value, especially when people upthread worry that people are being impartial?!



Well yes, when Grok starts bringing up completely off-topic references to South Africa or blaming Jews, this does tend to result in a lot more people asking it a lot more questions on those particular subjects (whether out of horror, amusement or wholehearted agreement). That's how the internet works.

How the internet doesn't work is that, days after the CEO of a website has promised an overt racist tweeting complaints at him that he will "deal with" responses which aren't to their liking, the internet as a whole, as opposed to Grok's system prompts, suddenly becomes organically more inclined to share the racists' obsessions.


I agree. This article from The Atlantic is a perfect example. Read the prompts the author used. It’s like he went out of his way to try to get it to say something bad. And when the model called him out, he just kept trying harder.

The responses seemed perfectly reasonable given the line of questioning.

https://www.theatlantic.com/technology/archive/2025/07/new-g...



