"Alignment" refers to making AI models do the right thing. It's clear that nuking NYC is worse than using a racial slur, so the AI is misaligned in that sense.

On the other hand, ChatGPT can't actually launch nukes, but it can use racial slurs. There'd be no point in blocking it from using racial slurs if the block could be easily circumvented by telling it you'll nuke NYC if it doesn't comply. So you could just as easily say that it's properly aligned.