Preventing LLMs from Spreading False Health Information

ko_pivot · on May 27, 2024

This is legitimate research, but in the medium term, refusal is going to be irrelevant. It is very likely that open-weight models will continue to be no more than a year or two behind the cutting edge, and it is also likely that established techniques for 'reversing' safety RLHF via limited fine-tuning will keep getting better. So, like it or not, bad and naive actors will have access to all the generation capability they need.