
The hallucinations are a result of RLVR (reinforcement learning with verifiable rewards). We reward the model for producing the right answer and then force it to reason its way there, even when the base model may not have that information.
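
A minimal sketch of the incentive the comment describes, assuming a plain binary verifiable reward (not any specific RLVR implementation): a wrong guess and an honest "I don't know" both score zero, so guessing dominates abstaining in expectation.

```python
# Hypothetical binary verifiable reward: 1 for a matching answer,
# 0 for anything else, including an explicit "I don't know".
def verifiable_reward(answer: str, ground_truth: str) -> float:
    return 1.0 if answer.strip() == ground_truth.strip() else 0.0

# Assumed chance that a confident guess happens to be right on a
# question the base model has no real knowledge of.
p_correct_if_guessing = 0.2

expected_reward_guess = p_correct_if_guessing * 1.0                 # 0.2
expected_reward_abstain = verifiable_reward("I don't know", "42")    # 0.0

# Under this reward, guessing strictly beats abstaining, so the policy
# is pushed toward confident answers it cannot actually back up.
assert expected_reward_guess > expected_reward_abstain
```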


> The hallucinations are a result of RLVR

Well, let us reward them for producing output that is consistent with documentation retrieved from a database, and massacre them for output they cannot justify - like we do with humans.
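
A rough sketch of what such a grounded reward might look like. The supported() check below is a naive token-overlap placeholder for a real entailment or citation-verification step, and every name and threshold here is an illustrative assumption, not an existing API.

```python
import re

def _tokens(text: str) -> set[str]:
    # Crude tokenizer: lowercase alphanumeric words only.
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def supported(claim: str, docs: list[str], threshold: float = 0.6) -> bool:
    # Placeholder support check: a claim counts as justified if enough of its
    # tokens appear in at least one retrieved document.
    claim_tokens = _tokens(claim)
    if not claim_tokens:
        return True
    return any(
        len(claim_tokens & _tokens(doc)) / len(claim_tokens) >= threshold
        for doc in docs
    )

def grounded_reward(answer_sentences: list[str], docs: list[str],
                    bonus: float = 1.0, penalty: float = -2.0) -> float:
    # Asymmetric reward: an unjustified sentence costs more than a justified
    # one earns, so fabricating beyond the documentation has negative value.
    return sum(bonus if supported(s, docs) else penalty
               for s in answer_sentences)

docs = ["The API rate limit is 100 requests per minute per key."]
print(grounded_reward(["The rate limit is 100 requests per minute."], docs))  # 1.0, rewarded
print(grounded_reward(["Keys never expire."], docs))                          # -2.0, massacred
```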



