Math RLHF already has verifiable ground truth/right vs wrong, so I don't what th... | Hacker News


meroes 18 days ago | on: OpenAI claims gold-medal performance at IMO 2025

Math RLHF already has verifiable ground truth/right vs wrong, so I don't know what this distinction really shows. And AI changes so quickly that there is a breakthrough every week. Call me cynical, but I think this is an RLHF/RLVR push in a narrow area--IMO was chosen as a target and they hired specifically to beat this "artificial" target.

Davidzheng 17 days ago

RLHF means Reinforcement Learning from Human Feedback. The right/wrong ones are called either RL or RLVR (Verifiable Rewards).
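To make the distinction concrete, here is a minimal sketch of the two kinds of reward signal. Everything in it is an assumption for illustration: the function names `verifiable_reward` and `human_preference_reward` are hypothetical, the exact-match check stands in for whatever verifier a real RLVR setup uses, and the vote average is only a toy stand-in for a reward model trained on human preferences.

```python
from fractions import Fraction


def verifiable_reward(model_answer: str, reference_answer: str) -> float:
    """RLVR-style reward (illustrative): 1.0 if the answer equals the
    known ground truth, 0.0 otherwise. No human judgment is involved."""
    try:
        same = Fraction(model_answer.strip()) == Fraction(reference_answer.strip())
    except (ValueError, ZeroDivisionError):
        # Unparseable answers simply earn no reward.
        return 0.0
    return 1.0 if same else 0.0


def human_preference_reward(votes: list[bool]) -> float:
    """RLHF-style reward (toy stand-in): the fraction of human raters who
    preferred this response. Real systems usually train a reward model on
    such preferences instead of averaging votes directly."""
    if not votes:
        raise ValueError("need at least one human vote")
    return sum(votes) / len(votes)


if __name__ == "__main__":
    print(verifiable_reward("0.5", "1/2"))
    print(human_preference_reward([True, True, False, True]))
```

The first function can be checked mechanically against a known answer, which is the "right/wrong" case; the second depends entirely on what people preferred.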
