boodleboodle 40 days ago | on: Tracing the thoughts of a large language model

This is why, whenever I can, I call RLHF/DPO "sequence-level calibration" instead of "alignment tuning".

Some precursors to RLHF: https://arxiv.org/abs/2210.00045 https://arxiv.org/abs/2203.16804
