
What's _online_ RL for an LLM? Saw this in the Llama 3.3 reports too...


Online RL for LLMs means you sample from the current model, score each sample right away, and use those scores to compute gradients that update the model, so the next samples come from the updated model.

Offline, by contrast, you sample a bunch of generations from the model, score them separately, and then fine-tune the model on that fixed set of already-scored generations.
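
To make the difference concrete, here is a toy sketch, not how any real LLM is trained. Everything in it is an assumption for illustration: the "model" is a softmax over four fixed strings, `REWARD` is a made-up scorer, and the update is a plain REINFORCE-style step that I picked as a stand-in for whatever RL or fine-tuning objective a real pipeline uses. The only point is where the samples come from: the online loop samples from the model as it changes, while the offline loop samples once from the frozen starting model and then trains on that fixed scored set.

```python
import math
import random

# Hypothetical toy setup: four possible "generations" and a made-up scorer.
VOCAB = ["a", "b", "c", "d"]
REWARD = {"a": 0.0, "b": 0.2, "c": 1.0, "d": 0.1}


def softmax(logits):
    """Turn logits into a probability distribution."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sample(logits, rng):
    """Draw one generation (as an index into VOCAB) from the policy."""
    return rng.choices(range(len(VOCAB)), weights=softmax(logits), k=1)[0]


def reward_weighted_step(logits, action, reward, lr):
    """One gradient step on reward * log pi(action).

    For a softmax policy, d log pi(action) / d logit_i = 1[i == action] - p_i.
    """
    probs = softmax(logits)
    return [
        logit + lr * reward * ((1.0 if i == action else 0.0) - p)
        for i, (logit, p) in enumerate(zip(logits, probs))
    ]


def train_online(steps=2000, lr=0.1, seed=0):
    """Online: sample from the current model, score, update, repeat."""
    rng = random.Random(seed)
    logits = [0.0] * len(VOCAB)
    for _ in range(steps):
        action = sample(logits, rng)        # sample from the *current* model
        reward = REWARD[VOCAB[action]]      # score immediately
        logits = reward_weighted_step(logits, action, reward, lr)
    return softmax(logits)


def train_offline(num_samples=2000, epochs=5, lr=0.1, seed=0):
    """Offline: sample once from the frozen model, score, then fine-tune."""
    rng = random.Random(seed)
    frozen_logits = [0.0] * len(VOCAB)
    # 1) Sample a bunch of generations from the frozen starting model.
    generations = [sample(frozen_logits, rng) for _ in range(num_samples)]
    # 2) Score them offline.
    scored = [(g, REWARD[VOCAB[g]]) for g in generations]
    # 3) Fine-tune on the fixed scored set; no new samples are drawn.
    logits = list(frozen_logits)
    for _ in range(epochs):
        for action, reward in scored:
            logits = reward_weighted_step(logits, action, reward, lr)
    return softmax(logits)


if __name__ == "__main__":
    print("online :", [round(p, 3) for p in train_online()])
    print("offline:", [round(p, 3) for p in train_offline()])
```

Both loops share the same update rule on purpose, so the only thing that differs is the data. In the offline version the training data never reflects what the updated model would actually generate, which is the distinction the two definitions above are drawing.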



