Is there an equivalent of LORA using RL instead of supervised fine tuning? In ot... | Hacker News


anonymousDan 38 days ago | on: Tracing the thoughts of a large language model

Is there an equivalent of LoRA that uses RL instead of supervised fine-tuning? In other words, if RL is so important, is there some way for me as an end user to improve a SOTA model with RL using my own data (i.e. without access to the resources needed to train an LLM from scratch)?

fpgaminer 38 days ago

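LoRA can be used in RL; it's indifferent to the training scheme. LoRA is just a way of reducing the number of trainable parameters: the base weights stay frozen and only small low-rank adapter matrices are trained, whatever loss produces the gradients.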
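To make that concrete, here is a minimal toy sketch in plain Python, not a recipe for a real LLM. It assumes a single linear softmax "policy" with frozen weights `W`, a low-rank adapter `B @ A` added on top, and a plain REINFORCE update with a 0/1 reward. The sizes, the `TARGET` action, the reward rule, and the helper names `policy` and `reinforce_step` are all made up for illustration.

```python
import math
import random

rng = random.Random(0)
D_IN, N_ACTIONS, RANK = 16, 8, 2   # toy sizes, chosen only for illustration
TARGET = 3                          # hypothetical action that earns reward 1

# Frozen "pretrained" weights: never updated below.
W = [[rng.gauss(0.0, 0.1) for _ in range(D_IN)] for _ in range(N_ACTIONS)]
# LoRA adapter: effective weights are W + B @ A. B starts at zero, so the
# adapted policy initially equals the frozen one.
A = [[rng.gauss(0.0, 0.5) for _ in range(D_IN)] for _ in range(RANK)]
B = [[0.0] * RANK for _ in range(N_ACTIONS)]

x = [rng.uniform(-1.0, 1.0) for _ in range(D_IN)]  # one fixed "prompt" vector


def policy(x):
    """Return (action probabilities, h = A @ x) for the adapted policy."""
    h = [sum(a * xi for a, xi in zip(row, x)) for row in A]
    logits = [
        sum(w * xi for w, xi in zip(W[k], x)) + sum(b * hj for b, hj in zip(B[k], h))
        for k in range(N_ACTIONS)
    ]
    m = max(logits)  # subtract the max for a numerically stable softmax
    exps = [math.exp(l - m) for l in logits]
    z = sum(exps)
    return [e / z for e in exps], h


def reinforce_step(lr=0.05):
    """One REINFORCE update that touches only A and B."""
    probs, h = policy(x)
    action = rng.choices(range(N_ACTIONS), weights=probs)[0]
    reward = 1.0 if action == TARGET else 0.0
    # reward * d log pi(action) / d logits
    g = [reward * ((1.0 if k == action else 0.0) - probs[k]) for k in range(N_ACTIONS)]
    # Backpropagate through logits = W x + B (A x); W receives no update.
    bt_g = [sum(B[k][j] * g[k] for k in range(N_ACTIONS)) for j in range(RANK)]
    for k in range(N_ACTIONS):
        for j in range(RANK):
            B[k][j] += lr * g[k] * h[j]
    for j in range(RANK):
        for i in range(D_IN):
            A[j][i] += lr * bt_g[j] * x[i]


W_before = [row[:] for row in W]
p_before = policy(x)[0][TARGET]
for _ in range(2000):
    reinforce_step()
p_after = policy(x)[0][TARGET]

full_params = N_ACTIONS * D_IN
lora_params = RANK * (D_IN + N_ACTIONS)
print(f"trainable parameters: {lora_params} (LoRA) vs {full_params} (full)")
print(f"P(target) before: {p_before:.3f}, after: {p_after:.3f}")
```

Nothing in the adapter knows the gradient came from a reward signal rather than a supervised label; only `A` and `B` change, and `W` is left exactly as it was. In this toy setup that is 48 trainable numbers instead of 128. For an actual LLM you would presumably reach for an existing LoRA library combined with an RL trainer rather than hand-rolled loops like this.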
