
Is there an equivalent of LoRA that uses RL instead of supervised fine-tuning? In other words, if RL is so important, is there some way for me as an end user to improve a SOTA model with RL using my own data (i.e. without access to the resources needed to train an LLM from scratch)?



LoRA can be used with RL; it's indifferent to the training scheme. LoRA is just a way of reducing the number of trainable parameters.
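Concretely, here's a minimal NumPy sketch of that point (toy sizes and a made-up reward function, not any particular library's API): a frozen weight W gets a low-rank update B @ A, and a REINFORCE-style policy-gradient step trains only A and B. The "RL" part and the "LoRA" part never interact.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for one frozen layer of a pretrained model (sizes are made up).
d_in, d_out, r = 16, 8, 2
W = rng.normal(scale=0.1, size=(d_out, d_in))  # frozen base weight: never updated
A = rng.normal(scale=0.1, size=(r, d_in))      # LoRA down-projection (trainable)
B = np.zeros((d_out, r))                       # LoRA up-projection (trainable, zero-init)

def forward(x):
    # Effective weight is W + B @ A, but only A and B ever receive gradients.
    return W @ x + B @ (A @ x)

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

def reinforce_step(x, reward_fn, lr=0.5):
    # One REINFORCE step: sample an "action" (here just an output index),
    # score it with reward_fn (a stand-in for whatever RL signal you have),
    # and nudge the adapters toward higher-reward actions. W is untouched.
    global A, B
    p = softmax(forward(x))
    a = rng.choice(d_out, p=p)
    g_logits = reward_fn(a) * (np.eye(d_out)[a] - p)  # reward * grad of log p(a)
    h = A @ x
    g_h = B.T @ g_logits      # use pre-update B for A's gradient
    B += lr * np.outer(g_logits, h)
    A += lr * np.outer(g_h, x)

# Trainable parameters: A and B only (32 + 16 = 48, vs. 128 in W).
trainable = A.size + B.size

# Demo: reward only action 0 and watch its probability rise.
x = rng.normal(size=d_in)
W0 = W.copy()
p_before = softmax(forward(x))[0]
for _ in range(300):
    reinforce_step(x, lambda a: 1.0 if a == 0 else 0.0)
p_after = softmax(forward(x))[0]
```

In practice you'd do this with something like Hugging Face's PEFT (for the adapters) plus TRL (for the RL fine-tuning loop), which is exactly the end-user workflow the question is asking about.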



