
In general this is of course an active area of research, but yes, you can do something like that, and people have done it successfully[1] by adding extra layers to an existing model and then continuing to train it.

You have to be careful about the "same data" part, though; ideally you want to train only once on unique data[2], as excessive duplication can harm the model's performance[3], although if you have limited data, a couple of training epochs might be safe and can actually improve performance[4].
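For the layer-adding part, here's a minimal sketch of SOLAR-style depth up-scaling[1], assuming a Hugging Face LLaMA-style model; the model id and the overlap size m are illustrative, not prescribed by the paper:

  import copy
  import torch
  from transformers import AutoModelForCausalLM

  model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-hf")
  layers = model.model.layers        # nn.ModuleList of decoder blocks
  n = len(layers)                    # e.g. 32
  m = 8                              # layers dropped from each slice

  # Concatenate two overlapping slices of the stack: the first n - m layers
  # and fresh copies of the last n - m layers, for 2 * (n - m) layers total.
  front = list(layers[: n - m])
  back = [copy.deepcopy(layer) for layer in layers[m:]]
  model.model.layers = torch.nn.ModuleList(front + back)
  model.config.num_hidden_layers = len(model.model.layers)

  # ...then continue pretraining the grown model as usual.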

[1] -- https://arxiv.org/abs/2312.15166

[2] -- https://arxiv.org/abs/1906.06669

[3] -- https://arxiv.org/abs/2205.10487

[4] -- https://galactica.org/static/paper.pdf



In addition to increasing the number of layers, you can also grow the weight matrices and initialize them by tiling with the smaller model's weights: https://neurips.cc/media/neurips-2023/Slides/83968_5GxuY2z.p...
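A sketch of what that tiling might look like, assuming the simplest scheme where the larger matrix just repeats the smaller model's weights along each dimension (function-preserving variants also rescale the copies, which is omitted here):

  import torch

  def grow_by_tiling(w_small: torch.Tensor, d_out: int, d_in: int) -> torch.Tensor:
      reps_out = -(-d_out // w_small.shape[0])   # ceiling division
      reps_in = -(-d_in // w_small.shape[1])
      return w_small.tile((reps_out, reps_in))[:d_out, :d_in].clone()

  w_small = torch.randn(512, 512)   # a weight matrix from the smaller model
  w_big = grow_by_tiling(w_small, 1024, 1024)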


Thank you for taking the time to provide me with all this reading.


This might be obvious, but just to state it explicitly for everyone: you can freeze the weights of the existing layers if you want to train only the new layers and leave the existing ones untouched.
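In PyTorch that's just a requires_grad toggle; a toy sketch, where the layer index is illustrative:

  import torch
  from torch import nn

  # Toy model: two "existing" layers plus one newly added layer.
  model = nn.Sequential(nn.Linear(8, 8), nn.Linear(8, 8), nn.Linear(8, 8))
  new_indices = {2}   # position of the layer we just added

  for i, layer in enumerate(model):
      for p in layer.parameters():
          p.requires_grad_(i in new_indices)

  # Only the unfrozen parameters go to the optimizer.
  opt = torch.optim.AdamW(
      (p for p in model.parameters() if p.requires_grad), lr=1e-4)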



