One finding in the LLaMA paper [1] is that current large models are undertrained: a smaller model trained on more tokens can match or beat a much larger one. LLaMA with 13B params outperforms GPT-3 175B (the base model, not ChatGPT) on most benchmarks, and an "instruct" version, finetuned from the 65B model, also did quite well.

[1] https://arxiv.org/pdf/2302.13971.pdf
