
I’m interested in this as well. Comparatively little attention has been paid to those 7B model results, but they look quite good against 175B GPT-3.

As for ChatGPT, that is GPT-3.5 (the same 175B model, but with instruction fine-tuning), plus RLHF.



GPT-3.5 likely differs from the original GPT-3 by more than instruction fine-tuning. For example, it was probably retrained under Chinchilla scaling laws [1], with a lot more data and perhaps a somewhat smaller parameter count.
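To put rough numbers on that point, here is a back-of-the-envelope sketch using the commonly cited ~20 tokens-per-parameter rule of thumb from the Chinchilla paper (an approximation; the exact ratio depends on the compute budget, and the function name here is just illustrative):

```python
# Rough Chinchilla-optimal training-token budget: ~20 tokens per parameter.
# This is a simplification of the Hoffmann et al. scaling-law fit, not an
# exact formula from the paper.
def chinchilla_optimal_tokens(n_params: float, tokens_per_param: float = 20.0) -> float:
    return n_params * tokens_per_param

for name, n in [("175B (GPT-3 scale)", 175e9), ("7B", 7e9)]:
    print(f"{name}: ~{chinchilla_optimal_tokens(n) / 1e9:.0f}B tokens")
```

By this estimate a 175B model "wants" on the order of 3.5T training tokens, far more than the ~300B tokens the original GPT-3 was reportedly trained on, while a 7B model is compute-optimal around 140B tokens, which is why heavily trained small models can punch above their weight.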

There are many variants of GPT-3 and GPT-3.5, and based on the performance numbers in Meta’s paper, it looks like they’re comparing against the very first version of GPT-3 from 2020. [2]

[1] https://arxiv.org/abs/2203.15556

[2] https://arxiv.org/abs/2005.14165





