
What's your source for any of that? I think the $6 million thing was identified as a lie they felt was necessary because of GPU export laws.


It wasn't a lie, it was a misrepresentation of the total cost. It's not hard to estimate the training cost, though: training takes roughly 6 * active parameters * tokens FLOPs [1]. To get the number of GPU-seconds, divide by FLOP/s * MFU, where MFU is around 45% on H100s for large enough models [2]. (Rough sketch after the references.)

[1]: https://arxiv.org/abs/2001.08361

[2]: https://github.com/facebookresearch/lingua
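A minimal back-of-the-envelope version of that calculation, with illustrative numbers (the active-parameter count, token count, peak FLOP/s, and $/GPU-hour below are assumptions for the sake of the example, not official figures):

  # Rough training-cost estimate: FLOPs ~= 6 * active_params * tokens,
  # GPU-seconds = FLOPs / (peak FLOP/s * MFU).
  active_params = 37e9        # assumed active parameters per token (MoE)
  tokens        = 14.8e12     # assumed training tokens
  peak_flops    = 989e12      # assumed H100 BF16 dense peak, FLOP/s
  mfu           = 0.45        # MFU, per the estimate above
  price_per_gpu_hour = 2.0    # assumed rental price, USD

  total_flops = 6 * active_params * tokens
  gpu_hours   = total_flops / (peak_flops * mfu) / 3600

  print(f"total FLOPs: {total_flops:.3e}")
  print(f"GPU-hours:   {gpu_hours:,.0f}")
  print(f"approx cost: ${gpu_hours * price_per_gpu_hour:,.0f}")

With those assumed inputs you land in the low single-digit millions of dollars of GPU time, which is the kind of sanity check people use on the headline number.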


That paper's 5 years old at this point, dating back to when Amodei was still an OpenAI employee. Has any newer work superseded it, or are those assumptions still considered solid?


Those assumptions still hold, although context lengths have increased enough that the n^2 attention term is no longer negligible. See the repo for the correct FLOP calculation [1]; a rough sketch of that correction is below the link.

[1]: https://github.com/facebookresearch/lingua/blob/437d680e5218...
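A hedged sketch of why the attention term starts to matter: the common Kaplan-style approximation adds roughly 6 * n_layers * seq_len * d_model FLOPs per token on top of 6N (the exact constants vary by implementation, and the model-shape numbers below are illustrative, not any particular model's config):

  # FLOPs including the attention term, Kaplan-style approximation:
  # C ~= tokens * (6*N + 6*n_layers*seq_len*d_model)
  def training_flops(active_params, tokens, n_layers, d_model, seq_len):
      dense_term = 6 * active_params * tokens
      attn_term  = 6 * n_layers * seq_len * d_model * tokens
      return dense_term + attn_term

  base  = 6 * 37e9 * 14.8e12  # dense-only estimate for comparison
  short = training_flops(37e9, 14.8e12, n_layers=61, d_model=7168, seq_len=4096)
  long  = training_flops(37e9, 14.8e12, n_layers=61, d_model=7168, seq_len=128_000)
  print(f"attention overhead at 4k ctx:   {short / base - 1:.1%}")
  print(f"attention overhead at 128k ctx: {long  / base - 1:.1%}")

At short contexts the correction is a few percent; at 100k+ contexts it can rival or exceed the dense term, which is why the simple 6 * params * tokens rule needs the extra term now.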



