It wasn't a lie; it was a misrepresentation of the total cost. It's not hard to calculate the cost of the training, though: it takes roughly 6 * active parameters * tokens FLOPs[1]. To get the number of seconds, divide by FLOPS/s * MFU, where MFU is around 45% on H100s for large enough models[2].
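As a rough sketch of that arithmetic (all concrete numbers here are illustrative assumptions, not figures from any particular model):

    # Estimate training time from FLOPs ~= 6 * active_params * tokens,
    # divided by sustained throughput (peak FLOPS * MFU).
    def training_seconds(active_params, tokens, peak_flops_per_s, mfu):
        total_flops = 6 * active_params * tokens
        return total_flops / (peak_flops_per_s * mfu)

    # Hypothetical example: 70B active params, 15T tokens,
    # ~1e15 dense BF16 FLOPS per GPU, 45% MFU, 10,000 GPUs.
    secs = training_seconds(70e9, 15e12, 1e15 * 10_000, 0.45)
    print(f"~{secs / 86400:.1f} days")  # roughly two weeks

From there, multiplying GPU-hours by an assumed hourly rate gives the compute cost.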
That paper's 5 years old at this point, dating back to when Amodei was still an OpenAI employee. Has any newer work superseded it, or are those assumptions still considered solid?
Those assumptions are still the same, although context lengths have grown enough that the n^2 attention term is no longer negligible. See the repo for the full FLOP calculation[1]
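A sketch of what that correction looks like, using the common per-token approximation of 2N + 2 * n_layer * n_ctx * d_model for the forward pass (with backward counted as 2x forward); the model config below is hypothetical:

    # 6*N*D estimate plus the quadratic-in-context attention term.
    def training_flops(params, tokens, n_layer, n_ctx, d_model):
        per_token_fwd = 2 * params + 2 * n_layer * n_ctx * d_model
        return 3 * per_token_fwd * tokens  # fwd + bwd ~= 3x fwd

    # At short context the attention term is a rounding error;
    # at long context it dominates.
    base = training_flops(70e9, 15e12, 80, 4_096, 8_192)
    long = training_flops(70e9, 15e12, 80, 128_000, 8_192)
    print(f"{long / base:.2f}x more FLOPs at 128k context")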