They show Llama 3.2 1B with chain-of-thought outperforming Llama 3.1 8B, and 3.2 3B outperforming 3.1 70B. It's less clear whether inference is actually faster for CoT 3B with 256 generations than for a single 70B pass, assuming you have enough RAM for 70B. Basically a classic RAM/compute trade-off.
From a practical standpoint, scaling test-time compute does enable datacenter-scale performance on the edge. I can't feasibly run 70B on my iPhone, but I can run 3B, even if it takes a long time to produce a solution comparable to 70B's 0-shot answer.
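For anyone unfamiliar with how the "256 generations" trade-off works, here's a minimal best-of-N sketch in Python. The `generate` and `score` callables are hypothetical stand-ins for a real sampler and verifier/reward model; the toy example below just guesses integers and rewards proximity to a target.

```python
import random

def best_of_n(generate, score, n=256, seed=0):
    """Best-of-N test-time compute: draw n candidate completions from a
    small model and keep the one the verifier scores highest. Trades
    extra compute (n forward passes) for the RAM a bigger model needs."""
    rng = random.Random(seed)
    candidates = [generate(rng) for _ in range(n)]
    return max(candidates, key=score)

# Toy stand-in: the "model" guesses an integer answer; the "verifier"
# rewards guesses close to the true value 42. With 256 samples, the
# best guess lands near the target with high probability.
answer = best_of_n(
    generate=lambda rng: rng.randint(0, 100),
    score=lambda x: -abs(x - 42),
    n=256,
)
print(answer)
```

The point is that `n` forward passes of a small model can substitute for one pass of a much larger one, which is exactly why this works on-device: sequential generations cost time, not memory.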