
I mean, either they cheated on evals à la Llama 4, or they have a paradigm that's currently best in class on at least a few standard evals. Both alternatives are possible, I suppose.

