Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

No it isn't. Just gave some context to it (purpose) and copied into it some 250 lines of code I know has some bugs someone looking more or less closely would find and asked to evaluate its correctness. It did not find any of the problems and reported 5 supposed problems that don't exist.


4.5 is not trained on code. And it shows. It is however to my eyes more fluid, thoughtful, has better theory of mind. It's like someone scaled up GPT-4, and I really like it.


You don't know how smart OP is


I give a similar task when I interview SWE candidates: about half cannot find any bugs (and sometimes see bugs where there are none), despite years of claimed experience in the language/domain.




Consider applying for YC's Fall 2025 batch! Applications are open till Aug 4

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: