
How do you measure code generation accuracy? Are there baseline tests, and if so, how can I be sure the models aren't tuned only for those tests, the same way VW cheated the emissions tests on its diesels?


We run a set of change requests against the Discourse repo. Good point; we plan to publish more detailed testing benchmarks and metrics on the website.
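To make the "tuned for the tests" concern concrete, here is a minimal sketch of one common safeguard: keep part of the change-request set unpublished and compare pass rates on the public and held-out halves. This is not the actual harness described above. The `Result` type, the request IDs, and the hash-based split are all hypothetical, and "passed" is assumed to mean the repo's test suite succeeded after the generated change was applied.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class Result:
    request_id: str  # hypothetical identifier for one change request
    passed: bool     # assumed: test suite passed after the generated change


def is_held_out(request_id: str, fraction: float = 0.5) -> bool:
    """Deterministically assign a request to the unpublished split."""
    first_byte = hashlib.sha256(request_id.encode("utf-8")).digest()[0]
    return first_byte / 256 < fraction


def pass_rate(results: list[Result]) -> float:
    """Fraction of results that passed; 0.0 for an empty list."""
    if not results:
        return 0.0
    return sum(r.passed for r in results) / len(results)


def summarize(results: list[Result], fraction: float = 0.5) -> dict[str, float]:
    """Compare pass rates on the public and held-out splits."""
    public = [r for r in results if not is_held_out(r.request_id, fraction)]
    held_out = [r for r in results if is_held_out(r.request_id, fraction)]
    public_rate = pass_rate(public)
    held_out_rate = pass_rate(held_out)
    return {
        "public": public_rate,
        "held_out": held_out_rate,
        # A large positive gap hints at tuning to the public set.
        "gap": public_rate - held_out_rate,
    }
```

A held-out rate well below the public rate would suggest the model is fitted to the published requests. A small gap doesn't prove the opposite, but it is at least a number that outsiders can't game by studying the public set.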



