
Welcome to the age of the unbenchmarkable product. You say one thing, someone says another, and here we are...



There are programming benchmarks though (data contamination issues left aside)


Yes, which is funny because there was an article on here with some pretty hard data which showed little difference, or maybe even worse performance, from GPT4 compared with 3.5-turbo.

You'll likely refute that as your mind is already made up, but there you go, another conflicting and confusing data point.


What are you talking about? Just compare the output of 3.5 vs 4 yourself for a problem you are interested in, it’s a single click in the interface. Do you always need a study or an “expert” to make up your mind?


Benchmarks are good. You may be a less experienced software engineer than others (or maybe more experienced?) and tell me “ChatGPT x is insane bro”, but that's only a matter of perspective. A benchmark gives us facts, outside of our own experience, not opinions.

I'm sure ChatGPT4 would likely agree :)



