Yes, which is funny because there was an article on here with some pretty hard data which showed not much difference between GPT-4 and 3.5-turbo, or maybe even worse performance from GPT-4.
You'll likely refute that as your mind is already made up, but there you go, another conflicting and confusing data point.
What are you talking about? Just compare the output of 3.5 vs 4 yourself for a problem you are interested in; it’s a single click in the interface. Do you always need a study or an “expert” to make up your mind?
Benchmarks are good. You may be a less experienced software engineer than others (or maybe more experienced?), and so you will tell me “ChatGPT x is insane bro”, but that's only a matter of perspective. A benchmark gives us facts, outside of our own experience, not opinions.