
What would be an example of 2.5 Pro failing against R1 (which is what you'd actually want to compare it to)?


R1 sometimes fails against V3 for me too, so it's not a dig specifically at Gemini.

For code and science, Gemini is far too verbose in its output, and because of that it ends up confusing itself and degrading quality over longer context windows.

R1 does this too, but it poisons itself in the reasoning loop instead. You can watch it happen during streaming: it criss-crosses its own thoughts and thinks itself into loops before it finally arrives at an answer.

On top of that, both R1 and Gemini Pro / Flash are mediocre at anything creative. I can accept that from R1, since it's positioned mainly as a "hard sciences" model, but Gemini is meant to be an all-purpose model.

If you pit Gemini, Deepseek R1 and Deepseek V3 against each other in a writing contest, V3 will blow both of them out of the water.


Agreed on the last point, V3 is terrifyingly good at narrative writing. And yes, R1 talks itself out of correct answers almost as often as it talks itself into them.

But in general, 2.5 Pro is an extremely strong model. It may lose out in some respects to o3-pro, but o3-pro is so much slower that its utility tends to be limited by my own attention span. I don't think either has much to fear from V3, though, except possibly in short fiction composition.



