We do the same automatically for our research (all requests go to o1, Sonnet, and Gemini, and we store the results to compare later): Claude always wins, even with model-specific prompting on both platforms. For frontend work especially, o1 seems genuinely terrible.
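For anyone curious what that kind of setup looks like: it can be as simple as fanning the same prompt out to each provider's SDK and appending the replies to a log for later comparison. A minimal sketch, assuming current Python SDKs for each provider; the model names and the output filename are placeholders, not what the commenter actually runs:

```python
import json
import time

from openai import OpenAI              # pip install openai
import anthropic                       # pip install anthropic
import google.generativeai as genai    # pip install google-generativeai

# Assumed model identifiers; substitute whichever versions you test.
openai_client = OpenAI()               # reads OPENAI_API_KEY from the env
anthropic_client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY
genai.configure(api_key="YOUR_GOOGLE_API_KEY")
gemini = genai.GenerativeModel("gemini-2.0-flash-exp")

def ask_all(prompt: str) -> dict:
    """Send the same prompt to o1, Sonnet, and Gemini; log all replies."""
    results = {"prompt": prompt, "ts": time.time()}

    resp = openai_client.chat.completions.create(
        model="o1",
        messages=[{"role": "user", "content": prompt}],
    )
    results["o1"] = resp.choices[0].message.content

    msg = anthropic_client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=4096,
        messages=[{"role": "user", "content": prompt}],
    )
    results["sonnet"] = msg.content[0].text

    results["gemini"] = gemini.generate_content(prompt).text

    # Append to a JSONL file so the answers can be diffed/rated later.
    with open("model_comparison.jsonl", "a") as f:
        f.write(json.dumps(results) + "\n")
    return results
```

In practice you'd add error handling and probably run the three calls concurrently, but the "store the results for later" part really is just an append-only JSONL log.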
Every time I try Gemini, it's really subpar. I found that qwen2.5-coder-32b-instruct can be better.
Also, for me it's about 50/50 between Sonnet and o1, though I'm not certain: from my brief testing, o1 seems better with longer, more complicated (C++) code and with debugging. OpenAI models also tend to be more verbose, which is sometimes helpful (e.g., when I want extra explanation of the fields chosen in a SQL schema) and sometimes too much.
EDIT: Just asked both o1 and Sonnet 3.5 the same QML coding question; Sonnet 3.5 succeeded where o1 failed.
Very anecdotal, but I’ve found that for tasks that are well spec’d out with a good prompt, Sonnet 3.5 is far better. For problems where I might have introduced a subtle logical error, o1 seems to catch it extremely well. So better reasoning may be occurring, but reasoning is only a small part of what we’d consider intelligence.
Exactly. The previous version of o1 actually did worse on the coding benchmarks, so I would expect it to be worse in real-life scenarios too.
The new version released a few days ago, on the other hand, scores better on the benchmarks, so it seems strange that someone would use it and say it's worse than Claude.
Wins? What does that mean? Do you have any results? I see claims that Claude is better for coding a lot, but having used it alongside Gemini 2.0 Flash and o1, it sure doesn't seem that way to me.