Interesting tidbit at the very end that's worth noting for anyone using the API today:
> By switching to the new gpt-4o-2024-08-06, developers save 50% on inputs ($2.50/1M input tokens) and 33% on outputs ($10.00/1M output tokens) compared to gpt-4o-2024-05-13.
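For back-of-the-envelope purposes, here's what those savings look like at a given token volume. The older gpt-4o-2024-05-13 rates ($5.00/1M input, $15.00/1M output) are inferred from the quoted percentages rather than stated here, and the monthly volume is made up:

```ts
// Rough cost comparison implied by the quoted savings.
const perMillion = (tokens: number, ratePerMillion: number) =>
  (tokens / 1_000_000) * ratePerMillion;

const usage = { input: 2_000_000, output: 500_000 }; // example monthly volume

const oldCost = perMillion(usage.input, 5.0) + perMillion(usage.output, 15.0); // 2024-05-13 rates
const newCost = perMillion(usage.input, 2.5) + perMillion(usage.output, 10.0); // 2024-08-06 rates

console.log(`old: $${oldCost.toFixed(2)}, new: $${newCost.toFixed(2)}`);
// old: $17.50, new: $10.00  -> roughly 43% cheaper at this input/output mix
```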
I don't think it's been acknowledged widely enough that all of the shortcuts LLMs have been taking to compress/refine/index the attention mechanism seem to result in dumber models.
GPT 4 Turbo was more like GPT 3.9, and GPT 4o is more like GPT 3.7.
I'm building a product that requires complex LLM flows, and out of OpenAI's "cheap" tier models, the old versions of GPT-3.5 Turbo are far better than its latest versions and 4o-mini. I have a number of tasks that the former consistently succeed at and the latter consistently fail at, regardless of prompting.
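A minimal sketch of the kind of per-task regression check I mean, using the openai Node SDK; the task list, the pass() check, and the model snapshot names here are placeholders, not my real suite:

```ts
import OpenAI from "openai";

const client = new OpenAI(); // reads OPENAI_API_KEY from the environment

// Hypothetical tasks: each one is a prompt plus a cheap pass/fail check.
const tasks = [
  { prompt: "Extract the invoice total from: ...", pass: (out: string) => out.includes("42.00") },
  // ...more tasks your product actually depends on
];

const models = ["gpt-3.5-turbo-0613", "gpt-4o-mini"]; // illustrative snapshot names

for (const model of models) {
  let passed = 0;
  for (const task of tasks) {
    const res = await client.chat.completions.create({
      model,
      messages: [{ role: "user", content: task.prompt }],
    });
    if (task.pass(res.choices[0].message.content ?? "")) passed++;
  }
  console.log(`${model}: ${passed}/${tasks.length} tasks passed`);
}
```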
Leaderboards and benchmarks are very misleading as OpenAI is optimizing for them, like in the past when certain CPU manufacturers would optimize for synthetic benchmarks.
FWIW, these aren't chat use cases, for which the newer models may well be better.
They try to gaslight us and tell us this isn't true because of benchmarks, as though anyone has done anything but the latent-space-exploration equivalent of throwing darts at the ocean from space.
It's taken years to get even preliminary reliable decision boundary examples from LLMs because doing so is expensive.
That's OK from the perspective that it makes room for a more capable and expensive GPT-5 model to compete with Opus 3.5 when that arrives this year. A significant price drop for a small loss in quality is a reasonable tradeoff. Then GPT-4o becomes the mid tier and GPT-4o-mini the low tier.
There were 100 days between the releases of Claude 3 Opus and Claude 3.5 Sonnet, which gave us similar capability at an 80% price reduction. When I was using Opus I kept thinking: this is nice, but the cost does add up. Having Sonnet 3.5 so soon after was a nice surprise.
One more round of 80% price cuts after that, combined with building out multi-step agentic workflows, should provide some decent capabilities!
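A quick sanity check of that 80% figure, using Anthropic's published per-million-token prices at the time (Claude 3 Opus: $15 in / $75 out; Claude 3.5 Sonnet: $3 in / $15 out), plus what one more round of the same cut would look like:

```ts
// Published per-1M-token prices (USD) at the time of the two releases.
const opus = { input: 15, output: 75 };
const sonnet35 = { input: 3, output: 15 };

const reduction = 1 - sonnet35.input / opus.input; // same ratio holds for output
console.log(`price reduction: ${(reduction * 100).toFixed(0)}%`); // 80%

// Hypothetical: one more round of the same cut.
const nextRound = {
  input: sonnet35.input * (1 - reduction),
  output: sonnet35.output * (1 - reduction),
};
console.log(nextRound); // { input: 0.6, output: 3 } per 1M tokens
```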
Am I the only one who wants to know 1,000% *WHY* this happens?
Is it a natural function of how models evolve?
Is it engineered as such? Why? Marketing/money/resources/what?
WHO makes these decisions and why?
---
I have been building a thing with a Claude 3.5 Pro account and it's an *utter fn garbage* experience.
It lies, hallucinates, malevolently changes code it was already told was correct, removes features, and explicitly ignores project files. It has no search and no line numbers, and so much screen real estate is consumed by useless empty space. It ignores stated style guides. It gets CAUGHT forgetting about a premise we were actively working on, then condescendingly apologizes: "oh you're correct - I should have been using XYZ knowledge".
It makes things FN harder to learn.
If only I had some Claude engineers sitting in the room watching what a POS service it is from a project-continuity point of view...
It's evil. It actively f's things up.
One should have the ability to CHARGE the model token credits when it f's up this badly.
NO FN SEARCH??? And when asked for line numbers in its output - it's just plain text...
Seriously, I don't just want a refund; I want Claude to pay me for my time correcting its mistakes.
ChatGPT does the same thing. It forgets things committed to memory, refactors successful code back out of files, etc.
It's been a really eye-opening and frustrating experience, and my squinty looks suggest it's specifically intentional:
They don't want people using a $20/month AI plan to actually be able to do any meaningful work and build a product.
It is difficult to get AI models to get everything right every time. I noticed too that they would sometimes remove comments etc. when rewriting code.
The way to get better results is with agentic workflows that break down the task into smaller steps, so the models can iteratively converge on a correct result.
One important step I added to mine is a review step (in the reviewChanges.ts file) in my workflow at https://github.com/TrafficGuard/nous/blob/main/src/swe/codeE...
This gets the diff and asks questions like:
- Are there any redundant changes in the diff?
- Was any code removed in the changes which should not have been?
- Review the style of the code changes in the diff carefully against the original code.
Maybe try using that, or Aider (https://aider.chat/), the package I use that does the actual code edits.
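For illustration, here's a standalone sketch of that kind of review step - not the actual reviewChanges.ts from the repo above - using the openai Node SDK and a plain git diff as stand-ins for whatever your workflow already has:

```ts
import { execSync } from "node:child_process";
import OpenAI from "openai";

const client = new OpenAI();

const reviewQuestions = [
  "Are there any redundant changes in the diff?",
  "Was any code removed in the changes which should not have been?",
  "Review the style of the code changes in the diff carefully against the original code.",
];

export async function reviewChanges(baseRef = "HEAD"): Promise<string> {
  // Collect the working-tree diff that the coding step just produced.
  const diff = execSync(`git diff ${baseRef}`, { encoding: "utf8" });

  const prompt = [
    "You are reviewing a proposed code change. The diff follows.",
    "<diff>",
    diff,
    "</diff>",
    "Answer each question and list any concrete fixes required:",
    ...reviewQuestions.map((q, i) => `${i + 1}. ${q}`),
  ].join("\n");

  const res = await client.chat.completions.create({
    model: "gpt-4o-2024-08-06",
    messages: [{ role: "user", content: prompt }],
  });

  // The answer gets fed back into the edit loop so the model can fix its own diff.
  return res.choices[0].message.content ?? "";
}
```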
Also, is it a coincidence that a cheaper (potentially faster?) model has been released (just) before they roll out the "new" voice mode (which boasts very low latency)?
For the record, you should never use that in an application. Always explicitly pin the full versioned model name. This prevents bad surprises, because not every new version is an improvement; sometimes they get worse, especially at specific tasks.
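For example, with the openai Node SDK that just means passing the dated snapshot rather than the floating alias (the snapshot name here is the one from the announcement above):

```ts
import OpenAI from "openai";

const client = new OpenAI();

const res = await client.chat.completions.create({
  model: "gpt-4o-2024-08-06", // pinned snapshot, not the floating "gpt-4o" alias
  messages: [{ role: "user", content: "ping" }],
});

console.log(res.choices[0].message.content);
```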