
Speculation: GPT-turbo is a new Chinchilla-optimal model with capabilities equivalent to GPT-3.5's. So it's literally just smaller, faster, and cheaper to run.

The reason I don't think it's just loss-leading is that they made it faster too. That heavily implies a smaller model.
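
Rough intuition for why smaller means both faster and cheaper (purely illustrative, using the common ~2 FLOPs per parameter per generated token estimate for a dense transformer forward pass; none of these sizes are confirmed):

    # Illustrative only: a dense transformer's forward pass costs roughly
    # 2 FLOPs per parameter per token, so serving cost scales ~linearly
    # with parameter count. Halve the parameters, roughly halve the
    # latency and the cost per token.
    def inference_flops_per_token(n_params: float) -> float:
        return 2 * n_params

    for n_params in (175e9, 70e9, 20e9):  # hypothetical model sizes
        print(f"{n_params/1e9:.0f}B params -> ~{inference_flops_per_token(n_params):.1e} FLOPs/token")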



It could be even smaller than a Chinchilla-optimal model. The Chinchilla paper was about training the most capable model with the least training compute. If you are optimizing for capability and inference compute, you can "over-train" by providing much more data per parameter than even Chinchilla prescribes, or you can train a larger model and then distill it to a smaller size. Increasing context size increases inference compute, but the added capability from a larger context might let you skimp on parameters and lead to a net decrease in compute. There are probably other strategies as well, but those are the ones I know of.
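
A sketch of the "over-train" idea with made-up numbers (the 6·N·D training-FLOPs estimate and the ~20 tokens/parameter ratio are the usual rules of thumb, not anything OpenAI has confirmed):

    # Training compute is roughly 6 * N * D FLOPs (N = params, D = tokens).
    # Chinchilla-optimal for a fixed training budget is roughly D ~= 20 * N.
    # "Over-training" means pushing D/N well past 20: more training data per
    # parameter than Chinchilla prescribes, but every token you later serve
    # comes from a smaller, cheaper model.
    def train_flops(n_params: float, n_tokens: float) -> float:
        return 6 * n_params * n_tokens

    chinchilla_style = (70e9, 1.4e12)  # ~20 tokens/param
    over_trained     = (20e9, 1.5e12)  # ~75 tokens/param, hypothetical

    for name, (n, d) in [("chinchilla-style", chinchilla_style),
                         ("over-trained", over_trained)]:
        print(f"{name}: {d/n:.0f} tokens/param, "
              f"train ~{train_flops(n, d):.1e} FLOPs, "
              f"serve ~{2*n:.1e} FLOPs/token")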


Ah, interesting! I thought capability was capped by parameter count, but you're saying you can keep getting more capability out of a fixed number of parameters by continuing to train past what the Chinchilla paper specifies. That's really cool.


Not Chinchilla-optimal but inference-optimal. Chinchilla optimality only accounts for the training budget and is of interest to researchers who mainly produce demos. Inference optimality also includes the inference costs and is what matters for real deployments to millions of users. It is worth paying more for training to reduce inference costs, so they probably went even further than Chinchilla.
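
Toy accounting of that trade-off (all figures are made up; the 6·N·D training and 2·N-per-token serving FLOP estimates are the standard back-of-envelope numbers):

    # Total compute over a deployment = one-off training cost plus serving cost
    # over however many tokens you expect to generate for users.
    def total_flops(n_params: float, train_tokens: float, served_tokens: float) -> float:
        train = 6 * n_params * train_tokens   # ~6 FLOPs per param per training token
        serve = 2 * n_params * served_tokens  # ~2 FLOPs per param per generated token
        return train + serve

    served = 1e13  # hypothetical lifetime serving volume
    big   = total_flops(175e9, 300e9, served)  # GPT-3-sized model on a GPT-3-sized corpus
    small = total_flops(20e9, 5e12, served)    # smaller model, trained on far more data
    print(f"big model:   ~{big:.2e} FLOPs total")
    print(f"small model: ~{small:.2e} FLOPs total")
    # The small model costs more to train (~6e23 vs ~3e23 FLOPs) but wins
    # easily once serving volume dominates.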


Yeah, at this point it seems like you're just burning money if you're not rightsizing your parameters/corpus.


I think you mean GPT-4, since Chinchilla is a DeepMind project. But yes, I was suspecting that too, as it seems unlikely this was the full 175B-parameter model given such big improvements in speed and price.

In fact, given the pricing for OpenAI Foundry, that seems even more likely, as this GPT-turbo model was listed alongside two other models with much larger context windows of 8k and 32k tokens.


"Brockman says the ChatGPT API is powered by the same AI model behind OpenAI’s wildly popular ChatGPT, dubbed “gpt-3.5-turbo.” GPT-3.5 is the most powerful text-generating model OpenAI offers today through its API suite; the “turbo” moniker refers to an optimized, more responsive version of GPT-3.5 that OpenAI’s been quietly testing for ChatGPT." [0]

Chinchilla optimization is a technique that anyone, including OpenAI, can apply to their models. The ChatGPT API is not based on GPT-4.

[0] https://techcrunch.com/2023/03/01/openai-launches-an-api-for...


I just meant Chinchilla-optimal in terms of the corrected scaling curves from the Chinchilla paper. The original GPT-3 was way larger than it needed to be for the amount of data they put into it, based on those curves.
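
For concreteness (GPT-3's figures are from its paper; the 20-tokens-per-parameter ratio is just the usual reading of the Chinchilla curves, so treat this as back-of-envelope):

    gpt3_params, gpt3_tokens = 175e9, 300e9
    print(f"{gpt3_tokens / gpt3_params:.1f} tokens/param")            # ~1.7, far below ~20
    print(f"~{gpt3_tokens / 20 / 1e9:.0f}B params for 300B tokens")   # ~15B would have matched the data
    print(f"~{20 * gpt3_params / 1e12:.1f}T tokens for 175B params")  # or ~3.5T tokens to match the size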


It's also worth noting that we don't know any specifics (parameters, training tokens) for GPT-3.5; those numbers have only been published for GPT-3.


It is a smaller model. It reveals this information if you ask it.



