$0.12 per 1k completion tokens is high enough to make the 32k context model prohibitively expensive. Especially in a chatbot use case with cumulative prompting, which is the best use case for such a large context vs. the default cheaper 8k window.
In contrast, GPT-3.5 text-davinci-003 was $0.02/1k tokens, and let's not get into the ChatGPT API.
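To put numbers on the cumulative-prompting point: the whole history gets resent as the prompt on every turn, so prompt tokens grow roughly quadratically with the number of turns. Rough sketch with made-up per-turn token counts, at the 32k prices above:

    PROMPT_PRICE = 0.06 / 1000      # $ per prompt token, 32k model
    COMPLETION_PRICE = 0.12 / 1000  # $ per completion token, 32k model

    def conversation_cost(turns, user_tokens=200, reply_tokens=300, system_tokens=500):
        """Total cost when every turn resends the entire history as the prompt."""
        history = system_tokens
        total = 0.0
        for _ in range(turns):
            history += user_tokens                    # user message joins the history
            total += history * PROMPT_PRICE           # whole history sent as the prompt
            total += reply_tokens * COMPLETION_PRICE  # model reply
            history += reply_tokens                   # reply kept for the next turn
        return total

    for n in (10, 30, 60):
        print(f"{n:>3} turns: ${conversation_cost(n):.2f}")

With those assumptions a 60-turn conversation just about fills the 32k window and runs somewhere around $55-60, almost all of it prompt tokens.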
> Especially in a chatbot use case with cumulative prompting, which is the best use case for such a large context vs. the default cheaper 8k window.
Depends on what is up with the images and how they translate into tokens. I really have no idea, but could be that 32k tokens (lots of text) translates to only a few images for few-shot prompting.
The paper doesn't seem to mention image tokenization, but it should be possible to infer something about the token rate once people can actually use the API and see how they're charged.
Currently, CLIP's largest model is ViT-L/14 at 336x336 images, which translates to 577 ViT tokens [(336/14)^2 + 1]. It might end up being token-efficient depending on how it's implemented (the paper doesn't elaborate).
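For reference, that 577 figure is just the standard ViT patch count plus the [CLS] token; whether GPT-4 charges anything like this per image is pure speculation on my part:

    def vit_token_count(image_size: int, patch_size: int) -> int:
        # A ViT splits the image into (image_size / patch_size)^2 patches,
        # plus one [CLS] token.
        assert image_size % patch_size == 0
        return (image_size // patch_size) ** 2 + 1

    print(vit_token_count(336, 14))  # 577, CLIP ViT-L/14 at 336px
    print(vit_token_count(224, 14))  # 257, the 224px variant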
I would imagine most use cases for the 32k model have much longer prompts than completions, so the $0.06 per 1k prompt tokens will be the real problem. I can't think of a use case yet, but that might be because I haven't got a sense of how smart it is.
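Quick arithmetic on a prompt-heavy request (the token counts are just an illustration):

    PROMPT_PRICE = 0.06 / 1000      # $ per prompt token
    COMPLETION_PRICE = 0.12 / 1000  # $ per completion token

    prompt_tokens, completion_tokens = 30_000, 500
    print(f"prompt:     ${prompt_tokens * PROMPT_PRICE:.2f}")          # $1.80
    print(f"completion: ${completion_tokens * COMPLETION_PRICE:.2f}")  # $0.06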