It's not that weird; longer prompts require more compute. This pricing is directly proportional to the total compute required for a query, which scales with the sum of the input and output sequence lengths.
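For example, with a hypothetical flat rate of $0.06 per 1K billed tokens (a made-up number, just to show the arithmetic, since real per-token prices vary by model), a minimal Python sketch:

    # Billed tokens = input + output, so both sides of the query count.
    PRICE_PER_1K_TOKENS = 0.06  # hypothetical flat rate

    def query_cost(prompt_tokens: int, completion_tokens: int) -> float:
        return (prompt_tokens + completion_tokens) / 1000 * PRICE_PER_1K_TOKENS

    # A 950-token prompt with a 50-token answer costs the same as a
    # 50-token prompt with a 950-token answer: it's the sum that matters.
    print(query_cost(950, 50))  # 0.06
    print(query_cost(50, 950))  # 0.06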



Let's take this example: https://beta.openai.com/examples/default-factual-answering

95% of the token usage in this example would be the prompt, and I would only get one sentence in return. So if I wanted to generate another sentence, I would have to pay that 95% prompt cost again and again. Isn't there a way to create a template for the prompt so you only pay for the generated sentences?
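To put rough numbers on it (the token counts and the $0.06/1K rate below are made up for illustration, not actual OpenAI pricing):

    # Hypothetical: a 950-token few-shot prompt, 50-token answers,
    # and a made-up flat rate of $0.06 per 1K billed tokens.
    PRICE_PER_1K = 0.06

    def call_cost(prompt_tokens: int, completion_tokens: int) -> float:
        return (prompt_tokens + completion_tokens) / 1000 * PRICE_PER_1K

    # Asking ten one-sentence questions re-sends the prompt every time:
    total = sum(call_cost(950, 50) for _ in range(10))
    print(f"${total:.2f}")  # $0.60, and 950/1000 = 95% of it paid for the prompt

Under these assumptions, a cached or pre-registered prompt template would cut the bill by roughly that 95%.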



