> Tokens can be thought of as pieces of words. Before the API processes the prompts, the input is broken down into tokens. These tokens are not cut up exactly where the words start or end - tokens can include trailing spaces and even sub-words. Here are some helpful rules of thumb for understanding tokens in terms of lengths:
> ...
> Wayne Gretzky’s quote "You miss 100% of the shots you don't take" contains 11 tokens.
It isn't processing text character by character, but rather token by token, both for input and for output.
This also helps explain why it has trouble breaking a word apart (as in the case of Wordle): it doesn't "think" of "glyph" as five letters but as two tokens that happen to be 'gly' and 'ph', with the IDs [10853, 746].
https://help.openai.com/en/articles/4936856-what-are-tokens-...
https://platform.openai.com/tokenizer
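If you want to poke at this locally rather than through the web tokenizer, here's a minimal sketch using OpenAI's tiktoken library. The encoding name "r50k_base" is an assumption on my part (it's the GPT-3 BPE; newer models use different encodings, so the exact token IDs you see will vary):

```python
# Minimal sketch: inspect how text splits into tokens with tiktoken.
# Assumption: "r50k_base" (the GPT-3 BPE); other encodings split
# differently and assign different IDs.
import tiktoken

enc = tiktoken.get_encoding("r50k_base")

for text in ["glyph", "You miss 100% of the shots you don't take"]:
    ids = enc.encode(text)                      # text -> token IDs
    pieces = [enc.decode([i]) for i in ids]     # each ID -> its text piece
    print(f"{text!r} -> {len(ids)} tokens: {pieces} (ids: {ids})")
```

Printing the decoded pieces alongside the IDs makes it obvious that token boundaries don't line up with letters, which is exactly why letter-level games like Wordle are awkward for the model.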