A token is roughly three-quarters of a word. That might sound odd if you're unfamiliar, but in short it comes down to how LLMs break text into pieces before processing it. Individual letters are too granular to model well, while full words would make it impossible for an LLM to produce novel strings like "dkjmcf0248375cyu18c7437tr18c237m yt034c8nwey4mtc0r23q9p,a;" (should it ever need to), since those fall outside any fixed word vocabulary. So the industry has settled on the middle ground of parts of words, known as subword tokenization.
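
You can see this splitting for yourself. Here's a minimal sketch using OpenAI's open-source tiktoken library (this assumes you have it installed via `pip install tiktoken`; the specific encoding name is just one of the publicly available ones, and which encoding a given model uses varies):

```python
# Inspect how a subword tokenizer splits text into tokens.
import tiktoken

# "cl100k_base" is one of tiktoken's publicly available encodings.
enc = tiktoken.get_encoding("cl100k_base")

for text in ["hello", "tokenization", "dkjmcf0248375"]:
    token_ids = enc.encode(text)
    # Decode each token id individually to see the text piece it covers.
    pieces = [enc.decode([t]) for t in token_ids]
    print(f"{text!r} -> {len(token_ids)} tokens: {pieces}")
```

Common words tend to map to a single token, while rare or random strings shatter into many small pieces, which is exactly why the "tokens per word" figure is an average rather than a fixed ratio.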