Hacker News

Did you try entering one of those scrambled sentences into the tokenizer? It doesn't tokenize on exact word boundaries; the docs say 100 tokens is roughly 75 words.


I just tried it out of curiosity. The jumbled words get broken up into several tokens each, while the unjumbled words do not get split.

https://imgur.com/a/a3zmkIv
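
For anyone who wants to see the effect without the screenshot, here is a rough sketch of why that happens. It is not the real tokenizer: it uses a made-up vocabulary (`VOCAB`) and a simple greedy longest-match split, whereas the actual tokenizer is trained on a large corpus. The names `VOCAB`, `tokenize_word`, and `tokenize` are hypothetical and exist only for this illustration.

```python
# Toy illustration only: a greedy longest-match subword tokenizer over a
# made-up vocabulary. This is an assumption-laden stand-in, not the real
# tokenizer, but it shows the same qualitative effect on scrambled words.

# Hypothetical vocabulary: a few whole words and fragments, plus every
# lowercase letter as a single-character fallback.
VOCAB = {"the", "quick", "brown", "fox", "qu", "ck", "br", "ow"}
VOCAB |= set("abcdefghijklmnopqrstuvwxyz")


def tokenize_word(word):
    """Split one word into the longest vocabulary pieces, left to right."""
    tokens = []
    i = 0
    while i < len(word):
        # Try the longest candidate first, shrinking until something matches.
        for j in range(len(word), i, -1):
            if word[i:j] in VOCAB:
                tokens.append(word[i:j])
                i = j
                break
        else:
            # Character outside the toy vocabulary: emit it as its own token.
            tokens.append(word[i])
            i += 1
    return tokens


def tokenize(text):
    """Lowercase, split on whitespace, and tokenize each word in turn."""
    return [tok for word in text.lower().split() for tok in tokenize_word(word)]


normal = "the quick brown fox"
jumbled = "the qucik bworn fox"

print(len(tokenize(normal)), tokenize(normal))
print(len(tokenize(jumbled)), tokenize(jumbled))
```

With this toy vocabulary the normal sentence comes out as 4 tokens, one per word, while the jumbled one comes out as 11, because "qucik" and "bworn" are not in the vocabulary and fall back to smaller pieces. That is also why a words-to-tokens ratio like 75:100 is only a rough average.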



