mmoskal | 9 months ago | on: Probably pay attention to tokenizers
Tokens are often sub-word, all the way down to single bytes (which are implicitly understood as UTF-8, but models will sometimes generate invalid UTF-8...).
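To make the byte-level case concrete, here is a minimal sketch in plain Python. It assumes a toy "tokenizer" where every token is exactly one byte; the names `tokens` and `decode_tokens` are made up for illustration and don't come from any real tokenizer library. It shows how a token sequence that stops in the middle of a multi-byte character is not valid UTF-8 on its own:

```python
# Toy byte-level tokens (an assumption for illustration, not a real tokenizer):
# each token is a single byte of the UTF-8 encoding.
text = "na\u00efve"              # "naïve"; the ï takes two bytes in UTF-8
data = text.encode("utf-8")      # b'na\xc3\xafve'
tokens = [data[i:i + 1] for i in range(len(data))]  # one token per byte


def decode_tokens(byte_tokens):
    """Join byte tokens and decode, replacing invalid sequences with U+FFFD."""
    return b"".join(byte_tokens).decode("utf-8", errors="replace")


full = decode_tokens(tokens)           # all bytes present: valid UTF-8
truncated = decode_tokens(tokens[:3])  # ends at b'\xc3', mid-character

print(len(tokens))      # 6 byte tokens for 5 characters
print(repr(full))
print(repr(truncated))  # the dangling lead byte becomes U+FFFD
```

With strict decoding (the default `errors="strict"`), the truncated sequence raises `UnicodeDecodeError` instead, so anything consuming byte-level model output probably has to pick a policy for invalid bytes: replace, drop, or buffer until the character is complete.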