Hacker News

It's interesting: perhaps the stability of the tokenization algorithm (from a change management perspective), i.e. being able to hold it constant between old and new training runs, was deemed more important than cleaning up the data at an earlier phase of the pipeline, and the resulting glitch tokens were deemed an acceptable consequence.


