
That’s roughly how UTF-8 works, with some tweaks to make it self-synchronizing. (That is, you can jump into the middle of a stream and find the start of the next code point by looking at no more than 4 bytes.)
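To make the self-synchronizing property concrete, here is a minimal Python sketch. The function name `next_code_point_start` is made up for illustration, and it assumes the input is well-formed UTF-8 limited to 4-byte sequences. It relies on the fact that continuation bytes always look like `10xxxxxx`, so skipping them lands on the next lead byte.

```python
def next_code_point_start(data: bytes, pos: int) -> int:
    """Return the index of the first byte at or after `pos` that begins a code point.

    Hypothetical helper: assumes `data` is well-formed UTF-8.
    """
    # Continuation bytes have the bit pattern 10xxxxxx; lead bytes never do.
    while pos < len(data) and (data[pos] & 0xC0) == 0x80:
        pos += 1
    return pos


# "a" is 1 byte, "€" is 3 bytes, "😀" is 4 bytes, "z" is 1 byte.
stream = "a€😀z".encode("utf-8")

# Landing inside "€" (index 2) resyncs to the start of "😀" (index 4).
inside_euro = next_code_point_start(stream, 2)

# Landing inside "😀" (index 5) resyncs to "z" (index 8).
inside_emoji = next_code_point_start(stream, 5)
```

In well-formed input the loop skips at most three continuation bytes before it reaches a lead byte, which is where the 4-byte bound comes from.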

As for running out of code points, we’re limited by UTF-16 (up to U+10FFFF). UTF-32 could go up to 32 bits, and UTF-8 as originally specified (sequences of up to 6 bytes) could go up to 31 bits.
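The U+10FFFF ceiling falls out of how UTF-16 surrogate pairs work. A quick sketch of the arithmetic in Python (the variable names are just for illustration):

```python
# UTF-16 encodes code points above U+FFFF as a high surrogate followed by a low surrogate.
HIGH_SURROGATES = 0xDBFF - 0xD800 + 1  # 1024 possible high surrogates
LOW_SURROGATES = 0xDFFF - 0xDC00 + 1   # 1024 possible low surrogates

# Each pair addresses one code point, starting at U+10000.
supplementary_count = HIGH_SURROGATES * LOW_SURROGATES  # 2**20
max_utf16_code_point = 0x10000 + supplementary_count - 1

# The largest pair (DBFF, DFFF) is exactly how Python encodes U+10FFFF.
largest_pair = chr(0x10FFFF).encode("utf-16-be")
```

Here `max_utf16_code_point` works out to `0x10FFFF`, and `largest_pair` is the byte string `DB FF DF FF`.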



