
Why should it crash? The proper procedure when validating a UTF-8 string is to replace errors with U+FFFD.
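As a rough illustration of that replacement policy, here is a minimal Python sketch. The byte string `raw` is a made-up example, not something from the discussion. It relies only on the standard `bytes.decode` method and its `errors="replace"` handler.

```python
# Hypothetical input: 0xFF can never appear in well-formed UTF-8.
raw = b"abc\xffdef"

# Strict decoding rejects the input outright.
try:
    raw.decode("utf-8")
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

# The "replace" error handler substitutes U+FFFD (REPLACEMENT CHARACTER)
# for the malformed byte and keeps going instead of failing.
repaired = raw.decode("utf-8", errors="replace")

print(strict_failed)
print(repaired.encode("unicode_escape").decode("ascii"))
```

Whether to reject or repair is a policy choice for the caller; the point is only that a decoder can recover without crashing.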

The term "character" has many meanings. Graphemes are characters too, and they are what most users expect: something displayed as a single graphical unit.




I use "character" in the same way that the Unicode Consortium uses the word. Though "code point" would be more precise.


That's what they were hoping for. Didn't turn out that way. From icu-project.org:

"As with glyphs, there is no one-to-one relationship between characters and code points. What an end-user thinks of as a single character (grapheme) may in fact be represented by multiple code points; conversely, a single code point may correspond to multiple characters."
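Both directions of that mismatch can be seen with a small Python sketch. The sample strings are my own picks for illustration, not taken from the ICU text. Python's `len` counts code points, and the standard library has no grapheme segmenter, so the "user-perceived" counts appear only in the comments.

```python
import unicodedata

# One user-perceived character, two code points:
# LATIN SMALL LETTER E followed by COMBINING ACUTE ACCENT.
decomposed = "e\u0301"
# NFC normalization composes the pair into the single code point U+00E9.
composed = unicodedata.normalize("NFC", decomposed)

# One emoji as displayed, five code points:
# MAN, ZERO WIDTH JOINER, WOMAN, ZERO WIDTH JOINER, GIRL.
family = "\U0001F468\u200D\U0001F469\u200D\U0001F467"

# The other direction: one code point that reads as two letters.
# U+FB01 is LATIN SMALL LIGATURE FI; NFKC expands it to "f" + "i".
ligature = "\ufb01"
expanded = unicodedata.normalize("NFKC", ligature)

print(len(decomposed), len(composed))  # code point counts before and after NFC
print(len(family))                     # code points in the emoji sequence
print(len(ligature), expanded)         # one code point, expanded form
```

Counting graphemes properly requires the segmentation rules from Unicode Standard Annex #29, which in Python means a third-party library; the sketch above only shows that code point counts and perceived characters diverge.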



