
> not all Chinese characters in use for names are representable in unicode

Why? How do you come to this conclusion?



Han unification[1] prevents the representation of all Chinese characters. There are multiple languages that use Chinese characters, but they don't all use the same characters. Unicode decided to only use Han Chinese characters, so names using other sorts of Chinese characters can't be written with Unicode. The Han "equivalent" characters can be used, but that looks weird.

Think of it as though Unicode decided that the letter "m" wasn't needed to write English text, since you can just write "rn" and it'll be close enough. Someone named "James" might want to have their name spelled correctly instead of "Jarnes", but that wouldn't be possible. Han unification did essentially this.
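One way to see unification in practice is to inspect a character that is commonly cited as drawn differently in Chinese and Japanese typefaces. The sketch below uses only Python's standard `unicodedata` module. The helper `describe` is a hypothetical name introduced here for illustration. It shows that 直 has a single code point, so nothing in the text itself records which regional glyph was intended.

```python
import unicodedata

def describe(text: str) -> list[str]:
    """Return 'U+XXXX NAME' for each code point in text (hypothetical helper)."""
    return [f"U+{ord(c):04X} {unicodedata.name(c, '<unnamed>')}" for c in text]

# 直 is traditionally drawn differently in mainland Chinese and Japanese
# typefaces, but Han unification assigns both forms one code point.
# Only the font, or a language tag such as lang="ja" vs lang="zh-CN"
# in HTML, decides which glyph the reader actually sees.
for line in describe("\u76f4"):
    print(line)
```

The string carries no language information, so a name typed in one locale can render with the other locale's glyph.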

[1] https://en.wikipedia.org/wiki/Han_unification


I feel it's unlikely that this is the explanation GGP had in mind. I postulate that characters used in names usually have no variants and thus do not undergo unification; where variants do exist, they are already encoded as Z variants, so the contention is also moot.

Prove me wrong with a counter-example.



𫟈 is U+2B7C8 "CJK Unified Ideograph-2B7C8". 𛁻 is U+1B07B "Hentaigana Letter To-5".
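Anyone wanting to check that such characters are actually encoded can query Python's standard `unicodedata` module. The sketch below is a minimal example under two assumptions: the interpreter's Unicode database is recent enough (hentaigana arrived in Unicode 10.0, so Python 3.7 or later), and `codepoint_info` is a hypothetical helper name. The characters are written as escapes because many fonts cannot display them.

```python
import unicodedata

def codepoint_info(ch: str) -> tuple[str, str]:
    """Return ('U+XXXX', official name), or a placeholder if unassigned."""
    return f"U+{ord(ch):04X}", unicodedata.name(ch, "<not in this Unicode database>")

# U+2B7C8 sits in CJK Unified Ideographs Extension D;
# U+1B07B sits in the Kana Supplement block (hentaigana).
for ch in ("\U0002B7C8", "\U0001B07B"):
    print(*codepoint_info(ch), f"(Unicode {unicodedata.unidata_version})")
```

Because both lookups return real character names, both characters are representable in Unicode.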

Both characters fall into the first category I mentioned: no variants.



