
EXCEPT that the legacy Cyrillic codepages had separate codepoints for Latin a and Cyrillic а. You’re also making assumptions about the roundtrip preservation that are invalid. The idea is that if a string is encodable in the legacy codepage, you should be able to make the roundtrip. Yes, you can’t roundtrip ⨋ to most legacy codepages, but that’s not the brief.


> The idea is that if a string is encodable in the legacy codepage, you should be able to make the roundtrip.

But which strings are encodable in a legacy codepage depends on what we define as encodable! If we had a separate codepoint for "turkish small letter i", then we could simply have defined "latin small letter i" as not encodable in the legacy Turkish codepage, the same way that "cyrillic small letter a" is not encodable in the legacy Turkish codepage. "turkish small letter i" and "latin small letter i" would then be just another normal homoglyph pair, same as "cyrillic small letter a" and "latin small letter a".
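
To make the roundtrip point concrete, here is a rough sketch using Python's built-in codecs. I'm assuming cp1251 and cp1254 as stand-ins for "the legacy Cyrillic codepage" and "the legacy Turkish codepage"; `roundtrips` is just a helper name I made up for illustration.

```python
def roundtrips(text: str, codepage: str) -> bool:
    """True if text survives encode -> decode through the given codepage."""
    try:
        return text.encode(codepage).decode(codepage) == text
    except UnicodeEncodeError:
        # At least one character has no byte in this codepage.
        return False


LATIN_A = "\u0061"     # LATIN SMALL LETTER A
CYRILLIC_A = "\u0430"  # CYRILLIC SMALL LETTER A
LATIN_I = "\u0069"     # LATIN SMALL LETTER I
DOTLESS_I = "\u0131"   # LATIN SMALL LETTER DOTLESS I

# cp1251 gives the two look-alike letters different bytes,
# so a string mixing them still roundtrips.
print(LATIN_A.encode("cp1251"), CYRILLIC_A.encode("cp1251"))
print(roundtrips(LATIN_A + CYRILLIC_A, "cp1251"))

# cp1254 has no byte for the Cyrillic letter, so it is simply not encodable.
print(roundtrips(CYRILLIC_A, "cp1254"))

# cp1254 keeps the dotted i at its ASCII byte and gives the dotless i its own.
print(LATIN_I.encode("cp1254"), DOTLESS_I.encode("cp1254"))
```

Under the hypothetical scheme above, a separate "turkish small letter i" would map to that same 0x69 byte in cp1254, and "latin small letter i" would fail there the way the Cyrillic а does.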



