EXCEPT that the legacy Cyrillic codepages had separate codepoints for Latin a an...

zokier · 2025-05-08T14:10:54 1746713454

> The idea is that if a string is encodable in the legacy codepage, you should be able to make the roundtrip.

But which strings are encodable in a legacy codepage depends on what we define as encodable! If we had a separate codepoint for "turkish small letter i", then we could simply have defined "latin small letter i" as not encodable in the legacy Turkish codepage, the same way that "cyrillic small letter a" is not encodable in the legacy Turkish codepage. "turkish small letter i" and "latin small letter i" would then be just another ordinary homoglyph pair, same as "cyrillic small letter a" and "latin small letter a".
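
To make that concrete, here is a rough Python sketch. It assumes Python's built-in `cp1254` (Turkish) and `cp1251` (Cyrillic) codecs as stand-ins for "the legacy codepages". The `TURKISH_I` private-use codepoint and `HYPOTHETICAL_TABLE` are made up purely for illustration; no such codepoint exists in Unicode.

```python
def roundtrips(text: str, codepage: str) -> bool:
    """True if text survives encode -> decode in the given codepage."""
    try:
        return text.encode(codepage).decode(codepage) == text
    except UnicodeEncodeError:
        # The codepage has no byte for some character in text.
        return False


LATIN_I = "i"            # U+0069 LATIN SMALL LETTER I
LATIN_A = "a"            # U+0061 LATIN SMALL LETTER A
CYRILLIC_A = "\u0430"    # U+0430 CYRILLIC SMALL LETTER A

# As Unicode stands today: Latin i is encodable in the Turkish codepage,
# Cyrillic a is not.
print(roundtrips(LATIN_I, "cp1254"))
print(roundtrips(CYRILLIC_A, "cp1254"))

# The Cyrillic codepage keeps the homoglyph pair on separate bytes.
print(LATIN_A.encode("cp1251") != CYRILLIC_A.encode("cp1251"))

# Hypothetical alternative: a separate "turkish small letter i".
# U+E000 is a private-use stand-in, NOT a real assignment.
TURKISH_I = "\ue000"
HYPOTHETICAL_TABLE = {TURKISH_I: 0x69, "\u0131": 0xFD}  # tiny excerpt only


def encodable_hypothetical(text: str) -> bool:
    """Encodability under the made-up mapping above."""
    return all(ch in HYPOTHETICAL_TABLE for ch in text)


# Under that definition Latin i is simply "not encodable",
# exactly like Cyrillic a is today.
print(encodable_hypothetical(TURKISH_I))
print(encodable_hypothetical(LATIN_I))
```

The roundtrip guarantee still holds in the hypothetical version; only the set of strings it applies to is different.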