
...it's a bit of a shame that the same upper/lowercase trick doesn't apply to all Unicode code points (at least those that have upper/lower variants).

It seems to work for codepoints up to U+00FF, for instance:

    - Å (U+00C5) vs å (U+00E5)

...but just above U+00FF (in Latin Extended-A, for example) the lowercase letter directly follows its uppercase counterpart:

    - Ă (U+0102) vs ă (U+0103)

Typical for Unicode though, nothing makes sense ;)
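
For anyone who wants to poke at this, here is a minimal Python sketch of the trick, assuming "the trick" means toggling bit 5 (0x20) of the code point. The helper name `flip_case_bit` is made up for illustration.

```python
def flip_case_bit(ch: str) -> str:
    """Toggle bit 5 (0x20) of a single character's code point."""
    return chr(ord(ch) ^ 0x20)

# ASCII and the Latin-1 range: the bit flip lands on the other case.
print(flip_case_bit("A"))        # a
print(flip_case_bit("\u00c5"))   # U+00C5 Å becomes U+00E5 å

# Latin Extended-A: the bit flip lands on an unrelated letter (U+0122),
# while the real lowercase form is the next code point (U+0103).
print(hex(ord(flip_case_bit("\u0102"))))
print(hex(ord("\u0102".lower())))
```

The last two lines print different code points, which is the mismatch the examples above point at.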


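That's because U+00A0–U+00FF encode an earlier character set: "ISO Latin-1" (ISO 8859-1), itself based on DEC's "Multinational Character Set" (MCS). Even within Latin-1 the upper/lowercase trick does not apply to ß (U+00DF) and ÿ (U+00FF): they sit 0x20 apart but are not a case pair, since Ÿ only exists outside Latin-1, at U+0178. The trick does work for Ÿ/ÿ in MCS, where they occupy a different pair of code points.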
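
A quick way to check which Latin-1 positions break the pattern is to scan the "uppercase" row U+00C0–U+00DF and compare Python's built-in `str.lower()` against setting bit 0x20. This is only a sketch; it relies on whatever Unicode data ships with your Python build.

```python
# Scan the Latin-1 row U+00C0..U+00DF and collect every code point whose
# lowercase form is NOT simply the same code point with bit 0x20 set.
exceptions = [
    cp for cp in range(0xC0, 0xE0)
    if chr(cp).lower() != chr(cp | 0x20)
]

for cp in exceptions:
    print(f"U+{cp:04X} {chr(cp)} does not pair with U+{cp | 0x20:04X} {chr(cp | 0x20)}")

# The uppercase of ÿ (U+00FF) lives outside Latin-1 entirely.
print(f"upper(U+00FF) = U+{ord(chr(0xFF).upper()):04X}")
```

Besides ß/ÿ, the scan also flags the multiplication sign × (U+00D7), which sits 0x20 below the division sign ÷ (U+00F7).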

ISO Latin-1 was the character set on many Unix systems, Amiga OS, MS-Windows (as "Windows-1252" with extra chars), and was for many years the default character set on the web.


In Unicode, there's no universal 1:1 mapping between cases.

The lowercase of "I" could very well be "ı" (lowercase dotless i) if you're typing Turkish.
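
To make that concrete, here is a small Python sketch. Python's `str.lower()` and `str.upper()` are locale-independent, so the Turkish handling below is a hand-rolled, hypothetical helper (`turkish_lower` is not a standard library function) and only covers the two dotted/dotless i pairs.

```python
# Case mappings are not always one code point to one code point.
print("\u00df".upper())            # ß uppercases to the two letters "SS"
print(len("\u0130".lower()))       # İ lowercases to "i" + combining dot above

# The default mapping ignores language: "I" always becomes "i".
print("I".lower())

def turkish_lower(s: str) -> str:
    """Hypothetical helper: lowercase using Turkish rules for I and İ."""
    # Map dotted capital İ (U+0130) to plain i first, then dotless
    # capital I to ı (U+0131), and let str.lower() handle the rest.
    return s.replace("\u0130", "i").replace("I", "\u0131").lower()

print(turkish_lower("ISPARTA"))
print(turkish_lower("\u0130STANBUL"))
```

For real applications a locale-aware library such as ICU is probably the safer choice; the replace-then-lower approach here is just the shortest way to show the difference.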



