
It's not guaranteed for 7-bit ASCII either, because tolower/toupper are locale-dependent: in the tr_TR locale, the lowercase of I (U+0049) is ı (U+0131, aka dotless i), which encodes as two bytes in UTF-8.
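
To make the byte-count change concrete, here is a rough sketch in Python. Note that Python's own str.lower() is locale-independent, so the Turkish rule is emulated with a small hand-written table; TURKISH_LOWER, lower_tr and utf8_len are hypothetical names made up for this illustration, not part of any library.

```python
# Hypothetical mapping that emulates Turkish casing for the two special letters.
# Assumption: everything else follows Python's default (locale-independent) rules.
TURKISH_LOWER = {
    "I": "\u0131",  # LATIN CAPITAL LETTER I -> dotless i (U+0131)
    "\u0130": "i",  # LATIN CAPITAL LETTER I WITH DOT ABOVE -> plain i
}


def lower_tr(text: str) -> str:
    """Lowercase text, applying the Turkish rule for I and dotted capital I."""
    return "".join(TURKISH_LOWER.get(ch, ch.lower()) for ch in text)


def utf8_len(text: str) -> int:
    """Number of bytes the text occupies when encoded as UTF-8."""
    return len(text.encode("utf-8"))


src = "TITLE"          # pure 7-bit input, 5 bytes
dst = lower_tr(src)    # the I becomes U+0131, which needs 2 bytes in UTF-8
print(utf8_len(src), utf8_len(dst))
```

So a buffer sized for the 7-bit input is one byte too small for the lowercased result, which is the hazard with assuming case conversion preserves length.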


That's not ASCII then. It's byte-width compatible (to a certain degree, as you point out), but it's not ASCII. ASCII defines 128 code points and the handling of an escape character. It doesn't handle locales.


ASCII is an encoding; it doesn't say anything about locale. The point is that tolower/toupper are not guaranteed to be safe even if the input is 7-bit.


I don't think there is any possibility of doing locale-specific lower/upper casing in ASCII. It is really designed for (a subset of) American English.



