Haskell does this somewhat nicely by shunning the built-in String type and having ByteString and Text types. All three (and others) can be created using string literals, though that can be dangerous. Text is a UTF-16 encoded, ICU-backed human-text monster type which handles upcasing ligatures and even more complex things like collation (which is, btw, how you solve the phone book issue, and it's just one C library away).
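To make the "dangerous literals" point concrete, here is a minimal sketch, assuming the OverloadedStrings extension and the IsString instances shipped with the text and bytestring packages. The names asText and asBytes are made up for illustration. The same literal is fine as Text, but as a ByteString each Char is silently truncated to 8 bits:

```haskell
{-# LANGUAGE OverloadedStrings #-}
module Main where

import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- "\955" is the Greek letter lambda (U+03BB), written as an escape so the
-- example does not depend on the source file's encoding.
asText :: T.Text
asText = "\955"

-- Same literal, different type: the ByteString instance keeps only the
-- low 8 bits of each Char, so U+03BB becomes the single byte 0xBB.
asBytes :: B.ByteString
asBytes = "\955"

main :: IO ()
main = do
  print (T.length asText)                  -- 1
  print (B.unpack asBytes)                 -- [187]
  print (B.unpack (TE.encodeUtf8 asText))  -- [206,187]
```

The last line is what you almost certainly wanted: go through Text and pick the encoding explicitly.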

ByteString is a series of bytes that just may happen to be OK to print as human text for debugging. The system makes it hard for you to treat it otherwise by moving the "Char8-assuming" functions to different modules and packages which must be explicitly imported and carry warnings.

You convert between them using functions in the Data.Text.Encoding module, such as "decodeUtf8'", which reports invalid input as a value instead of throwing, and "decodeLatin1". There's also a slew of normalizing functions in text-icu.
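A rough sketch of what that conversion looks like in practice, using only decodeUtf8', decodeLatin1 and encodeUtf8 from Data.Text.Encoding. The helper decodeOrReport and the sample byte sequences are invented for this example:

```haskell
module Main where

import qualified Data.ByteString as B
import qualified Data.Text as T
import qualified Data.Text.Encoding as TE

-- Hypothetical helper: decode bytes as UTF-8, turning the library's
-- UnicodeException into a plain error message.
decodeOrReport :: B.ByteString -> Either String T.Text
decodeOrReport bytes =
  either (\err -> Left ("invalid UTF-8: " ++ show err)) Right
         (TE.decodeUtf8' bytes)

main :: IO ()
main = do
  let valid   = B.pack [0xCE, 0xBB]  -- UTF-8 encoding of U+03BB
      invalid = B.pack [0xFF, 0xFE]  -- 0xFF never occurs in valid UTF-8
  print (decodeOrReport valid)                      -- Right "\955"
  print (either (const False) (const True)
                (decodeOrReport invalid))           -- False
  -- Latin-1 maps every byte to a code point, so this cannot fail.
  print (TE.decodeLatin1 invalid)                   -- "\255\254"
  -- Round trip: encoding the decoded Text gives the original bytes back.
  print (fmap TE.encodeUtf8 (decodeOrReport valid) == Right valid)  -- True
```

The point is that the failure case is in the type: you have to decide what to do with a Left before you can get at the Text.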

I really encourage anyone interested in this problem to peruse the Text and Text.ICU documentation.

http://hackage.haskell.org/package/text

http://hackage.haskell.org/package/text-0.11.3.1/docs/Data-T...

http://hackage.haskell.org/package/text-icu

http://hackage.haskell.org/package/text-icu-0.6.3.7/docs/Dat...