
Makes me wonder whether we should not have tried to make a single "Unicode" and instead had distinct types for each language, like EnglishString, ArabicString, etc., so programmers could handle each case as needed.


Unicode is structured into planes and furthermore into blocks: https://en.wikipedia.org/wiki/Plane_(Unicode)#Basic_Multilin...

So you can actually separate them pretty easily if you want.
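
A rough sketch of what that separation could look like, in Python. The block ranges are the standard ones, but the table is deliberately tiny and the helper names (`plane_of`, `block_of`, `split_by_block`) are made up for illustration. Note that a block is not the same thing as a language or even a script: Latin letters are spread over several blocks, and spaces and punctuation live in Basic Latin no matter what text surrounds them.

```python
# A handful of Unicode blocks as (first, last, name); a real table has hundreds.
BLOCKS = [
    (0x0000, 0x007F, "Basic Latin"),
    (0x0080, 0x00FF, "Latin-1 Supplement"),
    (0x0370, 0x03FF, "Greek and Coptic"),
    (0x0400, 0x04FF, "Cyrillic"),
    (0x0590, 0x05FF, "Hebrew"),
    (0x0600, 0x06FF, "Arabic"),
]


def plane_of(ch: str) -> int:
    """Planes are 65,536 code points wide, so the plane is the bits above the low 16."""
    return ord(ch) >> 16


def block_of(ch: str) -> str:
    """Look up the block by code point range; anything not in the table is 'other'."""
    cp = ord(ch)
    for first, last, name in BLOCKS:
        if first <= cp <= last:
            return name
    return "other"


def split_by_block(text: str) -> list[tuple[str, str]]:
    """Group consecutive characters that fall in the same block."""
    runs: list[tuple[str, str]] = []
    for ch in text:
        block = block_of(ch)
        if runs and runs[-1][0] == block:
            runs[-1] = (block, runs[-1][1] + ch)
        else:
            runs.append((block, ch))
    return runs
```

For example, `split_by_block("hi سلام")` gives one Basic Latin run (including the space) followed by one Arabic run.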


No. If that's how it worked, how many programs would have Arabic support at all? Everyone would use EnglishString for everything (because that was the example code they copied from StackOverflow) and the whole thing comes crashing down if I enter even just an umlaut.


No, I was thinking of something a little more sophisticated than that. Classes of strings would be sub-classed from others; English, for instance, would have multiple sub-classes like American, British, etc. Each class could have its own parameters, like whether it supports being reversed. You would not necessarily implement just for English. If you were making a palindrome-finding app, for example, you would only care whether the class passed in supports reversing. Eventually you could have a language DB library, not so different from the time zone database, that can be periodically updated with new rules.
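
To make "supports being reversed" a bit more concrete, here is a small Python sketch. It is not a language DB, just an illustration of why reversal needs rules at all: reversing code points one by one detaches combining marks from their base letters. The function names are hypothetical, and the second function only handles simple combining marks; it is not full grapheme cluster segmentation (emoji sequences, for instance, are not covered).

```python
import unicodedata


def naive_reverse(text: str) -> str:
    """Reverse code point by code point; combining marks end up on the wrong letter."""
    return text[::-1]


def reverse_keeping_marks(text: str) -> str:
    """Reverse while keeping each combining mark attached to its base character."""
    clusters: list[str] = []
    for ch in text:
        # unicodedata.combining() returns a non-zero class for combining marks.
        if clusters and unicodedata.combining(ch):
            clusters[-1] += ch
        else:
            clusters.append(ch)
    return "".join(reversed(clusters))
```

With `s = "noe\u0301l"` (an "e" followed by a combining acute accent), `naive_reverse(s)` moves the accent onto the "l", while `reverse_keeping_marks(s)` keeps it on the "e".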


That’s a lot of types, regardless of how you define them.

Is AAVE a separate “language” from “standard English”? How about Creole? Is that French, English, or a distinct language?

Is Pennsylvania Dutch/“Low German” German or English?

What about Yiddish? Modern Yiddish is an entirely different beast from historical Yiddish.

Who is going to be defining the type for Greek? What region and point in history will they be using for a reference?

Does GermanString contain Eszett? If so, is it a separate code point or interchangeable with “ss”? Is there logic that defines when “ss” should be represented as an Eszett and when it should be rendered as “ss”?

No, I think Unicode is the right approach. You get all the code points you might need and logic for processing them as glyphs. Leave everything else up to the application.
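
On the Eszett question specifically, this is roughly how it plays out with Unicode's default case mappings, shown here through Python's built-in `str` methods. "ß" is its own code point (U+00DF), it is not equal to "ss", and whether the two should be treated as interchangeable is left to the caller. The helper `same_ignoring_case` is a made-up name for illustration.

```python
ESZETT = "\u00df"          # ß, LATIN SMALL LETTER SHARP S
CAPITAL_ESZETT = "\u1e9e"  # ẞ, LATIN CAPITAL LETTER SHARP S


def same_ignoring_case(a: str, b: str) -> bool:
    """Caseless comparison using Unicode case folding, which maps ß to "ss"."""
    return a.casefold() == b.casefold()


# A separate code point, not interchangeable at the string level.
print(ESZETT == "ss")

# Default uppercasing expands ß to "SS", so a round trip loses the ß.
print("Straße".upper())
print("Straße".upper().lower())

# The capital form exists as its own code point and lowercases back to ß.
print(CAPITAL_ESZETT.lower() == ESZETT)

# The application opts in to treating the two spellings as equal.
print(same_ignoring_case("Straße", "STRASSE"))
```

These mappings are locale-independent defaults; deciding when "ss" should be written as an Eszett in running German text is an orthography question that stays with the application.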



