> How to say you don't know what Unicode is for without saying it.
I know what its original mission was: a character set.
It's been mangled beyond recognition by including semantic information (which is the purview of context), presentation information such as italics and fonts (which is the purview of markup languages), and layout information such as backwards text (which is also the purview of markup).
> you can't expect programmers to understand pesky little things like languages having different writing,
But you're requiring programmers to understand all the complicated normalization rules? Normalization is a totally unnecessary feature. Just use the normalized code points. Done.
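For illustration, here is a minimal sketch of the problem normalization exists to solve, using Python's standard `unicodedata` module; the two spellings of `é` below are the classic example of byte sequences that render identically but compare unequal:

```python
import unicodedata

composed = "\u00e9"      # "é" as one precomposed code point
decomposed = "e\u0301"   # "e" followed by U+0301 COMBINING ACUTE ACCENT

print(composed == decomposed)  # False: different code point sequences
print(unicodedata.normalize("NFC", composed)
      == unicodedata.normalize("NFC", decomposed))  # True after NFC
```

If Unicode only admitted one canonical spelling per character in the first place, that comparison would just be `==` and none of this machinery would be needed.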
> these print the same but are semantically different!
Think about what this means. However did people manage to read and understand printed books? The semantic meaning comes from the context, not the glyph. For example, I can use `a` to mean the `ahh` sound, the `ayy` sound, or a variable in algebra. How can I know which? The context.
It is totally impossible to encode every meaning a glyph can have.
> would lose semantic information
Unicode is supposed to be a character set. That's it. Characters do not have semantic information without context.
Oh, and here's some drawkcab text I wrote without any help from Unicode at all.
I had to add some code into the D compiler to reject Unicode text direction "characters" because they can be used to invisibly insert malware into ordinary code.
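For illustration only, here is a hedged sketch in Python (not the actual D compiler code) of the kind of check being described: scanning source text and rejecting the Unicode bidirectional control characters used in these "Trojan Source" style attacks. The exact set of code points a real compiler rejects is an assumption on my part.

```python
# Assumed set of bidi controls: the explicit direction embeddings,
# overrides, and isolates. A real compiler's list may differ.
BIDI_CONTROLS = {
    "\u202a",  # LEFT-TO-RIGHT EMBEDDING
    "\u202b",  # RIGHT-TO-LEFT EMBEDDING
    "\u202c",  # POP DIRECTIONAL FORMATTING
    "\u202d",  # LEFT-TO-RIGHT OVERRIDE
    "\u202e",  # RIGHT-TO-LEFT OVERRIDE
    "\u2066",  # LEFT-TO-RIGHT ISOLATE
    "\u2067",  # RIGHT-TO-LEFT ISOLATE
    "\u2068",  # FIRST STRONG ISOLATE
    "\u2069",  # POP DIRECTIONAL ISOLATE
}

def check_source(text: str) -> None:
    """Raise if the source text contains any bidi control character."""
    for lineno, line in enumerate(text.splitlines(), 1):
        for ch in line:
            if ch in BIDI_CONTROLS:
                raise ValueError(
                    f"line {lineno}: bidi control U+{ord(ch):04X} rejected")
```

The danger these characters pose is that they reorder how code is *displayed* without changing how it is *compiled*, so a reviewer can read one program while the compiler sees another.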
Adding toy "languages" should be left to people having fun, not to Unicode.