Should have just gone with 32 bit characters and no combinations. Utter simplici...

guappa · 2025-08-22T10:30:27 1755858627

That would be extremely wasteful: every single text file would be 4x larger, and I'm sure eventually it would not be enough anyway.
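
A quick sketch of where the 4x figure comes from, assuming Python 3 and a made-up, purely ASCII sample string (text with lots of non-ASCII characters would show a smaller ratio against UTF-8):

```python
# Hypothetical sample; any ASCII-only string behaves the same way.
text = "Plain ASCII text, like most source code and log files."

utf8_size = len(text.encode("utf-8"))
# "utf-32-le" skips the 4-byte byte order mark that plain "utf-32" prepends.
utf32_size = len(text.encode("utf-32-le"))

print(utf8_size, utf32_size, utf32_size / utf8_size)
```

Every ASCII character takes 1 byte in UTF-8 and 4 bytes in UTF-32, so the ratio for this kind of text is exactly 4.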

Ekaros · 2025-08-22T10:44:40 1755859480

Maybe we should have just replaced ASCII, a horrible encoding where an entire 25% of it is wasted. And maybe we could have gotten a bit more efficiency by having just one case of letters plus a modifier placed before a letter, instead of both lowercase and uppercase. That would save a lot of space, as most text could just be lowercase.

guappa · 2025-08-22T11:50:33 1755863433

yeah that's how ascii works… there's 1 bit for lower/upper case.
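
For reference, a minimal sketch of that case bit in Python 3.7+. The helper name `toggle_case_ascii` is made up for illustration; the only fact it relies on is that ASCII upper and lower case letters differ in bit 5 (0x20):

```python
CASE_BIT = 0x20  # 'A' is 0x41, 'a' is 0x61; they differ only in this bit


def toggle_case_ascii(ch: str) -> str:
    """Flip the case of a single ASCII letter by toggling bit 5."""
    if len(ch) == 1 and ch.isascii() and ch.isalpha():
        return chr(ord(ch) ^ CASE_BIT)
    return ch  # digits, punctuation, and non-ASCII input are left untouched


print(toggle_case_ascii("a"), toggle_case_ascii("Z"), toggle_case_ascii("1"))
# prints: A z 1
```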

bawolff · 2025-08-22T09:37:20 1755855440

I think combining characters are a lot simpler than having every single combination ever.

Especially when you start getting into non-Latin-based languages.

amake · 2025-08-22T09:14:47 1755854087

What does "no combinations" mean?

Ekaros · 2025-08-22T10:24:50 1755858290

Take Ä, for example: it can be either a single precomposed code point, or a combination of A and a combining ¨. Both are now supported, but if you can have more than two such pieces going into one character, it makes a mess.
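
The two spellings of Ä can be compared with Python's standard `unicodedata` module. This is only a sketch; the variable names are made up, and note that in Unicode the combining mark comes after the base letter, not before it:

```python
import unicodedata

precomposed = "\u00C4"  # Ä as one code point (LATIN CAPITAL LETTER A WITH DIAERESIS)
decomposed = "A\u0308"  # A followed by COMBINING DIAERESIS

# The raw strings differ even though they render the same.
print(precomposed == decomposed)  # False

# Normalizing to a common form makes them comparable.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
print(unicodedata.normalize("NFD", precomposed) == decomposed)  # True

# Neither form is a single byte in UTF-8.
print(len(precomposed.encode("utf-8")), len(decomposed.encode("utf-8")))  # 2 3
```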

amake · 2025-08-22T11:26:59 1755862019

That's fundamental to the mission of Unicode because Unicode is meant to be compatible with all legacy character sets, and those character sets already included combining characters.

So "no combinations" was never going to happen.

int_19h · 2025-08-22T23:18:53 1755904733

That quickly explodes if you need more than one diacritic per letter (e.g. Vietnamese often has two, and then there's https://en.wikipedia.org/wiki/International_Phonetic_Alphabe...).
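
A small sketch of the two-diacritic case, assuming Python 3 and picking the Vietnamese letter ệ as an arbitrary example. Decomposing it with `unicodedata` shows one base letter carrying two combining marks:

```python
import unicodedata

letter = "\u1EC7"  # ệ, LATIN SMALL LETTER E WITH CIRCUMFLEX AND DOT BELOW
parts = unicodedata.normalize("NFD", letter)

# List each code point in the fully decomposed form.
for cp in parts:
    print(f"U+{ord(cp):04X} {unicodedata.name(cp)}")
```

With precomposed characters only, every such base-plus-marks combination would need its own code point.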