Codepoints and characters are not equivalent. A character can consist of one or more codepoints. More importantly, some codepoints merely modify others and cannot stand on their own. That means if you slice or index into a Unicode string, you might get an "invalid" Unicode string back: one that begins with a dangling modifier, for example, and cannot be rendered in any meaningful way.
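To make that concrete, here is a minimal Python 3 sketch. It assumes the "ü" is stored as two codepoints ('u' plus a combining diaeresis); the sample string is only an illustration, not anything from the original question.

```python
import unicodedata

# "ü" written as two codepoints: base letter 'u' + U+0308 COMBINING DIAERESIS
s = "u\u0308ber"

print(len(s))  # counts codepoints: 5, although a reader sees 4 characters

# Slicing works on codepoints, so it can cut between the base and its modifier.
head, tail = s[:1], s[1:]

# head is a bare "u"; tail now starts with a combining mark that has no base.
print(unicodedata.name(tail[0]))  # COMBINING DIAERESIS
```

Both halves are still ordinary str objects, but neither one displays the "ü" the original string showed.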
Right, ok. I recall something about this - ü can be represented either by a single code point or by the letter 'u' followed by a combining modifier.
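For what it's worth, the two spellings can be compared with the standard unicodedata module. This is just a small sketch; the variable names are made up for the example.

```python
import unicodedata

composed = "\u00fc"      # U+00FC LATIN SMALL LETTER U WITH DIAERESIS
decomposed = "u\u0308"   # 'u' followed by U+0308 COMBINING DIAERESIS

# The two look identical on screen but are different codepoint sequences.
print(composed == decomposed)  # False

# NFC composes where possible, NFD decomposes; either makes them comparable.
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
print(unicodedata.normalize("NFD", composed) == decomposed)  # True
```

Normalizing both sides to the same form before comparing or slicing removes this particular surprise, though not every character has a single-codepoint composed form.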
As a user of Unicode I don't really care about that. If I slice characters, I expect a slice of characters. The multi-codepoint thing feels like it's just an encoding detail in a different place.
I guess you need some operations to get at those details if you need them. Man, what was the drive behind adding that extra complexity to life?!
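Something close to that expectation can be approximated by hand. The sketch below uses a hypothetical helper, rough_characters, that glues combining marks onto the codepoint before them. It is only an approximation: full grapheme cluster segmentation (emoji sequences, Hangul jamo and so on) has more rules than this covers.

```python
import unicodedata

def rough_characters(text):
    """Group each base codepoint with the combining marks that follow it.

    Hypothetical helper for illustration; not a full grapheme segmenter.
    """
    groups = []
    for cp in text:
        # A non-zero combining class means cp modifies the preceding codepoint.
        if groups and unicodedata.combining(cp):
            groups[-1] += cp
        else:
            groups.append(cp)
    return groups

chars = rough_characters("u\u0308ber")
print(len(chars))            # 4 groups instead of 5 codepoints
print("".join(chars[:1]))    # the whole "ü", base and modifier together
```

Slicing the list of groups and joining the result keeps each base letter together with its modifiers.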
Thanks for explaining. That was the piece I was missing.