'length' is ... not right, and I don't want to know why it returns 4 for the above, because that's not right. If you want to provide a 'length' function for a Unicode string, you need to know what you are measuring: graphemes, codepoints, bytes? Whichever you decide to use is inappropriate for 'length'.
To me it is rather obvious that a text-oriented language would treat any "string" as a) an atomic string and b) a sequence of the "next-lower" logical unit. I do just now realize that "An English sentence.".length could by this reasoning return 3 or 4 (3 words, one punctuation mark...).
I'm glad it's so obvious to you. Here is a quick test:
What is "ö".length?
It's one grapheme, an o with a diaeresis.
It's two codepoints, an o (0x006F) with a combining diaeresis (0x0308).
It's several bytes, depending on encoding.
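For anyone who wants to check those three answers, here is a minimal Python sketch. It assumes the decomposed spelling given above (o + U+0308), and the `clusters` helper is a hypothetical, rough stand-in for grapheme counting: it only glues combining marks onto the preceding code point and is not the full Unicode segmentation algorithm.

```python
import unicodedata

def clusters(text):
    # Rough approximation (an assumption of this sketch, not UAX #29):
    # a combining mark (general category M*) attaches to the previous
    # code point; everything else starts a new cluster.
    out = []
    for ch in text:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out

s = "o\u0308"  # LATIN SMALL LETTER O + COMBINING DIAERESIS

codepoints = len(s)                       # Python's len() counts code points
utf8_bytes = len(s.encode("utf-8"))       # bytes in UTF-8
utf16_bytes = len(s.encode("utf-16-le"))  # bytes in UTF-16 (no BOM)
graphemes = len(clusters(s))              # approximate grapheme count

print(codepoints, utf8_bytes, utf16_bytes, graphemes)
```

Four different numbers for the same visible character, depending only on what you decide to measure.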
How about if you reverse it first, so that the diaeresis doesn't have anything to combine with, and you have a bare letter 'o'? What's the length now? If you answered _one_ to the above, you've got a string whose length doubles when you reverse it. Is that what you want?
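The reversal problem is easy to reproduce. This is a sketch under the same assumption as the question (decomposed o + U+0308); `clusters` is a hypothetical helper that approximates grapheme counting by attaching combining marks to the preceding code point, which is much cruder than a real segmentation library.

```python
import unicodedata

def clusters(text):
    # Approximate grapheme clusters: a combining mark joins the previous
    # code point; a mark with nothing before it stands alone.
    out = []
    for ch in text:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out

s = "o\u0308"            # o followed by combining diaeresis
reversed_naive = s[::-1]  # code-point reversal: the diaeresis now comes first

before = len(clusters(s))               # mark attaches to the 'o'
after = len(clusters(reversed_naive))   # orphaned mark, then a bare 'o'

print(before, after)
```

Under the "count graphemes" convention, the naive reversal changes the count; under the "count codepoints" convention it stays at 2 both ways.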
Too easy for you?
Let's take the Thai consonant "ก", which is a sort of a g, sort of k sound. One grapheme, one codepoint. Sorted. We'll add a vowel to it: "กอ". Two codepoints, but how many graphemes? One or two? Let's say two, but then let's point out that there is no logical difference there between that and a different vowel: "กี". This is a little more complicated? What's the length now? Is that one or two graphemes? It's clear as day that that's a single consonant + a single vowel, but how long is the string? How about: "เกียะ"? That's still a single consonant + a combining single vowel, only this time it's a compound vowel. One consonant, one vowel, how many graphemes? Are you using vertical slicing to determine what is and isn't a grapheme? Is that right?
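To make the Thai case concrete, here is a small Python sketch that prints the code point count and general categories for the three strings above. The `clusters` helper is again a hypothetical approximation (combining marks attach to the previous code point); it is essentially the "vertical slicing" rule, and whether its answers are the *right* grapheme counts is exactly the open question.

```python
import unicodedata

def clusters(text):
    # "Vertical slicing" approximation: only non-spacing/combining marks
    # (category M*) are merged into the preceding code point.
    out = []
    for ch in text:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out

samples = [
    "\u0e01\u0e2d",                    # KO KAI + O ANG
    "\u0e01\u0e35",                    # KO KAI + SARA II (written above)
    "\u0e40\u0e01\u0e35\u0e22\u0e30",  # SARA E, KO KAI, SARA II, YO YAK, SARA A
]

results = []
for s in samples:
    cats = [unicodedata.category(ch) for ch in s]
    results.append((len(s), cats, len(clusters(s))))
    print(s, len(s), cats, len(clusters(s)))
```

Linguistically all three are one consonant plus one vowel, yet this rule gives a different count for each, because only the vowel sign written above the consonant is encoded as a combining mark.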
To see this taken to its logical end by The Masters of Unicode:
http://www.unicode.org/faq/char_combmark.html - "How are characters counted when measuring the length or position of a character in a string?"
TL;DR -- I generally fall in the category of counting graphemes, as per the second FAQ you linked -- when talking about user-facing text processing. I don't think it makes sense to try to have one API that tries to appease both (low-level) programmers and end users.
Perhaps I wasn't entirely clear - I certainly see that there are complications. I think you're overcomplicating your examples within the domain of text - I'd say a composed character counts as one, and reversing a string with a composed character shouldn't reverse/destroy the composition. The reverse of "õ" isn't "o~", but simply "õ" -- and the length of "o" and "õ" should both be one -- even if they aren't coded similarly.
Now, this won't work for lower-level work on "computer language" strings -- so for your Unicode library or whatever you'd have to count differently. Obviously you have to do some magic when converting a multicode-encoded string from big-endian to little-endian and vice versa -- but that's hardly the same operation as reversing a string.
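A rough Python sketch of what that could look like, assuming the decomposed form o + U+0303 for the example. `clusters` and `reverse_clusters` are hypothetical helpers, and the cluster rule (combining marks stick to the preceding code point) is a simplification, not a full grapheme segmentation.

```python
import unicodedata

def clusters(text):
    # Simplified clustering: combining marks (category M*) stay with
    # the code point they follow.
    out = []
    for ch in text:
        if out and unicodedata.category(ch).startswith("M"):
            out[-1] += ch
        else:
            out.append(ch)
    return out

def reverse_clusters(text):
    # Reverse the order of clusters, leaving each cluster intact.
    return "".join(reversed(clusters(text)))

decomposed = "o\u0303"                               # o + COMBINING TILDE
composed = unicodedata.normalize("NFC", decomposed)  # single precomposed code point

length_plain = len(clusters("o"))
length_tilde = len(clusters(decomposed))
reversed_word = reverse_clusters("n" + decomposed)   # tilde stays on the o

print(length_plain, length_tilde, len(composed), reversed_word)
```

Normalizing to NFC only helps where a precomposed code point exists; the cluster-based reverse is what keeps the mark on its base in the general case.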
I'm not familiar with Thai, but to me it looks like your "กอ" and "กี" are equivalent to the Norwegian vowel "æ", which used to be written/typeset as "ae" (and can still be considered a composition in some input locales). So the length of "ae" is 2, the length of "æ" is 1, as is the length of "a". That would mess up "ae" if reversed -- but I would consider that a "special/archaic" use-case. I'm not sure if that would be similar in Thai -- I don't know, for example, if typewriters and computers have been widely used for a comparable time in Norway and Thailand (I'm guessing Thailand has a few thousand more years of printing/literacy).
As mentioned in my comment above, I also find it interesting that if we're taking length to mean "number of things in a sequence", the length of a sentence would be the number of words, the length of a word would be the number of graphemes, and the length of a grapheme might either be the number of bits/bytes, or there might be a level in between of composites.
So we might have:
"This is an example.".length => 4 (or 5 or 8 depending
on how we define spaces and punctuation)
"This".length => 4
"T".length => 1 byte, 7 or 8 bits, or maybe even 2 in a
prefix-based encoding (capital-transform t).
The logic would be that the full sentence is treated as a sequence of words that's treated as a sequence of graphemes that are treated as a sequence of codepoints that's treated as a stream of bits...
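As a sketch of that layering in Python: the regular expressions are just one assumed way of defining "word", "punctuation", and "space", and picking a different definition changes the numbers, which is rather the point.

```python
import re

sentence = "This is an example."

# Sentence level: three plausible definitions of "unit".
words = re.findall(r"\w+", sentence)                   # words only
with_punct = re.findall(r"\w+|[^\w\s]", sentence)      # words + punctuation
with_spaces = re.findall(r"\w+|\s|[^\w\s]", sentence)  # words + spaces + punctuation

# Word level: count code points (same as graphemes for plain ASCII).
word_length = len("This")

# Character level: count bytes, then bits, in one particular encoding.
t_bytes = len("T".encode("utf-8"))
t_bits = t_bytes * 8

print(len(words), len(with_punct), len(with_spaces))
print(word_length, t_bytes, t_bits)
```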
As the correct answer is "that depends", neither answer gets to qualify as "length", especially given that traditionally, a string is a sequence of bytes, which gives a third thing that 'length' could mean.
NO NO NO NO.