>NOTE: You can always find a character boundary from an arbitrary point in a stream of octets by moving left an octet each time the current octet starts with the bit prefix 10 which indicates a tail octet. At most you'll have to move left 3 octets to find the nearest header octet.
This is incorrect. You can only find boundaries between code points this way.
Unicode seems cool until you learn that not all "user-perceived characters" (grapheme clusters) can be expressed as a single code point. These UTF-8 explanations cover the encoding but leave out this unfortunate detail. The author might not even know this, because they only deal with a subset of Unicode in their life.
If you want to split text between two user-perceived characters, not in the middle of one, this tutorial does not help.
Unicode encodings are great if you want to handle a subset of languages and characters; if you want to be complete, it's a mess.
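To make the distinction concrete, here is a minimal Python sketch of the backward scan from the quoted note, applied to "e" followed by U+0301 (combining acute accent). The function name `codepoint_start` and the sample string are my own choices for illustration, not taken from the article. The scan does find a valid code point boundary, but that boundary falls inside a single user-perceived character.

```python
def codepoint_start(data: bytes, i: int) -> int:
    """Move left from index i while the byte is a tail octet (bit prefix 10).

    Assumes `data` is valid UTF-8, so at most 3 steps are ever needed.
    """
    steps = 0
    while i > 0 and (data[i] & 0xC0) == 0x80 and steps < 3:
        i -= 1
        steps += 1
    return i


# One perceived character built from two code points:
# 'e' (U+0065) followed by COMBINING ACUTE ACCENT (U+0301).
text = "e\u0301"
data = text.encode("utf-8")  # 3 octets: 0x65, 0xCC, 0x81

# Start on the last octet (a tail octet) and scan left to the header octet.
boundary = codepoint_start(data, 2)

# Both halves are valid UTF-8, yet the accent has been cut off from its base letter.
left, right = data[:boundary], data[boundary:]
print(boundary, left.decode("utf-8"), [hex(ord(c)) for c in right.decode("utf-8")])
```

Splitting on grapheme cluster boundaries instead requires the segmentation rules from Unicode Standard Annex #29, which the octet-level scan knows nothing about.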
You're right, that should read "codepoint boundary", not "character boundary". I can fix that.
I do briefly mention grapheme clusters near the end, but I didn't want to introduce them earlier, as this article was more about the encoding mechanism itself. Maybe a future article after more research :)