Yep. In JavaScript (and Java and C# from memory) the String.length property reports the number of UTF-16 code units, not characters. It’s essentially useless. I don’t know if I’ve ever seen a valid use for JavaScript's String.length field in a program which handles Unicode correctly.
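For example, in a JS console:

```js
// .length reports UTF-16 code units, not user-visible characters:
console.log('😀'.length);      // 2 (one emoji, encoded as a surrogate pair)
console.log([...'😀'].length); // 1 (iterating a string yields code points)
```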
There are three valid (and useful) ways to measure a string, depending on context:
- The number of Unicode code points (useful in collaborative editing)
- The byte length when encoded (these days usually UTF-8)
- and the number of rendered grapheme clusters
All of these measures are identical in ASCII text - which is an endless source of bugs.
Sadly these languages give you a deceptively useless .length property and make you go fishing when you want to make your code correct.
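For what it’s worth, here’s a rough sketch of getting all three in modern JavaScript (TextEncoder and Intl.Segmenter are standard, but Intl.Segmenter support still depends on your runtime):

```js
// Example string: "é" written as e + combining accent, followed by one emoji.
const s = 'e\u0301\u{1F600}';

// UTF-16 code units — what .length gives you:
console.log(s.length);                                   // 4

// 1. Unicode code points — string iteration yields code points:
console.log([...s].length);                              // 3

// 2. Byte length when encoded as UTF-8:
console.log(new TextEncoder().encode(s).length);         // 7

// 3. Grapheme clusters — what a reader would call "characters":
const seg = new Intl.Segmenter(undefined, { granularity: 'grapheme' });
console.log([...seg.segment(s)].length);                 // 2

// For plain ASCII like 'abc', all four numbers are 3 — which is how the bugs hide.
```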
Counting grapheme clusters is also rarely useful unless you are working with a monospace font where every grapheme cluster has the same width, and once you support double-width characters that is probably no font at all. More likely what you are actually interested in is the display width with a particular font, or the column count with a monospace font.
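As a rough sketch of measuring display width in a browser (displayWidth is just a hypothetical helper name; this assumes a canvas 2D context is available):

```js
// Hypothetical helper: how wide a string actually renders in a given font.
function displayWidth(text, font = '16px monospace') {
  const ctx = document.createElement('canvas').getContext('2d');
  ctx.font = font;
  return ctx.measureText(text).width; // width in CSS pixels
}

console.log(displayWidth('abc'));  // three narrow glyphs
console.log(displayWidth('古文')); // two double-width glyphs; wider per character than ASCII
```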