
Yep. In JavaScript (and, from memory, Java and C#) the String.length property counts UTF-16 code units. It's essentially useless. I don't know if I've ever seen a valid use for JavaScript's String.length in a program that handles Unicode correctly.

There are three valid (and useful) ways to measure a string, depending on context:

- Number of Unicode code points (useful in collaborative editing)

- Byte length when encoded (these days usually in utf8)

- and the number of rendered grapheme clusters

All of these measures agree on ASCII text - which is an endless source of bugs.

Sadly these languages give you a deceptively useless .length property and make you go fishing when you want your code to be correct.
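To make the difference concrete, here's a sketch showing that a single ZWJ emoji sequence gives a different answer under every one of these measures (Intl.Segmenter is available in Node 16+ and current browsers):

```javascript
// "👨‍👩‍👧" is one family emoji: three emoji code points joined by two
// zero-width joiners (U+200D).
const s = "👨‍👩‍👧";

// UTF-16 code units: each emoji is a surrogate pair (2) plus two ZWJs (1 each).
console.log(s.length); // 8

// Code points: string iteration walks by code point, not code unit.
console.log([...s].length); // 5

// Encoded byte length in UTF-8.
console.log(new TextEncoder().encode(s).byteLength); // 18

// Extended grapheme clusters: the whole ZWJ sequence renders as one glyph.
const segmenter = new Intl.Segmenter("en", { granularity: "grapheme" });
console.log([...segmenter.segment(s)].length); // 1
```

Four different answers (8, 5, 18, 1) for one visible "character" - any code that treats .length as any of the other three will silently corrupt strings like this.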




> and the number of rendered grapheme clusters

This is also rarely useful unless you are working with a monospace font where all grapheme clusters have the same width - which virtually no font is once you support double-width characters. More likely what you are interested in is the display width with a particular font, or the column count with a monospace font.
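A minimal sketch of that column-count idea: grapheme count and column count diverge as soon as double-width characters appear. This is a deliberately simplified approximation - real terminals follow the full Unicode East_Asian_Width tables (as wcwidth does); the ranges below cover only a few common full-width blocks for illustration:

```javascript
// Rough check for characters that occupy two terminal cells.
// NOTE: a simplified subset of Unicode East_Asian_Width, not the real table.
function isFullWidth(cp) {
  return (cp >= 0x1100 && cp <= 0x115f) ||  // Hangul Jamo
         (cp >= 0x2e80 && cp <= 0xa4cf) ||  // CJK radicals .. Yi
         (cp >= 0xac00 && cp <= 0xd7a3) ||  // Hangul syllables
         (cp >= 0xf900 && cp <= 0xfaff) ||  // CJK compatibility ideographs
         (cp >= 0xff00 && cp <= 0xff60);    // full-width forms
}

// Approximate monospace column count: 2 cells for full-width code points,
// 1 for everything else. Iterating with for..of walks by code point.
function columnWidth(str) {
  let cols = 0;
  for (const ch of str) {
    cols += isFullWidth(ch.codePointAt(0)) ? 2 : 1;
  }
  return cols;
}

console.log(columnWidth("abc"));    // 3
console.log(columnWidth("日本語")); // 6 - three graphemes, six columns
```

So "abc" and "日本語" are both three grapheme clusters, but one takes three columns and the other six - which is why grapheme count alone doesn't answer layout questions.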



