Kurdish Parentheses on OpenStreetMap, Three Ways

globular-toast · on Nov 8, 2023

Huh... I never considered that parentheses go the other way around in RTL languages...

How do you think the user made the mistake in the first place? It seems like it was edited to "look right" when it was rendered wrong?

chrismorgan · on Nov 9, 2023

When you say “go the other way around”, it’s important to note that you still use U+0028 "(" LEFT PARENTHESIS as the opening delimiter and U+0029 ")" RIGHT PARENTHESIS as the closing one, despite their names; but when the bidirectional algorithm decides they should be rendered right-to-left, they get replaced with the opposite glyphs, according to the Unicode Bidi_Mirroring_Glyph property, and so LEFT PARENTHESIS is rendered as a right parenthesis.
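A minimal sketch of the glyph-selection step described above, in Python. It assumes the bidi algorithm has already decided whether a character sits in a right-to-left run (the `resolved_rtl` flag is a hypothetical stand-in for that decision). It also copies only a few bracket pairs from BidiMirroring.txt rather than the full table. `unicodedata.mirrored` is the standard library's accessor for the Bidi_Mirrored property.

```python
import unicodedata

# Hand-copied subset of Bidi_Mirroring_Glyph pairs. The full mapping lives in
# BidiMirroring.txt; only a few ASCII brackets are included here.
MIRROR_PAIRS = {
    "(": ")", ")": "(",
    "[": "]", "]": "[",
    "{": "}", "}": "{",
    "<": ">", ">": "<",
}


def displayed_char(ch: str, resolved_rtl: bool) -> str:
    """Return the character whose glyph a renderer would draw for `ch`.

    `resolved_rtl` is a stand-in for the bidi algorithm's verdict (the
    character ended up at an odd embedding level); computing it is out of
    scope for this sketch.
    """
    if not resolved_rtl or not unicodedata.mirrored(ch):
        return ch  # left-to-right context, or not a Bidi_Mirrored character
    # Bidi_Mirrored characters with no partner in this table (e.g. U+221A
    # SQUARE ROOT, which has none in Unicode at all) come back unchanged;
    # a real renderer would flip their glyph instead.
    return MIRROR_PAIRS.get(ch, ch)


# The stored character is "(" in both cases; only the drawn glyph differs.
print(displayed_char("(", resolved_rtl=False))  # (
print(displayed_char("(", resolved_rtl=True))   # )
```

Note that the stored text never changes: the opening delimiter is always U+0028, and the swap happens only at draw time.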

See https://unicode.org/reports/tr9/#Mirroring for a description of the process, and https://www.unicode.org/Public/UCD/latest/ucd/BidiMirroring.... for the full list of mirrorings. That list also covers cases where Unicode defines no appropriate mirrored character (like √; in fact, I think they’re all mathematical notation), and for those it suggests the renderer horizontally flip the glyph instead (which isn’t perfect, as things like italic slant will be inverted).

sp332 · on Nov 9, 2023

(()) isn't a palindrome, but ())( is.

taneliv · on Nov 9, 2023

Try for yourself! It might be very instructive, even if you can't reproduce this exact same issue. Install an RTL locale and keyboard setting (Arabic is funkier because of its cursive script, where letters change form after you input the _next_ letter; in Hebrew the letters don't join, so it might be easier to recognize some issues there). Either way, you don't need to know the language; just bang on the keyboard to produce some text. Or copy some from somewhere.

Now try these: enter numbers in the text (they are LTR, at least in those two RTL locales). Try to select a passage with both text and numbers. Try to copy and paste words from, say, Arabic Wikipedia into the middle of English text, and vice versa. Enter parentheses, especially near newlines. I don't know about you, but my intuition about where text will get inserted is sometimes quite wrong.
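Part of why digits and parentheses act so differently from letters is that every code point carries a Bidi_Class property, which the bidi algorithm uses to decide direction. A quick way to peek at it is Python's `unicodedata.bidirectional`; the sample characters below are just illustrative picks.

```python
import unicodedata

# A few illustrative characters and the labels used when printing them.
SAMPLES = {
    "Hebrew letter alef": "\u05d0",
    "Arabic letter alef": "\u0627",
    "Latin letter a": "a",
    "ASCII digit 7": "7",
    "Arabic-Indic digit 7": "\u0667",
    "left parenthesis": "(",
    "space": " ",
}


def bidi_classes(samples: dict[str, str]) -> dict[str, str]:
    """Map each label to its character's Unicode Bidi_Class abbreviation."""
    return {label: unicodedata.bidirectional(ch) for label, ch in samples.items()}


for label, cls in bidi_classes(SAMPLES).items():
    print(f"{label:22} {cls}")
```

R and AL are strong right-to-left classes and L is strong left-to-right. EN and AN (European and Arabic numbers) are weak types resolved from context, while ON and WS (other neutral, whitespace) take their direction from the surrounding text. That is why parentheses in particular end up wherever their neighbours push them.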

Don't be alarmed if things work differently in different applications. Or perhaps, be. I still haven't figured out how to input RTL and LTR text properly in Slack using Firefox (I don't know which party to blame, even: me, browser, or Slack).

schoen · on Nov 9, 2023

Hebrew character rendering is much less context-dependent than Arabic's, but Hebrew does have a few final forms that differ at the end of a word, so one can't strictly say that Hebrew characters will never change as a result of adding more text. Just not nearly as much as Arabic characters do!

Actually, I don't remember clearly how the different forms are handled by different character sets and if one would have to type them explicitly in some configurations. Surely on typewriters, but maybe not in Unicode?
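In Unicode, at least, the Hebrew final forms are separate code points rather than glyph variants chosen by the renderer, so the typist (via the keyboard layout) picks them explicitly. A small sketch listing the five regular/final pairs with Python's `unicodedata.name`:

```python
import unicodedata

# (regular, word-final) pairs for the five Hebrew letters that have final forms.
HEBREW_FINAL_PAIRS = [
    ("\u05db", "\u05da"),  # kaf
    ("\u05de", "\u05dd"),  # mem
    ("\u05e0", "\u05df"),  # nun
    ("\u05e4", "\u05e3"),  # pe
    ("\u05e6", "\u05e5"),  # tsadi
]


def describe_pairs() -> list[tuple[str, str]]:
    """Return 'U+XXXX NAME' strings for each regular/final pair."""
    rows = []
    for regular, final in HEBREW_FINAL_PAIRS:
        rows.append((
            f"U+{ord(regular):04X} {unicodedata.name(regular)}",
            f"U+{ord(final):04X} {unicodedata.name(final)}",
        ))
    return rows


for regular, final in describe_pairs():
    print(f"{regular:28} {final}")
```

Because each final form has its own code point, text that uses the wrong one is simply a different string, not a rendering glitch.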

taneliv · on Nov 9, 2023

True, I forgot those final forms in Hebrew. Perhaps my mind is conflating them with the initial capital letters of sentences and names in most text with Latin letters.

On the different forms, https://en.wikipedia.org/wiki/Arabic_script_in_Unicode has a short summary. It is possible to type those contextual forms explicitly (they have their own code points), but "presentation forms are present only for compatibility with older standards, and are not currently needed for coding text."
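Those compatibility presentation forms can be folded back to the ordinary letters with NFKC normalization, which can be handy when cleaning up imported text. A sketch using Python's `unicodedata.normalize`; the helper name is made up for illustration, and NFKC also rewrites other compatibility characters, not just Arabic ones.

```python
import unicodedata


def fold_presentation_forms(text: str) -> str:
    """Fold compatibility characters, including Arabic presentation forms,
    back to their base letters via NFKC normalization.

    NFKC also rewrites other compatibility characters (e.g. the ligature
    U+FB01, full-width Latin letters, superscript digits), so it is broader
    than needed if only Arabic matters.
    """
    return unicodedata.normalize("NFKC", text)


beh_initial = unicodedata.lookup("ARABIC LETTER BEH INITIAL FORM")      # U+FE91
alef_isolated = unicodedata.lookup("ARABIC LETTER ALEF ISOLATED FORM")  # U+FE8D
folded = fold_presentation_forms(beh_initial + alef_isolated)
print([unicodedata.name(c) for c in folded])
# ['ARABIC LETTER BEH', 'ARABIC LETTER ALEF']
```

After folding, choosing the contextual shape is left to the renderer again, which is the modern expectation.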

gniv · on Nov 9, 2023

Meta: Some links or more details about the location would have been nice, so that I can check it for myself.

Edit: Found it: https://www.openstreetmap.org/way/195240072

Alifatisk · on Nov 9, 2023

This is so interesting! Can this happen in other RTL languages too?

Flimm · on Nov 9, 2023

jihadjihad · on Nov 8, 2023

It's OT, but it reminds me of something I saw on HN a while back about how Costa Rica handles addresses [0].

0: https://news.ycombinator.com/item?id=34999366

SeriousM · on Nov 9, 2023

Oh wonderful HN community. Yet another hour spent learning something new! This time: my assumptions that "200 chars are enough to store a street name" and "everyone has a zip code" are just plain wrong for a lot of places.

tgma · on Nov 8, 2023

This is nothing unique to maps. It has to do with the Unicode bidi algorithm mixing RTL and LTR text in various contexts. Note that even in RTL languages, things like numbers are rendered LTR, and mixing the two directions introduces complexity.

https://unicode.org/reports/tr9/
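To make the "mixing" concrete, here is a sketch of just one step of UAX #9, rule L2, which turns resolved embedding levels into visual order. It is not the full algorithm: the levels are supplied by hand (computing them is most of the spec), uppercase Latin letters stand in for RTL letters, and mirroring (rule L4) is left out.

```python
def reorder_line(text: str, levels: list[int]) -> str:
    """Apply UAX #9 rule L2 to one line whose embedding levels are known.

    From the highest level down to the lowest odd level on the line, reverse
    every maximal run of characters whose level is at least the current one.
    """
    if len(text) != len(levels):
        raise ValueError("need exactly one embedding level per character")
    if not text:
        return ""
    pairs = list(zip(text, levels))  # keep each char together with its level
    highest = max(levels)
    lowest_odd = min(levels) | 1  # smallest odd level >= the line's minimum
    for current in range(highest, lowest_odd - 1, -1):
        i = 0
        while i < len(pairs):
            if pairs[i][1] < current:
                i += 1
                continue
            j = i
            while j < len(pairs) and pairs[j][1] >= current:
                j += 1
            pairs[i:j] = pairs[i:j][::-1]  # reverse this run
            i = j
    return "".join(ch for ch, _ in pairs)


# RTL paragraph (level 1) with an embedded number raised to level 2.
print(reorder_line("AB12", [1, 1, 2, 2]))  # 12BA
# RTL text with parentheses, before mirroring.
print(reorder_line("A(B)", [1, 1, 1, 1]))  # )B(A
```

The second call shows where the parenthesis confusion comes from: the stored ")" ends up drawn first, and only mirroring under rule L4 turns ")B(A" into the "(B)A" a reader actually sees.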

mapmeld · on Nov 8, 2023

I did a write-up because it was an example of broken text "in the wild" rather than in technical references. Also it takes me a minute to think of () as open/close parens instead of left/right parens.

tgma · on Nov 8, 2023

Yeah. Nothing against your article. As someone whose native language is RTL I can tell you this is not an isolated case at all. If you're used to operating a computer in RTL languages like Arabic script you'd see this stuff every single day. Not exaggerating at all. ;)

Unicode is such a complex beast; this is just one aspect of it.

dotancohen · on Nov 8, 2023

You might be interested in a page I wrote about mixed RTL and LTR text and getting them to render properly: https://dotancohen.com/howto/rtl_right_to_left.html
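One widely used technique for this (not necessarily the one on the linked page) is to wrap each run of user-supplied text in the directional isolate controls that UAX #9 added in Unicode 6.3, so the bidi algorithm resolves it independently of the surrounding text. A sketch; the `isolate` helper is hypothetical, and how well it works depends on the renderer supporting isolates.

```python
# Directional isolate controls from UAX #9 (Unicode 6.3 and later).
LRI = "\u2066"  # LEFT-TO-RIGHT ISOLATE
RLI = "\u2067"  # RIGHT-TO-LEFT ISOLATE
FSI = "\u2068"  # FIRST STRONG ISOLATE: direction taken from the first strong char
PDI = "\u2069"  # POP DIRECTIONAL ISOLATE: closes any of the three above

_OPENERS = {"ltr": LRI, "rtl": RLI, "auto": FSI}


def isolate(text: str, direction: str = "auto") -> str:
    """Wrap `text` so it is resolved independently of its neighbours."""
    try:
        opener = _OPENERS[direction]
    except KeyError:
        raise ValueError(f"direction must be one of {sorted(_OPENERS)}") from None
    return opener + text + PDI


# Hypothetical usage: an English label around a user-supplied RTL name
# that itself contains parentheses (placeholder Hebrew text).
name = "\u05e9\u05dd (\u05d0)"
label = "Street: " + isolate(name) + " (approved)"
```

The stored name is left untouched; the controls only fence it off, so its parentheses and the label's own parentheses no longer influence each other.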

I suspect that your user entered the street name correctly at first, then played around with the wrong parenthesis in the wrong locations until he settled upon something that visually rendered as he expected, on his system, in the input box.
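Since the problem lives in the stored (logical) order, a data check can catch it without looking at rendering at all. A hypothetical validator sketch (not an existing OSM tool) that flags a closing parenthesis appearing before its opener; Latin placeholder text stands in for real names.

```python
def parens_balanced(text: str) -> bool:
    """Check that ASCII '(' and ')' are balanced in logical (stored) order.

    Rendering direction is irrelevant here: in stored text the opener must
    come first, whatever glyph the renderer later draws for it.
    """
    depth = 0
    for ch in text:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False  # a closer appeared before any matching opener
    return depth == 0


print(parens_balanced("name (alt)"))  # True
print(parens_balanced("name )alt("))  # False
```

An edit made to "look right" in a mis-rendering input box would produce exactly the second kind of string, which this check rejects.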

jprd · on Nov 8, 2023

I don't know why, but when I first saw this headline I thought it was going to be something like a "Double Irish with a Dutch Sandwich".

The actual topic is much more interesting :D