
Yes, that’s very intentional: masking (or setting) that one bit (0x20) is the intended way to do case-insensitive comparison over the letter range in ASCII (e.g. stricmp in C), or to convert text to lower or upper case (tolower, toupper).
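To make the bit trick concrete, here is a minimal Python sketch. The helper names (`ascii_upper`, `ascii_lower`, `ascii_iequal`) are made up for illustration and are not the C library routines mentioned above; the sketch assumes single-character inputs and guards the letter range explicitly, since characters such as `@` and the backtick also differ only in bit 0x20.

```python
CASE_BIT = 0x20  # the one bit that differs between 'A' (0x41) and 'a' (0x61)


def is_ascii_letter(ch: str) -> bool:
    """True only for A-Z and a-z; the trick is valid only in this range."""
    return ("A" <= ch <= "Z") or ("a" <= ch <= "z")


def ascii_upper(ch: str) -> str:
    """Clear the case bit of an ASCII letter; leave anything else alone."""
    return chr(ord(ch) & ~CASE_BIT) if is_ascii_letter(ch) else ch


def ascii_lower(ch: str) -> str:
    """Set the case bit of an ASCII letter; leave anything else alone."""
    return chr(ord(ch) | CASE_BIT) if is_ascii_letter(ch) else ch


def ascii_iequal(a: str, b: str) -> bool:
    """Case-insensitive comparison restricted to ASCII letters."""
    return len(a) == len(b) and all(
        ascii_lower(x) == ascii_lower(y) for x, y in zip(a, b)
    )


print(ascii_upper("q"), ascii_lower("Q"), ascii_iequal("Hello", "hELLO"))
```

Without the range check, `@` (0x40) and the backtick (0x60) would compare equal, which is why the comparison is limited to the letter range.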

But what’s more, ever wondered where the control (Ctrl) key combinations come from, like Ctrl-H for backspace or Ctrl-M for carriage return? Inspecting the ASCII chart makes it evident: the Ctrl key simply clears bit 6 (0x40) of the uppercase letter, turning it into its respective control character!
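As a rough illustration, the mapping can be sketched in a few lines of Python. The function name `ctrl` is hypothetical, and the sketch assumes the key is first folded to uppercase so that both `h` and `H` land in the 0x40–0x5F column of the ASCII chart before bit 0x40 is cleared.

```python
CTRL_BIT = 0x40  # the bit that separates '@', 'A'..'Z', '[' .. '_' from 0x00..0x1F


def ctrl(key: str) -> str:
    """Control character conventionally produced by Ctrl+<key> (illustrative)."""
    if len(key) != 1 or not key.isascii():
        raise ValueError("expected a single ASCII character")
    code = ord(key.upper())
    if not 0x40 <= code <= 0x5F:
        raise ValueError(f"no control character for {key!r}")
    return chr(code & ~CTRL_BIT)


for key in "HMI@[":
    print(f"Ctrl-{key} -> 0x{ord(ctrl(key)):02X}")
```

Ctrl-H comes out as 0x08 (backspace), Ctrl-M as 0x0D (carriage return), and Ctrl-[ as 0x1B (escape).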



...it's a bit of a shame that the same upper/lowercase trick doesn't apply to all Unicode code points (at least those that have upper/lower variants).

It seems to work for codepoints up to U+00FF, for instance:

    - Å (U+00C5) vs å (U+00E5)

...but above U+00FF lowercase follows uppercase:

    - Ă (U+0102) vs ă (U+0103)

Typical for Unicode though, nothing makes sense ;)
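The two examples are easy to check with a small Python sketch. `case_offset` is a made-up helper, and it assumes the character has a single-code-point lowercase form, which holds for the letters shown here.

```python
def case_offset(upper: str) -> int:
    """Distance in code points from an uppercase letter to its lowercase form."""
    lower = upper.lower()
    if len(lower) != 1:
        raise ValueError("lowercase form is not a single code point")
    return ord(lower) - ord(upper)


# Escapes are used so the precomposed code points are unambiguous.
for ch in ("A", "\u00c5", "\u0102"):
    print(f"U+{ord(ch):04X} {ch}: lowercase is +0x{case_offset(ch):X}")
```

For A and Å the offset is 0x20, as in ASCII; for Ă it is 1, because upper and lower case alternate in that block.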


That's because U+00A0–U+00FF mirror an earlier character set: "ISO Latin-1" (ISO 8859-1), itself based on DEC's "Multinational Character Set" (MCS). In Latin-1 the upper/lowercase trick does not apply to ß/ÿ, which sit 0x20 apart without being a case pair, but it does in MCS, where Ÿ/ÿ are at a different pair of code points.
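A short Python sketch can list where the 0x20 rule fails in the upper half of Latin-1. The helper name `latin1_exceptions` is an assumption for illustration; it relies on Python's built-in `str.lower()`, which follows the Unicode case mappings.

```python
def latin1_exceptions() -> list[str]:
    """Characters in U+00C0..U+00DF whose lowercase is not 'same code | 0x20'."""
    return [
        chr(code)
        for code in range(0xC0, 0xE0)
        if chr(code).lower() != chr(code | 0x20)
    ]


for ch in latin1_exceptions():
    partner = chr(ord(ch) | 0x20)
    print(f"U+{ord(ch):04X} {ch} is paired by bit 0x20 with U+{ord(partner):04X} {partner}")

# The uppercase of U+00FF lives outside Latin-1 entirely.
print(f"upper of U+00FF is U+{ord(chr(0xFF).upper()):04X}")
```

The two exceptions are × (U+00D7, paired with ÷) and ß (U+00DF, paired with ÿ); Unicode places Ÿ at U+0178.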

ISO Latin-1 was the character set on many Unix systems, Amiga OS, MS-Windows (as "Windows-1252" with extra chars), and was for many years the default character set on the web.


In Unicode, there's no universal 1:1 mapping between cases.

The lowercase of "I" could very well be "ı" (dotless i) if you're typing Turkish.
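Python's built-in case methods show the problem, assuming (as is the case for `str.upper()` and `str.lower()`) that they apply the locale-independent Unicode mappings rather than Turkish rules. The `roundtrips` helper is hypothetical.

```python
def roundtrips(ch: str) -> bool:
    """True if upper-casing then lower-casing gives the original character back."""
    return ch.upper().lower() == ch


DOTLESS_I = "\u0131"      # ı
DOTTED_CAPITAL_I = "\u0130"  # İ

print(roundtrips("i"))                # plain ASCII i survives the round trip
print(roundtrips(DOTLESS_I))          # ı -> I -> i, so the dotless i is lost
print(len(DOTTED_CAPITAL_I.lower()))  # İ lowercases to 'i' plus a combining dot
```

Getting Turkish right needs locale-aware casing, which these built-ins do not provide.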


Nice!

I'm an Emacs user, and when I use a readline-based REPL I use Ctrl-M a lot. I thought it was inherited from the Emacs keybindings, like many other shortcuts in GNU readline.


Then an additional useful command: in the out-of-the-box Emacs bindings, C-q is the "quoted insert" command. It takes the next character and inserts it directly into the buffer. This is useful for things like tab or control characters, where Emacs would normally use the keystroke to do something else. I've been working in an email-related space lately, so I've been doing a good amount of C-q C-m for inserting literal CRs, and C-q TAB for the few places where I want a literal tab in the source, in a buffer that interprets a normal TAB as a command to indent the current line.

I mention this because you can use the ASCII table to work out how to insert a particular control character literally with your keyboard, if you need one of the handful of other characters you may be interested in every so often, like C-l for "form feed" (now used for "page feed" in some older printer-related contexts) or C-@ for NUL if you're doing something weird with binary files in a "text" buffer.
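Working backwards from a control character to the key to press is the same table lookup in reverse. Here is a minimal Python sketch; `key_for` is a made-up helper, it only covers 0x00–0x1F, and the "C-x" spelling simply imitates the notation used above.

```python
def key_for(control: str) -> str:
    """Name the Ctrl combination that conventionally yields a control character."""
    code = ord(control)
    if code > 0x1F:
        raise ValueError("not a C0 control character")
    # Flipping bit 0x40 moves 0x00..0x1F back onto '@', 'A'..'Z', '[' .. '_'.
    return "C-" + chr(code ^ 0x40).lower()


for name, ch in [("NUL", "\x00"), ("TAB", "\t"), ("FF", "\x0c"), ("CR", "\r")]:
    print(f"{name}: C-q {key_for(ch)}")
```

This gives C-@ for NUL, C-i for TAB, C-l for form feed, and C-m for CR, matching the keys mentioned above.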



