The wchar_t thing is made much worse by disagreements on what type that actually...

benchloftbrunch · 2024-09-09T23:06:36 1725923196

WTF-8 barely qualifies as "another encoding system" - it's a trivial superset of UTF-8 that omits the rule forbidding surrogate codes.

Imo that artificial restriction in UTF-8 is the problem.

zzo38computer · 2024-09-11T05:53:50 1726034030

I think the problem is believing that one character set or character encoding is suitable for everything, and that it has one definition. Neither is true.

Sometimes the restriction is appropriate, but sometimes a variant without this restriction is appropriate, and sometimes Unicode is not appropriate at all. The "artificial restriction" in UTF-8 is legitimate (since they are not valid Unicode characters) but should not apply for all kinds of uses; the problem is programs that apply them when they shouldn't be applied because of limitations in the design.

I think that using a sequence of bytes as the file name and passwords is better, and that file names and passwords being case sensitive is also better.

However, I think "WTF-8" specifically means that mismatched surrogates can be encoded, in case you want to convert to/from invalid UTF-16. Sometimes you might use a different variant of UTF-8, that can go beyond the Unicode range, or encode null characters without null bytes, etc. Sometimes it is better to use different Unicode encodings, or different non-Unicode encodings (which cannot necessarily be converted to Unicode; don't assume that you can or should convert them), or to care only that it is ASCII (or any extension of ASCII without caring about specific extension it is), or to not care about character encoding at all.

progmetaldev · 2024-09-09T22:20:39 1725920439

Is this just a really good joke, or something real? I enjoyed it, regardless!

Dwedit · 2024-09-09T22:35:54 1725921354

It was previously discussed at https://news.ycombinator.com/item?id=9611710