Which is why we have languages like Go where we can put those types of developers. Incidentally, Go uses UTF-8. Higher-level languages like Go, Python, etc. were designed so newbie and/or ignorant programmers could do less damage.
When I was working on a project before Unicode, we would switch our dev PCs to the other languages we supported. What a pain that was. The only issues we had were when a translated string was much longer than the screen space allocated to it; I believe Swedish was the main culprit. No problems with simplified and traditional Chinese, as those were more compact. I have no sympathy for dev shops that can't get internationalization right. As with everything else in the corporate dev world, management doesn't seem to want to hire/retain the more experienced programmers.
I think you have a gripe with my argument because you may be missing my point. If a high-level language chooses to let a programmer index into a UTF-8 string at the byte level (for performance and other reasons), it's very easy for it to prevent the programmer from slicing in the middle of a code unit.
The reason is that the language's function to slice a Unicode string would either throw an exception or just advance to the next valid index. There would be no way for the programmer to slice a Unicode string in the middle of a code unit.
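A minimal sketch of the advance-to-the-next-valid-index approach in Python (the helper names are my own; a real language runtime would do this inside its slicing primitive). UTF-8 continuation bytes always match the bit pattern 10xxxxxx, so a valid boundary can be found by skipping past them:

```python
def snap_forward(data: bytes, i: int) -> int:
    """Advance i past any UTF-8 continuation bytes (0b10xxxxxx)."""
    while i < len(data) and (data[i] & 0xC0) == 0x80:
        i += 1
    return i

def safe_slice(data: bytes, start: int, end: int) -> bytes:
    """Slice at byte indices, but never in the middle of a code point."""
    return data[snap_forward(data, start):snap_forward(data, end)]

raw = "héllo".encode("utf-8")                 # b'h\xc3\xa9llo' - 'é' is two bytes
print(safe_slice(raw, 0, 2).decode("utf-8"))  # 'hé' - end index 2 advanced to 3
```

The same check could just as easily raise an exception instead of advancing; either way the invalid byte index never reaches the programmer.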
> I think you have a gripe with my argument because you may be missing my point
I get your point; it just doesn't apply to many real-world situations I've seen, where you don't have the luxury of using a higher-level language or a library that takes care of all these things, or of keeping programmers who don't understand what they are doing away from that sort of thing.
The most egregious example that I've personally seen was a developer working on a legacy Cobol banking program that needed Chinese support retro-fitted to it.
The app was originally only developed with ASCII in mind and so sliced through strings willy-nilly, which naturally caused problems with Chinese text.
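To illustrate the failure mode (my own example, not the actual banking code): each of these Chinese characters is three bytes in UTF-8, so any ASCII-era byte-oriented split that ignores character boundaries lands mid-character.

```python
text = "银行账户"                  # "bank account": 3 bytes per character in UTF-8
raw = text.encode("utf-8")        # 12 bytes total
chunk = raw[:4]                   # naive ASCII-style slice cuts the second character
print(chunk.decode("utf-8", errors="replace"))  # '银�' - trailing byte is garbage
```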
The developer working on the "fix" before me was calling out to ICU through the C API of the version of Cobol that we used, and was still messing things up - he'd actually modified ICU in some custom way to prevent the bug from crashing the program, but it was still corrupting text.
I basically undid all his changes and wrapped all COBOL string splicing to call a function that always split a string at a valid position - truncating invalid bytes at the start/end as necessary. Much simpler, and it removed an unnecessary dependency on ICU.
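The wrapper described above might look something like this (a Python sketch of the idea, not the actual COBOL code): it trims a leading run of continuation bytes and a trailing incomplete multi-byte sequence, so whatever survives decodes cleanly. It only repairs truncation at the edges; it doesn't validate the interior of the chunk.

```python
def trim_to_boundaries(data: bytes) -> bytes:
    """Drop partial UTF-8 sequences at either end of a byte-sliced chunk."""
    start = 0
    # A continuation byte matches 10xxxxxx; a chunk can't begin with one.
    while start < len(data) and (data[start] & 0xC0) == 0x80:
        start += 1
    end = len(data)
    # Walk back over trailing continuation bytes to find the last lead byte.
    i = end
    while i > start and (data[i - 1] & 0xC0) == 0x80:
        i -= 1
    if i > start and data[i - 1] >= 0xC0:        # multi-byte lead byte
        lead = data[i - 1]
        expected = 2 if lead < 0xE0 else 3 if lead < 0xF0 else 4
        if end - (i - 1) < expected:             # sequence was cut short
            end = i - 1                          # drop the partial sequence
    return data[start:end]

raw = "银行".encode("utf-8")  # 6 bytes, 3 per character
# raw[2:6] starts with the last byte of '银'; that stray byte gets dropped
print(trim_to_boundaries(raw[2:6]).decode("utf-8"))  # '行'
```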
This bug had been outstanding for several months when I first joined that company, and it was the first one I was assigned to work on - and luckily for them they'd accidentally hired someone who had done lots of multilingual programming before.
> it's very easy for it to prevent the programmer from slicing in the middle of a code unit.
Okay, but even you made a mistake in your first example of what to do. That's the sort of code someone who knows what they are doing could write, and it will seem to work under the conditions in which it was tested ("works on my machine, ship it!"), but it will cause seemingly random problems once it hits users.
> I get your point, it just doesn't apply to many real world situations I've seen where you don't have the luxury of just using a higher level language or a library that takes care of all these things
No, I still think you're missing some of it. I am not advocating that what I said is the solution for everything.
Someone said that slicing UTF-8 strings leads to string corruption and endorsed the Python 3 Frankenstein unicode type as a way to avoid it. I just gave a way of preventing that.
Now you've argued that a novice programmer would fail to implement it properly. So you're comparing my method implemented by a novice programmer to a method implemented by professional compiler writers. That hardly seems fair. :)
So my argument is that if my method were implemented by professional compiler writers, it would prevent corrupted strings while still using UTF-8 as the internal representation.
> I basically undid all his changes, and wrapped all COBOL string splicing to call a function that always split a string at a valid position - truncating invalid bytes at the start/end as necessary.
> luckily for them they'd accidentally hired someone who had done lots of multilingual programming before.
So an expert programmer implemented a string splitting function that didn't corrupt strings. :D
> but even you made a mistake in your first example of what to do
I'm writing this on an iPad while watching TV and playing a game on another Android tablet, while looking at the Wikipedia UTF-8 article on a tiny phone screen, while a little white dog is trying to bite my fingers (wish I was making this up). Not exactly my usual programming environment. ;)