Hacker News new | past | comments | ask | show | jobs | submit login

There are data sources that have a clear unambiguous specification of what information is in each field, and exactly how they are formatted, such that you could auto-parse the column. And even have no data entry errors, so that you can rely on that parsing and have clean error free imported data.

There are many more that don't have a clear spec, or even if they do, have every possible variation of data corruption / keying errors / user misunderstanding / total 'don't give a shit, I'll enter it how I want, that's what the computer is supposed to handle' problems in that source data, that make parsing an absolute nightmare.

I've had to manually review a list of 10k+ data readings monthly, from an automated recorder, because the guy who was supposed to copy the collected data files as is and upload them, instead opened each one and fixed what he (badly mistakenly) thought had been recorded wrong. Different changes in a dozen different ways based on no particular logic beyond "that doesn't look right". And un-fireable, of course.




Consider applying for YC's Summer 2025 batch! Applications are open till May 13

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: