Hacker News

I parse a lot of CSV files and few are well-formed.

People are careless when writing CSV files because they think it is simple: just put a comma between columns, right?




I just fixed a bug at work, parsing a rather opinionated CSV file of products for a web shop. It had mostly good quoting and headers for the columns, and it used semicolons for field separation (so, not technically CSV, but..).

Funny thing was, a lot of the product names contained an ampersand (no problem there). But one product had an HTML-entity-encoded ampersand (&amp;). I have no idea how that semicolon escaped, eh, escaping - but that one line suddenly had most of the columns off by one...

I can see how the entity got into the DB (probably an errant cut-and-paste) - but I wonder at the CSV writer that gleefully copied the extra separator into the CSV export...
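
For anyone who wants to reproduce this, here is a rough sketch in Python of what probably happened. The product row is made up for illustration, and I'm only guessing that the original exporter joined fields on the separator without quoting. The second half shows that a writer which quotes fields containing the delimiter (here Python's standard `csv` module) round-trips the same row intact.

```python
import csv
import io

# Hypothetical product row; the name holds an HTML-encoded ampersand,
# which brings its own semicolon along.
row = ["1042", "Salt &amp; Pepper Set", "12.50"]

# Naive export (an assumption about the original exporter):
# join on the separator with no quoting at all.
naive_line = ";".join(row)
naive_fields = naive_line.split(";")

# Export through the csv module with the same separator. By default it
# quotes any field that contains the delimiter.
buf = io.StringIO()
csv.writer(buf, delimiter=";", lineterminator="\n").writerow(row)
quoted_line = buf.getvalue()

# Read the quoted line back with a matching reader.
parsed = next(csv.reader(io.StringIO(quoted_line), delimiter=";"))

print(len(row), len(naive_fields))  # 3 fields written, 4 read back
print(quoted_line.strip())
print(parsed == row)
```

The naive line comes back as four fields instead of three, so everything after the product name shifts by one column. The quoted line parses back to the original three fields.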



