
CSV is a superb, incredibly useful data format, but it is neither perfect nor complete.

Instead of breaking CSV by adding to it, I recommend augmenting it:

It would be useful to have a good standardized / canonical json format for things like encoding, delimiter, schema and metadata, to accompany a zipped csv file, perhaps packaged in the same archive.

Gradually datasets would become more self-documenting and machine-usable without wrangling.
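
A rough sketch of what that packaging could look like, using only the Python standard library. The file names `data.csv` and `metadata.json` and the metadata keys (`file`, `encoding`, `delimiter`, `columns`) are made up for illustration; they are not taken from any standard.

```python
import csv
import io
import json
import zipfile


def pack(rows, header, delimiter=";", encoding="utf-8"):
    """Return the bytes of a zip holding data.csv plus a metadata.json sidecar."""
    text = io.StringIO(newline="")
    writer = csv.writer(text, delimiter=delimiter)
    writer.writerow(header)
    writer.writerows(rows)

    # Hypothetical sidecar layout: everything a reader needs to parse the CSV.
    meta = {
        "file": "data.csv",
        "encoding": encoding,
        "delimiter": delimiter,
        "columns": header,
    }

    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("data.csv", text.getvalue().encode(encoding))
        zf.writestr("metadata.json", json.dumps(meta, indent=2))
    return buf.getvalue()


def unpack(blob):
    """Read the sidecar first, then parse the CSV the way the sidecar says."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        meta = json.loads(zf.read("metadata.json"))
        text = zf.read(meta["file"]).decode(meta["encoding"])
    reader = csv.reader(io.StringIO(text, newline=""), delimiter=meta["delimiter"])
    return meta, list(reader)
```

The consumer never has to guess the delimiter or encoding: it reads `metadata.json` first and configures the parser from it.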



> It would be useful to have a good standardized / canonical json format for things like encoding, delimiter, schema and metadata

We already have that. Dan Brickley and others put a lot of thoughtful effort into it <https://www.w3.org/TR/tabular-data-primer/#dialects>:

> A lot of what's called "CSV" that's published on the web isn't actually CSV. It might use something other than commas (such as tabs or semi-colons) as separators between values, or might have multiple header lines. [...] You can provide guidance to processors that are trying to parse those files through the `dialect` property

As is usually the case with standards, the problem isn't that the standard doesn't exist; it's that people don't bother to check, much less care what it says or try to follow it.
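
For a concrete picture, here is a small sketch of a CSVW-style metadata document with a `dialect` block, and a reader that honors two of its properties. The property names (`delimiter`, `encoding`, `headerRowCount`, `tableSchema`, `columns`) come from the W3C tabular data vocabulary; the sample data, column names and the helper `read_with_dialect` are made up, and this is nowhere near a conformant CSVW processor.

```python
import csv
import io
import json

# Example metadata in the style of a CSVW description of data.csv.
metadata_json = """
{
  "@context": "http://www.w3.org/ns/csvw",
  "url": "data.csv",
  "dialect": {"delimiter": ";", "encoding": "utf-8", "headerRowCount": 1},
  "tableSchema": {"columns": [
    {"name": "id", "datatype": "integer"},
    {"name": "label", "datatype": "string"}
  ]}
}
"""


def read_with_dialect(csv_text, metadata):
    """Hypothetical helper: parse csv_text using the delimiter and header count
    from the metadata's dialect, naming fields from tableSchema."""
    dialect = metadata.get("dialect", {})
    delimiter = dialect.get("delimiter", ",")       # spec default is a comma
    header_rows = dialect.get("headerRowCount", 1)  # spec default is 1
    rows = list(csv.reader(io.StringIO(csv_text, newline=""), delimiter=delimiter))
    names = [col["name"] for col in metadata["tableSchema"]["columns"]]
    # Values stay as strings; datatype conversion is left out of this sketch.
    return [dict(zip(names, row)) for row in rows[header_rows:]]


metadata = json.loads(metadata_json)
records = read_with_dialect("id;label\r\n1;first\r\n2;second\r\n", metadata)
```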



