CSV is a superb, incredibly useful data format, but it is not perfect or complete.
Instead of breaking CSV by adding to it, I recommend augmenting it:
It would be useful to have a good standardized, canonical JSON format for things like encoding, delimiter, schema and metadata, to accompany a zipped CSV file, perhaps packaged in the same archive.
Gradually, datasets would become more self-documenting and machine-usable without wrangling.
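
To make the idea concrete, here is a rough Python sketch of a CSV packaged with a JSON sidecar in one zip archive. The file names (`data.csv`, `metadata.json`), the helper names `package_csv` and `read_packaged_csv`, and the sidecar's field layout are all made up for illustration; this is not an existing standard, only one way such a convention might look.

```python
import csv
import io
import json
import zipfile


def package_csv(rows, header, delimiter=";", encoding="utf-8"):
    """Return the bytes of a zip holding data.csv plus a metadata.json sidecar."""
    text = io.StringIO(newline="")
    writer = csv.writer(text, delimiter=delimiter, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)

    # Hypothetical sidecar layout: the key names are illustrative assumptions.
    metadata = {
        "file": "data.csv",
        "encoding": encoding,
        "delimiter": delimiter,
        "schema": [{"name": name} for name in header],
        "metadata": {"description": "example dataset"},
    }

    archive = io.BytesIO()
    with zipfile.ZipFile(archive, "w", zipfile.ZIP_DEFLATED) as zf:
        zf.writestr("data.csv", text.getvalue().encode(encoding))
        zf.writestr("metadata.json", json.dumps(metadata, indent=2))
    return archive.getvalue()


def read_packaged_csv(blob):
    """Read the sidecar first, then parse the CSV with what it declares."""
    with zipfile.ZipFile(io.BytesIO(blob)) as zf:
        metadata = json.loads(zf.read("metadata.json"))
        raw = zf.read(metadata["file"]).decode(metadata["encoding"])
    reader = csv.reader(io.StringIO(raw, newline=""), delimiter=metadata["delimiter"])
    return metadata, list(reader)
```

The point of the design is that the reader never has to guess: encoding and delimiter come from the sidecar, so the consumer can skip sniffing and hand-tuning altogether.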
> A lot of what's called "CSV" that's published on the web isn't actually CSV. It might use something other than commas (such as tabs or semi-colons) as separators between values, or might have multiple header lines. [...] You can provide guidance to processors that are trying to parse those files through the `dialect` property
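
As a rough illustration of what a processor might do with such hints, here is a small Python sketch that parses "not quite CSV" text using a dialect-style dict. The keys `delimiter` and `headerRowCount` follow my understanding of the dialect description the quote refers to, so treat them as assumptions; `parse_with_dialect` is a hypothetical helper, not part of any library, and it handles only these two hints.

```python
import csv
import io


def parse_with_dialect(text, dialect):
    """Parse CSV-like text using hints from a dialect-style dict."""
    # Assumed keys; fall back to plain CSV defaults when a hint is missing.
    delimiter = dialect.get("delimiter", ",")
    header_rows = dialect.get("headerRowCount", 1)

    rows = list(csv.reader(io.StringIO(text, newline=""), delimiter=delimiter))
    headers, data = rows[:header_rows], rows[header_rows:]

    if headers:
        # With several header lines, take the last one as the column names.
        names = headers[-1]
    else:
        width = len(data[0]) if data else 0
        names = [f"col{i + 1}" for i in range(width)]

    return [dict(zip(names, row)) for row in data]


# A tab-separated file with two header lines, as described in the quote.
sample = "Survey 2020\nid\tname\n1\tAda\n"
records = parse_with_dialect(sample, {"delimiter": "\t", "headerRowCount": 2})
```

Treating the last header line as the column names is a simplification I chose for the sketch; a real processor would need a rule for combining multiple header lines.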
As is usually the case with standards, it's not that the standard doesn't exist but that people don't even bother checking (much less care about what it says or actually try to follow it).