I would love to have a command-line tool that reads CSV and has a ton of features to cover different quirks and errors, which can output cleaner formats that I can pipe into other command-line tools.
csvkit [0] might be that tool; I discovered it after my last painful encounter with CSV files and haven't used it in anger yet. Among other things, it translates CSV to JSON, so you can compose it with jq.
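For what it's worth, the CSV-to-JSON step csvkit's `csvjson` command performs is conceptually simple: each data row becomes an object keyed by the header row. A rough Python sketch (keeping all values as strings, like csvkit's `-I`/`--no-inference` mode would):

```python
import csv
import io
import json

def csv_to_json(csv_text: str) -> str:
    # DictReader uses the first row as keys for every subsequent row.
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return json.dumps(rows)

# csv_to_json('name,age\nAda,36\n')
# → '[{"name": "Ada", "age": "36"}]'
```

The real `csvjson` does more (type inference, nested geometry output, etc.), but this is the shape of the pipeline you'd then feed into jq.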
I love these TSV utilities: https://github.com/eBay/tsv-utils. Granted, they're for "T"SV files, not "C"SV, but a handy `csv2tsv` utility is included.
At my last employer, I built a filter program, creatively called CSVTools[0], to do something like this. One piece of the project parses CSVs and replaces the commas/newlines (in an escaping- and multiline-aware manner, of course) with ASCII record/unit separator characters[1] (0x1E and 0x1F); the other piece converts that format back into well-formed CSV files. I usually used this with GNU awk, reconfiguring RS[2] and FS[3] appropriately. Or you can just set the input separators (RS/FS), leave the output separators (ORS/OFS) at their defaults, and produce plaintext output from AWK.
Good idea! Looks similar to something I wrote called csvquote https://github.com/dbro/csvquote , which enables awk and other command line text tools to work with CSV data that contains embedded commas and newlines.
csvtool is also nice.[0][1] csvkit is very flexible and can certainly be used in anger, but is a bit finicky; you almost always want to use the -I (--no-inference) option. Additionally, I wrote a tiny Perl script for quick awk-like oneliners.[2]
> As of version 2.0.9, there's no need for any external dependency. Python itself (3.7), and any needed libraries are self-contained inside the installation, isolated from the rest of your system.
[0] https://csvkit.readthedocs.io/en/latest/index.html