Considering the non-standard nature of CSV, quoting throughput numbers in bytes ...

liuliu · 2025-05-09T16:16:44 1746807404

CSV is standardized in RFC 4180 (well, as standardized as most of what we consider internet "standards").

Otherwise agreed: if you don't handle escaping (a.k.a. "quoting", which is the same thing in CSV), you are not implementing it correctly. For example, under RFC 4180 a line break inside a quoted field is part of that field's value. If you don't need to handle that, you can parse CSV much faster: properly handling line breaks inside quoted fields requires a 2-pass approach if you want to use many cores, while ignoring them entirely can be done in 1 pass. I discussed this detail in https://liuliu.me/eyes/loading-csv-file-at-the-speed-limit-o...
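To make the 2-pass point concrete, here is a minimal Python sketch. It is not the code from the linked post, and the names `row_breaks` and `scan` are made up for illustration. It assumes RFC 4180-style quoting, where an escaped quote is written as two quote characters, so quote parity alone tells you whether a byte sits inside a quoted field. Pass 1 counts quotes per chunk; a running parity then says whether each chunk starts inside a quoted field; pass 2 scans each chunk for row-ending line breaks. Both passes are independent per chunk, which is what would let them run on many cores (the sketch runs them sequentially for clarity).

```python
def scan(chunk: bytes, offset: int, inside: bool) -> list[int]:
    """Return absolute offsets of row-ending newlines in one chunk.

    `inside` says whether the chunk starts inside a quoted field.
    """
    out = []
    for i, b in enumerate(chunk):
        if b == 0x22:  # '"' toggles the quoted state ('""' toggles twice)
            inside = not inside
        elif b == 0x0A and not inside:  # '\n' outside quotes ends a row
            out.append(offset + i)
    return out


def row_breaks(data: bytes, n_chunks: int = 4) -> list[int]:
    """Find row boundaries with a 2-pass, chunked scan (hypothetical helper)."""
    size = max(1, -(-len(data) // n_chunks))  # ceiling division
    chunks = [(i, data[i:i + size]) for i in range(0, len(data), size)]

    # Pass 1 (per-chunk, parallelizable): count quote characters.
    counts = [chunk.count(b'"') for _, chunk in chunks]

    # Prefix parity: does each chunk start inside a quoted field?
    starts_inside = []
    inside = False
    for n in counts:
        starts_inside.append(inside)
        inside ^= (n % 2 == 1)

    # Pass 2 (per-chunk, parallelizable): locate the real row breaks.
    breaks = []
    for (offset, chunk), start in zip(chunks, starts_inside):
        breaks.extend(scan(chunk, offset, start))
    return breaks


data = b'a,b\n1,"x\ny"\n2,z\n'
naive = [i for i, b in enumerate(data) if b == 0x0A]  # 1-pass, ignores quotes
print(naive)             # includes the newline inside "x\ny"
print(row_breaks(data))  # skips it
```

If you skip quoting entirely, the `naive` line is the whole algorithm: every chunk can look for `\n` without knowing anything about the chunks before it, which is why it is so much cheaper.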

a3w · 2025-05-09T16:20:47 1746807647

Side note: RFCs are great standards, as they are readable.

As an example of how not to do it: XML can be assumed a standard, but I cannot afford to read it. DIN/ISO standards are great for manufacturing in theory, but a poor fit for fields with near-zero initial investment, like IT.