
Isn't this exactly what quoting solves?

i.e.:

```
"1,20","2,3",hello
"2,40","4,6",goodbye
```

If your tool reads CSV by doing `string_split(',', line);`, your tool is doing it wrong. There's a bunch of nuance and shit, which can make CSVs interesting to work with, but storing a comma in a field is a pretty solved issue if the tool in question has had more than 5 minutes of thought put into it.
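
To make that concrete, here is a minimal sketch in Python comparing a naive split with the standard library's `csv` module on the sample above. The helper names `naive_rows` and `parsed_rows` are made up for illustration; only `csv.reader` is a real API.

```python
import csv
import io

# The two sample rows from above, as they would appear in a file.
data = '"1,20","2,3",hello\n"2,40","4,6",goodbye\n'

def naive_rows(text):
    # Hypothetical naive approach: split every line on commas, ignoring quotes.
    return [line.split(",") for line in text.splitlines()]

def parsed_rows(text):
    # csv.reader understands quoted fields, so embedded commas survive.
    return list(csv.reader(io.StringIO(text)))

print(naive_rows(data)[0])   # ['"1', '20"', '"2', '3"', 'hello']
print(parsed_rows(data)[0])  # ['1,20', '2,3', 'hello']
```

The naive version yields five broken fragments for the first row; the quote-aware reader yields the three intended fields.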




Now all your numbers are strings


It's a text file. All your numbers were already strings. Nothing has changed.


There's a difference between "1" and 1. When you import a csv and try to do maths on a "number" you won't get the expected result. Some importers won't even allow you to specify that "number" columns are numbers, they'll outright fail and force you to say it's a string, or you'll have to specify which columns are "numbers" and map the strings to numbers on the importer side.

If they are numbers to begin with (not "numbers"), you can just import the csv and you'll get the expected result out of the box.

In the end things are just 1s and 0s, but that doesn't mean we only ever do binary operations on data at the abstraction layer we humans operate at, so saying it's just 0s and 1s or just strings is not very smart.


Sounds to me like you're using a shitty parser. CSV is schemaless. It is up to you to tell the parser what types to use if it isn't unambiguous. Quoted values can be numbers, and unquoted values can be strings. I have not used any CSV tools that don't support this behaviour.
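
For what it's worth, here is a rough sketch of "tell the parser what types to use" in Python with the standard `csv` module. `read_typed` and the per-column converter list are names I made up for illustration, not part of any library; the point is only that the types come from the caller, not from the quoting.

```python
import csv
import io

def read_typed(text, converters):
    # Hypothetical helper: apply one caller-supplied converter per column.
    # Quoting only affects how fields are delimited, not how they are typed.
    reader = csv.reader(io.StringIO(text))
    return [
        [convert(field) for convert, field in zip(converters, row)]
        for row in reader
    ]

# Every field is quoted, yet the caller decides which columns are numbers.
rows = read_typed('"0","10","Text,","123"\n', [int, int, str, str])
print(rows)  # [[0, 10, 'Text,', '123']]
```

With an explicit schema like this, a quoted `"123"` stays text or becomes a number depending on what you asked for, whichever way the file happens to be quoted.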


A shitty parser is one that assumes that if I quote a number, I want it to be a string.


But now all our strings might be numbers! We now have to parse every quoted string, and we can no longer represent numbers as text.

    unquoted input:
    0, 10, "Text", "123"

    unambiguous output:
    (Num) 0, (Num) 10, (Text) Text, (Text) 123

    quoted input:
    "0", "10", "Text", "123"

    output:
    (Num) 0, (Num) 10, (Text) Text, (Num) 123


Why would you quote Text in the first example? That makes no sense. Text does not contain any delimiters or special characters.

Unquoted input should look like this:

    unquoted input:
    0,10,Text,123

Note the absence of spaces as well. Not sure what flavour of CSV you are using, but there usually aren't spaces after the delimiter.


Fair point, but that doesn't really resolve the issue. Here's a cleaned-up example showing the same problem:

    unquoted input:
    0,10,"Text,","123"

    output:
    (Num) 0
    (Num) 10
    (Text) Text,
    (Text) 123

    quoted input:
    "0","10","Text,","123"

    output:
    (Num) 0
    (Num) 10
    (Text) Text,
    (Num) 123
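
A small Python sketch of the two behaviours, in case it helps. `csv.QUOTE_NONNUMERIC` is a real option of the standard `csv` module (the reader converts unquoted fields to `float`); `quote_aware` and `infer_from_value` are hypothetical helpers, and the second one is just my assumption of what a "guess the type from the value" importer does.

```python
import csv
import io

def quote_aware(line):
    # QUOTE_NONNUMERIC: unquoted fields become floats, quoted fields stay text.
    return next(csv.reader(io.StringIO(line), quoting=csv.QUOTE_NONNUMERIC))

def infer_from_value(line):
    # Hypothetical importer: ignores quoting and treats anything that parses
    # as an integer as a number.
    fields = next(csv.reader(io.StringIO(line)))
    result = []
    for field in fields:
        try:
            result.append(int(field))
        except ValueError:
            result.append(field)
    return result

print(quote_aware('0,10,"Text,","123"'))             # [0.0, 10.0, 'Text,', '123']
print(infer_from_value('"0","10","Text,","123"'))    # [0, 10, 'Text,', 123]
```

In the first case the quotes carry the "this is text" signal; in the second that signal is gone, so the text `123` comes back as a number.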



