
> huge files with huge numbers of duplicates

At least with the stock macOS awk, a count can go up to 2^53 before the arithmetic breaks. Past that point the counter doesn't wrap around; it just stops going up, which means the one-liner still works.

    > echo '2^53-1' | bc
    9007199254740991
    > seq 1 10 | awk 'BEGIN{a[123]=9007199254740991;b=a[123]}{a[123]++}END{print a[123],b,a[123]-b}'
    9007199254740992 9007199254740991 1
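A minimal sketch of why this happens, assuming the awk in question stores numbers as IEEE-754 doubles (typical, but worth checking for your awk). Python floats use the same representation, so the demo above can be reproduced. The `is_first` helper is hypothetical and assumes the one-liner is the common `!seen[$0]++` dedup idiom: it tests the old value, then increments, so a saturated counter stays nonzero and the line is still treated as a duplicate.

```python
# Python floats are IEEE-754 doubles, the representation awk implementations
# typically use for numbers (an assumption for any particular awk).
START = 2.0 ** 53 - 1  # 9007199254740991, as in the awk demo


def bump(count: float, times: int) -> float:
    """Add 1.0 to count `times` times, mimicking a[123]++ in the demo."""
    for _ in range(times):
        count += 1.0
    return count


end = bump(START, 10)
# 2^53 + 1 is not representable; it rounds (ties-to-even) back down to 2^53,
# so every increment past 2^53 leaves the value unchanged.
print(int(end), int(START), int(end - START))


def is_first(seen: dict, line: str) -> bool:
    """Hypothetical emulation of awk's !seen[$0]++: test old value, then increment."""
    old = seen.get(line, 0.0)
    seen[line] = old + 1.0
    return old == 0.0


seen = {"dup": 2.0 ** 53}  # a counter that has already saturated
print(is_first(seen, "dup"), seen["dup"] == 2.0 ** 53)
```

This prints `9007199254740992 9007199254740991 1`, matching the awk output, and then `False True`: the saturated counter never returns to zero, so duplicates are still suppressed.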
Even with one character per line (2 bytes including the newline), you'd need 2^53 duplicate lines, i.e. about 2^54 bytes or an 18 PB file, before you got to this limit, afaict.
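The file-size estimate is simple arithmetic; a quick check, assuming one character plus a newline per line and decimal petabytes (10^15 bytes):

```python
LINES = 2 ** 53        # duplicate lines needed before the counter saturates
BYTES_PER_LINE = 2     # one character plus the trailing newline

total_bytes = LINES * BYTES_PER_LINE  # 2^54 bytes
print(total_bytes)                    # 18014398509481984
print(f"{total_bytes / 10**15:.1f} PB")   # 18.0 PB (decimal)
print(f"{total_bytes / 2**50:.0f} PiB")   # 16 PiB (binary)
```

In binary units the same size is exactly 16 PiB.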


