You need "gawk -M" for this for bignum support, so visited[$0]++ doesn't wrap ba...

zimpenfish · on May 29, 2019

> huge files with huge numbers of duplicates

At least on the stock macOS awk, a counter can reach 2^53 before arithmetic breaks. It doesn't wrap; it just stops going up, which means the one-liner still works.

    > echo '2^53-1' | bc
    9007199254740991
    > seq 1 10 | awk 'BEGIN{a[123]=9007199254740991;b=a[123]}{a[123]++}END{print a[123],b,a[123]-b}'
    9007199254740992 9007199254740991 1
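
To see why a stuck counter is harmless, here is a rough sketch that pre-loads the count for one line at 2^53 and then runs the usual dedup pattern. The `dedupe_saturated` name and the sample input are made up for illustration, and it assumes an awk that stores numbers as doubles (as the stock ones do):

```shell
# Hypothetical helper: dedupe stdin, with the counter for "a" already at 2^53,
# the point where ++ no longer changes the value.
dedupe_saturated() {
    awk 'BEGIN { visited["a"] = 2^53 } !visited[$0]++'
}

# "a" is treated as already seen, so only the first "b" is printed.
printf 'a\nb\na\nb\n' | dedupe_saturated
```

The pattern only tests whether the count is zero or not, so a count that is stuck at 2^53 still reads as "seen".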

Even with one character per line, you'd need an 18PB file before you got to this limit, afaict.
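
Back-of-the-envelope version of that figure, assuming the worst case is a single one-character line repeated 2^53 times, two bytes per line once you count the newline, and a shell with 64-bit arithmetic:

```shell
# Assumes 64-bit shell arithmetic (bash and dash on 64-bit systems).
lines=$(( 1 << 53 ))      # count at which the double-based counter stops increasing
bytes=$(( lines * 2 ))    # one character plus a newline per line
echo "$bytes bytes, about $(( bytes / 1000000000000000 )) PB"
```

That works out to 2^54 bytes, which is where the 18PB comes from.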