The basic imperative Python version is much easier to remember and read though, ...

skinner_ · on May 29, 2019

One advantage is that

  seen.add(line)

can be changed to

  seen.add(hash(line))

which can be significantly more memory efficient for files with long lines.

jacobolus · on May 29, 2019

Or perhaps better, if needs change the seen = set() object can be swapped out for any alternative object seen = foo that provides foo.__contains__ and foo.add methods.

This could involve saving previously seen lines in a radix tree, adding multiple layers of caching, saving infrequently seen lines to disk or over the network, etc. as appropriate for the use case.