Or perhaps better, if needs change the seen = set() object can be swapped out for any alternative object seen = foo that provides foo.__contains__ and foo.add methods.
This could involve saving previously seen lines in a radix tree, adding multiple layers of caching, saving infrequently seen lines to disk or over the network, etc. as appropriate for the use case.
This could involve saving previously seen lines in a radix tree, adding multiple layers of caching, saving infrequently seen lines to disk or over the network, etc. as appropriate for the use case.