I have been using AWK at work and found it cumbersome for my use case (run-once analysis/manipulation of 200GB CSV datasets). I found out about Miller[1] a year or so back and have been using it instead. I don't know how it stacks up in terms of performance, but for my money, named fields and one-shot statistics are all I need.
For example, analysing the number of people per year across multiple differently formatted files is as easy as `mlr uniq -f pid,year then count -g year`. It has filtering, very extensive manipulation capabilities, and nice documentation.
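To make that concrete, here's a minimal sketch of that pipeline run on a tiny hypothetical `people.csv` (the `--icsv --opprint` flags tell Miller to read CSV and pretty-print the output; the data and exact output layout are illustrative):

```sh
# hypothetical people.csv:
#   pid,year
#   1,2019
#   1,2019
#   2,2019
#   1,2020

# drop duplicate (pid,year) pairs, then count people per year
mlr --icsv --opprint uniq -f pid,year then count -g year people.csv

# expected output:
#   year count
#   2019 2
#   2020 1
```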
John has also responded very quickly to feature requests I've made (so quickly that I haven't yet gotten around to using the new features).
+1 for Miller. It's a really slick tool for exploring large CSV datasets. I've typically used it for prototyping and exploration of 200-500GB CSV datasets before doing the heavier work in Java+PigLatin (our use case is longer-term than a single analysis run, which is the reason for moving beyond just Miller). It's great for getting a feel for the data and running some initial analysis before diving into the larger, more cumbersome system and tooling.
[1]: https://miller.readthedocs.io/en/latest/#overview