
I have been using AWK at work and found it cumbersome for my use-case (run-once analysis/manipulation of 200GB csv datasets). I found out about Miller[1] a year or so back and have been using that instead. I don't know how it stacks up in terms of performance, but for my money, named fields and one-shot statistics are all I need.

For example, analysing the number of people per year over multiple differently formatted files is as easy as `mlr uniq -f pid,year then count -g year`. It has filtering, very extensive manipulation capabilities, and nice documentation.
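To make that concrete, here's a minimal sketch of what a full invocation looks like (the file name people.csv and its column layout are hypothetical, not from the original comment):

    # people.csv is assumed to have pid and year columns (among others)
    # --icsv: read CSV input; --opretty: pretty-print the output table
    mlr --icsv --opretty uniq -f pid,year then count -g year people.csv

The `uniq -f pid,year` step deduplicates on the (pid, year) pair so each person is counted at most once per year, and `count -g year` then tallies the remaining rows per year.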

John has also responded very quickly to feature requests I've made (so quickly that I haven't yet gotten around to using the new features).

[1]: https://miller.readthedocs.io/en/latest/#overview




+1 for Miller. It's a really slick tool for exploring large csv datasets. I've typically used it to do some prototyping and exploration of 200-500GB csv datasets before doing more hefty work in Java+PigLatin (our use-case is longer-term than a single analysis run, so that's the reason for moving beyond just Miller). It's great for getting a feel for the data and running some initial analysis before diving into the larger, more cumbersome system and tooling.
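As a rough sketch of that kind of initial exploration step, Miller's one-shot statistics verbs work well (the column name amount and the file name data.csv are made up for illustration):

    # quick per-year distribution summary of a numeric column
    # amount and data.csv are hypothetical placeholders
    mlr --icsv --opretty stats1 -a min,mean,max -f amount -g year data.csv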


This seems very interesting.

Making this kind of work reproducible and usable by multiple people has always been an annoyance in semi-ad-hoc data collection.



