I use R's data.table a lot, and I see its main advantages as performance and terse syntax. That terse syntax does come with a steep learning curve, though.
Indeed, data.table is just awesome for productivity. When you're manipulating data for exploration, you want the fewest keystrokes to bring an idea to life, and data.table gives you that.
The syntax isn't self-describing and uses lots of abbreviations; it relies on some R magic that I found confusing while learning (unquoted column names and special built-in variables); and data.table simply takes a different approach from SQL and other dataframe libraries.
But once you get used to it, data.table makes a lot of sense: every operation breaks down into filtering/selecting, aggregating/transforming, and grouping/windowing. Taking the first two rows per group is a mess in SQL or pandas, but it's super simple in data.table:
    flights[, head(.SD, 2), by = month]
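For comparison, here is a sketch of the SQL route to the same result, run against a toy in-memory table (the table and column names are made up for illustration). "First two rows per group" needs a window function wrapped in a subquery, which is the verbosity being contrasted above:

```python
import sqlite3
from collections import Counter

# Toy stand-in for the flights table (illustrative data only).
con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE flights (month INTEGER, dep_delay INTEGER)")
con.executemany("INSERT INTO flights VALUES (?, ?)",
                [(1, 5), (1, 9), (1, 2), (2, 7), (2, 1)])

# SQL needs ROW_NUMBER() in a subquery to take the first two rows per month.
rows = con.execute("""
    SELECT month, dep_delay FROM (
        SELECT month, dep_delay,
               ROW_NUMBER() OVER (PARTITION BY month ORDER BY rowid) AS rn
        FROM flights
    ) WHERE rn <= 2
""").fetchall()

per_month = Counter(m for m, _ in rows)
print(per_month)  # two rows kept for each month
```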
That data.table has significantly better performance than any other dataframe library in any language is a nice bonus!
Not only does this have all the same keywords, but it is organized in a way that's much clearer to newcomers, and it labels things you can look up in the API. Your R code, by contrast, has a leading comma, .SD, and a mix of quoted and unquoted references to columns. You even admit that last one was confusing to learn. This can all be crammed into your head, but it's not what I would call thoughtfully designed.
Anyway, I don't understand why terseness is even desirable. We're doing DS and ML; no project ever comes down to keystrokes, but the ability to search the docs and to debug does matter.
It helps you improve your understanding of the data more quickly, by letting you answer simple but important questions faster. In this contrived example I would want to know:
- How many events by type
- When did they happen
- Are there any gaps in the counts, and why?
- Some statistics on these events, like average, min, and max
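Each of those questions maps to a one-liner in any decent dataframe library. A sketch in pandas terms, with a hypothetical `events` table (the data.table versions would be analogous one-liners):

```python
import pandas as pd

# Hypothetical events data, invented for illustration.
events = pd.DataFrame({
    "type": ["click", "view", "click", "view", "view"],
    "ts": pd.to_datetime(["2024-01-01", "2024-01-01",
                          "2024-01-02", "2024-01-04", "2024-01-04"]),
    "value": [3, 1, 4, 1, 5],
})

counts = events["type"].value_counts()                # how many events by type
span = (events["ts"].min(), events["ts"].max())       # when they happened
daily = events.set_index("ts").resample("D").size()   # gaps show up as zero-count days
stats = events["value"].agg(["mean", "min", "max"])   # simple statistics

print(counts.to_dict())  # {'view': 3, 'click': 2}
```

The gap question is the interesting one: resampling to a daily count makes a missing day (here 2024-01-03) appear explicitly as a zero rather than silently absent.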