Nice work. I created something very similar with Python (https://github.com/leerob/facebook-data-analyzer). It looks like yours touches on some things I didn't get to in mine, like ranking messages. Awesome idea!
Hi, I'm the creator of Facebook Data Analyzer. I'm overwhelmed by the support I've gotten from the community; we've already fixed some issues so that users can run the script. Thank you, Hacker News.
It's worth noting that the messages exported from Facebook with their tool are often truncated. The export seems to be more comprehensive for your more recent contacts, so any analysis will skew toward the people you were in contact with most recently.
It looks like within the past 8 months or so Facebook has changed the format of their data dumps to not truncate messages. Their previous data dumps were structured as one giant messages.htm file, which was difficult to parse and seemed to have missing data in certain cases.
I haven't seen truncation, but the way it breaks up conversations is misleading. Instead of getting full threads, you get chunks of convos in chronological order, which makes it a nightmare to follow anything.
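For what it's worth, stitching the chunks back into threads is not much code once they're parsed. Here's a rough Python sketch, not taken from either project. It assumes you've already turned each chunk into a dict with a `participants` list and a `messages` list of dicts carrying a `timestamp`; those field names and the `regroup_chunks` helper are made up for illustration, and the real export may look different.

```python
from collections import defaultdict


def regroup_chunks(chunks):
    """Merge interleaved conversation chunks back into per-conversation threads.

    Assumed (hypothetical) input shape: each chunk is a dict like
    {"participants": [...], "messages": [{"timestamp": ..., "text": ...}, ...]}.
    """
    threads = defaultdict(list)
    for chunk in chunks:
        # A frozenset key treats ["alice", "bob"] and ["bob", "alice"]
        # as the same conversation.
        key = frozenset(chunk["participants"])
        threads[key].extend(chunk["messages"])

    # Restore chronological order inside each thread.
    for messages in threads.values():
        messages.sort(key=lambda m: m["timestamp"])
    return dict(threads)
```

Keying on the participant set is a simplification: two separate group chats with the same members would get merged, so use a real thread ID if your export has one.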
Staying on top of these undocumented pseudo-formats is a real challenge. That's why it's a good idea to not wait to archive stuff.
When I wanted to analyze 9 years of my Google Voice history, none of the existing scripts for parsing it worked anymore, so I had to write one: https://github.com/unqueued/googlevoiceparse
Google Takeout's HTML archives weren't exactly friendly when I wanted to drill down and find certain patterns.
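If anyone wants to do the same kind of drilling down without a full parser, a generic approach is to strip the markup and run a regex over the text. This is only a sketch using Python's standard library, not how googlevoiceparse works. It makes no assumptions about the archive's tag layout, and `TextExtractor` and `find_matches` are names I made up.

```python
import re
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect the non-empty text nodes of an HTML document."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.parts.append(text)


def find_matches(html, pattern):
    """Return the text nodes in `html` that match `pattern` (case-insensitive)."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    regex = re.compile(pattern, re.IGNORECASE)
    return [part for part in parser.parts if regex.search(part)]
```

Something like `find_matches(open(path, encoding="utf-8").read(), r"\bcall\b")` would then list every text node mentioning "call". It also picks up text inside script and style tags, so filter those out if the files contain any.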