The Reddit data set on BigQuery is excellent. My side project is tangentially related: it relies on the fact that the Reddit data set has normal folk commenting. I have been using Reddit comments to help writers research and find what normal people say about any topic [1]. So far, I have had little luck incorporating the comment scores and coming up with something more useful than the standard bag-of-words search techniques [2]. I am currently working on generating more interesting, creative writing prompts, again based on the Reddit data set.
One problem for data geeks to solve: Reddit data fits nicely into a graph structure and not so nicely in table form. It would be fantastic if someone put the Reddit data set into a graphdb and made it open.
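To illustrate the graph point, here is a rough Python sketch of how flat comment rows turn into a reply tree. The rows are made up, and the field names (`id`, `parent_id`, `score`, `body`) and the `t1_`/`t3_` prefixes on `parent_id` are my assumptions about how the comment tables are laid out, so check them against the actual schema before relying on this.

```python
from collections import defaultdict

# Made-up rows for illustration only. Assumed convention: parent_id is
# "t3_<post id>" for a top-level comment and "t1_<comment id>" for a reply.
comments = [
    {"id": "c1", "parent_id": "t3_post", "score": 12, "body": "Top-level comment"},
    {"id": "c2", "parent_id": "t1_c1", "score": 5, "body": "Reply to c1"},
    {"id": "c3", "parent_id": "t1_c2", "score": 2, "body": "Reply to c2"},
    {"id": "c4", "parent_id": "t3_post", "score": 7, "body": "Another top-level comment"},
]


def build_children(rows):
    """Map each parent id (type prefix stripped) to its child comment ids."""
    children = defaultdict(list)
    for row in rows:
        parent = row["parent_id"].split("_", 1)[1]
        children[parent].append(row["id"])
    return children


def thread(children, root):
    """Return (depth, comment id) pairs below root in depth-first order."""
    out = []
    stack = [(child, 0) for child in reversed(children.get(root, []))]
    while stack:
        node, depth = stack.pop()
        out.append((depth, node))
        # Push replies in reverse so they are visited in their original order.
        stack.extend((c, depth + 1) for c in reversed(children.get(node, [])))
    return out


children = build_children(comments)
walk = thread(children, "post")
for depth, comment_id in walk:
    print("  " * depth + comment_id)
```

In a table, every extra level of replies costs another self-join on `parent_id`; in a graph store the same walk is just following edges, which is why the data feels awkward in table form.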
[1] https://wisdomofreddit.com and https://github.com/qxf2/wisdomofreddit
[2] For now, my search engine just uses Whoosh's out-of-the-box BM25F.