Hacker News
Yahoo!'s Hadoop-based Latent Dirichlet Allocation on GitHub (smola.org)
55 points by DanielRibeiro on June 9, 2011 | 3 comments



This is very cool. If you want to play with topic modeling, I'd suggest starting with Mallet (http://mallet.cs.umass.edu). For an idea of what topic modeling can do, check out http://bit.ly/wikitopics, which provides a web-based UI for browsing WikiLeaks' Cablegate dump (powered by topic modeling/LDA/Mallet).
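If you want to see what an LDA trainer actually does before reaching for Mallet or the Hadoop code, here is a minimal sketch of collapsed Gibbs sampling for LDA in plain Python. It takes the same sampling-based approach as Mallet's LDA, but it is only an illustration: it is not the Yahoo! code or Mallet's API. The function name `lda_gibbs`, the symmetric priors `alpha`/`beta`, and the input format (each document as a list of integer word ids) are assumptions made for this example.

```python
import random


def lda_gibbs(docs, num_topics, vocab_size, alpha=0.1, beta=0.01, iters=200, seed=0):
    """Collapsed Gibbs sampling for LDA (illustrative sketch, hypothetical helper).

    docs: list of documents, each a list of word ids in [0, vocab_size).
    Returns (n_dk, n_kw): document-topic and topic-word count tables.
    """
    rng = random.Random(seed)
    K, V = num_topics, vocab_size
    n_dk = [[0] * K for _ in docs]      # tokens in doc d assigned to topic k
    n_kw = [[0] * V for _ in range(K)]  # times word w is assigned to topic k
    n_k = [0] * K                       # total tokens assigned to topic k

    # Start from a random topic assignment for every token.
    z = []
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(K)
            zd.append(k)
            n_dk[d][k] += 1
            n_kw[k][w] += 1
            n_k[k] += 1
        z.append(zd)

    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take this token out of the counts before resampling it.
                k = z[d][i]
                n_dk[d][k] -= 1
                n_kw[k][w] -= 1
                n_k[k] -= 1
                # Full conditional p(z = t | all other assignments), up to a constant.
                weights = [
                    (n_dk[d][t] + alpha) * (n_kw[t][w] + beta) / (n_k[t] + V * beta)
                    for t in range(K)
                ]
                k = rng.choices(range(K), weights=weights)[0]
                # Put the token back under its newly sampled topic.
                z[d][i] = k
                n_dk[d][k] += 1
                n_kw[k][w] += 1
                n_k[k] += 1
    return n_dk, n_kw
```

On a toy corpus where documents draw on two disjoint vocabularies, for example `docs = [[0, 1, 2] * 5] * 5 + [[3, 4, 5] * 5] * 5` with `num_topics=2`, the rows of `n_dk` should end up concentrated on one topic per group. Production systems such as Mallet, the Yahoo! code, and the Vowpal Wabbit online variant exist because this per-token loop does not scale to large corpora as written.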


Vowpal Wabbit (one of the fastest machine-learning implementations around, from John Langford, also of Yahoo!) now has an "online" LDA implementation (Hoffman et al. 2010: http://www.cs.princeton.edu/~blei/papers/HoffmanBleiBach2010...). It's probably the fastest single-core LDA implementation.

http://hunch.net/?p=1594

http://www.machinedlearnings.com/2010/12/lightning-fast-lda....






