
How? Just get started working on a fun problem. A good place to start is keyword extraction. You don't need a PhD or expensive tools; all you need is some free time and a willingness to read some cool stuff.

Copy a few articles into text files and get to work implementing some of these methods, until you understand them well enough to construct your own for the fun of it.

Here's some good reading material:

https://www.facebook.com/notes/facebook-engineering/under-th...

https://www.researchgate.net/profile/Stuart_Rose/publication...

http://cdn.intechopen.com/pdfs/5338.pdf

https://arxiv.org/pdf/1603.03827v1.pdf

https://www.quora.com/Sentiment-Analysis-What-are-the-good-w...

http://hrcak.srce.hr/file/207669

http://nlp.stanford.edu/fsnlp/promo/colloc.pdf

https://arxiv.org/ftp/cs/papers/0410/0410062.pdf

http://delivery.acm.org/10.1145/1120000/1119383/p216-hulth.p...

Edit: Don't get deterred by the math formulas in these papers. They look far more complicated than they actually are.



Another fun thing is to paste article text into some API, like the Watson demo, so you can see what kinds of things are possible:

https://alchemy-language-demo.mybluemix.net/

I played around with this a bit while developing https://www.findlectures.com, so, knowing what works and what doesn't there, I'm developing some NLP scripts to support my use cases.


I never thought about this particular use case. The subtitles for TED talks should be an ocean of info for you to extract keywords from :D Pretty neat site you've got there. I will be using it. Thanks!


I would say that a good exercise for starting in this field would be to implement something like Tf-Idf [0] for identifying keywords in a set of documents. I don't know where one can find current datasets for this, but I made WikiCorpusExtractor [1] to build sets of documents from Wikipedia.

The only thing one really needs is to count the frequency of words in each document and do some very simple math. Tf-Idf is still very relevant today, and it gives you a good idea of how statistics is used in text mining.

[0] https://en.wikipedia.org/wiki/Tf%E2%80%93idf

[1] https://github.com/joaoventura/WikiCorpusExtractor
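The math really is that simple. Here's a minimal from-scratch sketch (the toy corpus and whitespace tokenization are my own simplifications, not part of WikiCorpusExtractor) that scores each word by term frequency times inverse document frequency:

```python
import math
from collections import Counter

def tf_idf(documents):
    """Score every word in every document: term frequency times
    inverse document frequency (higher = more distinctive)."""
    tokenized = [doc.lower().split() for doc in documents]
    n_docs = len(tokenized)
    # Document frequency: how many documents contain each word?
    df = Counter(word for doc in tokenized for word in set(doc))
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        scores.append({word: (count / len(doc)) * math.log(n_docs / df[word])
                       for word, count in tf.items()})
    return scores

# Toy corpus: "stock" is distinctive to the last document,
# while "the" appears in every document and scores zero.
docs = ["the cat sat on the mat",
        "the dog chased the cat",
        "the stock prices fell on monday"]
scores = tf_idf(docs)
```

Words shared by all documents get idf = log(1) = 0, which is exactly the intuition: a word that appears everywhere tells you nothing about any one document.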


I started even simpler than that: I just eliminated stopwords and counted the frequency of each word in the document itself. I did not use a set of documents, as the goal was for the algorithm to be used on the spot on a single block of text.
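That approach (strip stopwords, then rank the remaining words by raw frequency in the single text) can be sketched roughly like this; the tiny stopword list and naive punctuation stripping are my own simplifications:

```python
from collections import Counter

# A tiny hand-picked stopword list; real lists (e.g. NLTK's) are much longer.
STOPWORDS = {"the", "a", "an", "is", "of", "in", "on", "and", "to", "by", "it"}

def top_keywords(text, n=5):
    """Drop stopwords, then rank the remaining words by raw frequency
    within this single block of text (no corpus required)."""
    words = [w.strip(".,!?;:").lower() for w in text.split()]
    counts = Counter(w for w in words if w and w not in STOPWORDS)
    return counts.most_common(n)

text = ("The cat chased the mouse. The mouse hid, "
        "and the cat waited by the mouse hole.")
```

No corpus-wide statistics are needed, which is what makes it usable on a single pasted block of text.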

A few months later, after many iterations and a whole lot of testing, the algorithm can now extract highly relevant keywords more than 90% of the time!

I wish I knew about the WikiCorpusExtractor. Thanks for the link!


Thank you very much for the reading material.


You're welcome!



