Hacker News
Stanford Scientists Put Free Text-Analysis Tool on the Web (stanford.edu)
215 points by ninjin on Feb 6, 2014 | 31 comments


The tool itself is hosted here: http://www.etcml.com/

The paper is here: http://hci.stanford.edu/publications/paper.php?id=279


Um, hey, I'm the author of the paper. I'll check this thread again in a few hours, in case you have questions about it.


If you want to win some brownie points and have a bit of fun, you could run PG's essays through that thing and see what it makes of them:

http://paulgraham.com/articles.html


Neat project.

Will there be an API available? Or will I have to get creative with Form POSTs :).


There might eventually be an API available. Right now, we're focused on getting the actual grading interactions for the peers right. Richard's etcml already has an API.


+1 on that!


Did our publicity bring down the app? I can't pull it up.

Great idea though!


That's pretty cool — I had the exact same goals (help professors grade essays faster) with my bookshrink project [0]. Of course, it was the first python code I ever wrote and it uses an extremely naïve algorithm, but the results aren't too bad if you want to try it yourself [1].

[0]: https://github.com/peterldowns/bookshrink [1]: http://www.bookshrink.com/


> it uses an extremely naïve algorithm, but the results aren't too bad if you want to try it yourself

Don't be so hard on yourself. I review papers for CS conferences and just the fact that you used TF-IDF weighting puts you well above the average.
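For anyone new to the idea, here is a minimal sketch of TF-IDF-weighted extractive summarization. It is not bookshrink's actual code. It assumes each sentence is treated as its own "document" for the IDF counts, and that a sentence's score is the sum of its words' length-normalized TF-IDF weights. The function name `tfidf_summary` is hypothetical.

```python
import math
import re
from collections import Counter


def tfidf_summary(sentences, k=3):
    """Pick the k highest-scoring sentences, returned in their original order.

    Score = sum over the sentence's distinct words of (count / sentence length)
    * log(N / df), treating each sentence as a separate "document" for IDF.
    """
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    n = len(tokenized)
    # Document frequency: how many sentences contain each word at least once.
    df = Counter(word for words in tokenized for word in set(words))

    def score(words):
        if not words:
            return 0.0
        tf = Counter(words)
        # Words that occur in every sentence get log(n / n) = 0 weight.
        return sum((c / len(words)) * math.log(n / df[w]) for w, c in tf.items())

    ranked = sorted(range(n), key=lambda i: score(tokenized[i]), reverse=True)
    # Restore document order so the summary reads naturally.
    return [sentences[i] for i in sorted(ranked[:k])]
```

Normalizing by sentence length keeps long sentences from winning just by containing more words. A common variant is to drop stopwords before scoring.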


Regarding stuff like:

    """ SPLIT INPUT INTO SENTENCES"""

    god_awful_regex = r'''(?<!\d)(?<![A-Z]\.)(?<!\.[a-z]\.)(?<!\.\.\.)(?<!etc\.)(?<![Mm]r\.)(?<![Pp]rof\.)(?<![Dd]r\.)(?<![Mm]rs\.)(?<![Mm]s\.)(?<![Mm]z\.)(?<![Mm]me\.)(?:(?<=[.!?])|(?<=[.!?]['"]))[\s]+?(?=[\S])'''

Be advised that one nice thing about Python regexes is that they allow inline comments (and named groups) when compiled with the verbose flag:

    import re

    """ SPLIT INPUT INTO SENTENCES"""
    verbose_regex = r'''
        (?<!\d)                    # not directly after a digit
        (?<![A-Z]\.)               # not after a capital + dot (initials like "J.")
        (?<!\.[a-z]\.)             # not after abbreviations like "e.g."
        (?<!\.\.\.)                # not after an ellipsis
        (?<!etc\.)                 # not after "etc."
        (?<![Mm]r\.)(?<![Pp]rof\.) # not after titles: Mr., Prof.,
        (?<![Dd]r\.)(?<![Mm]rs\.)  #   Dr., Mrs.,
        (?<![Mm]s\.)(?<![Mm]z\.)(?<![Mm]me\.)  #   Ms., Mz., Mme.
        (?:(?<=[.!?])|(?<=[.!?]['"]))  # must follow . ! or ?, maybe plus a quote
        [\s]+?(?=[\S])             # the whitespace run before the next sentence
        '''
    # Built up and tested with comments like this, even a regex
    # this hairy stays quite readable.
    god_awful_regex = re.compile(verbose_regex, re.VERBOSE)
    # continue here...
http://www.diveintopython.net/regular_expressions/verbose.ht...
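To see what the one-line pattern quoted above actually does, a quick check is to run it through `re.split` on a small made-up sample (the sample text is arbitrary, not from bookshrink):

```python
import re

# The one-line pattern quoted above, unchanged.
god_awful_regex = r'''(?<!\d)(?<![A-Z]\.)(?<!\.[a-z]\.)(?<!\.\.\.)(?<!etc\.)(?<![Mm]r\.)(?<![Pp]rof\.)(?<![Dd]r\.)(?<![Mm]rs\.)(?<![Mm]s\.)(?<![Mm]z\.)(?<![Mm]me\.)(?:(?<=[.!?])|(?<=[.!?]['"]))[\s]+?(?=[\S])'''

text = "Mr. Jones met Dr. Lee. They talked for an hour! Was it useful? Well... yes."
# re.split cuts at the whitespace the pattern matches; the lookbehinds only decide
# where a cut is allowed, so the punctuation stays attached to each sentence.
sentences = re.split(god_awful_regex, text)
print(sentences)
# ['Mr. Jones met Dr. Lee.', 'They talked for an hour!', 'Was it useful?', 'Well... yes.']
```

Note the last case: the ellipsis guard keeps "Well... yes." together, which also means a sentence that genuinely ends in "..." won't be split from the one after it.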


Yeah, I've since learned that feature of Python :) Like I said, this is some of the first Python code I ever wrote; it definitely does not reflect my current knowledge.


Oh, to be clear, it wasn't meant as a critique; I just saw the aptly named variable and thought it might be useful to highlight this aspect of Python for others who are new to the language. It's one of the few "special" parts of Python I have personal experience with: I worked on a small program that parsed emails, and being able to fully comment the regex over several lines as I tweaked both it and its groups was very helpful :-)


You're absolutely right, and what I forgot to say in my earlier comment was "thank you!"


You might enjoy my tldr.js [0]: you select the text you want to summarize and run the bookmarklet. It could use a little help in the parser department, but it was fun to write.

[0]: https://github.com/bumptech/tl-dr.js/blob/master/tldr.js


I did enjoy this; it looks like we took a very similar approach! Great idea to make it accessible as a bookmarklet and to let it run easily on certain sites like Wikipedia.


Other good text analysis tools from Stanford: http://nlp.stanford.edu/software/index.shtml


I did a short blog post on this service here: http://bicortex.com/twitter-data-sentiment-analysis-using-et...

It's pretty good for what it does.


While the tool was still up, I did a quick search for #NSA. If I understood it correctly, the search field is meant to give sensible sentiment analysis for tweets with the given hashtag, but it didn't do a very good job. I can't verify now (the site is down), but it seemed to get about 50% wrong on that one...

Perhaps it does better with other hashtags? There's a lot of bitter irony associated with #nsa, possibly more than for the average tag.


Don't forget http://gate.ac.uk


I'm looking forward to the day that this technology is used for censorship; since everything political will need to sound positive, satire will rise again!


"Censorship is the mother of metaphor" - Borges


I don't see any source that I can download and run. Is this a web service? Aren't there other web services already that do this as a service?


Link appears to be down... :(


yep...


Worst possible time, and no Google cache. By tomorrow, 90% of the people who are interested will have forgotten the site, especially since today is when it got the attention. It seems to be an offshoot of Andrew Ng's team, which is trustworthy.


Lesson learned - ensure that your infrastructure can handle your times of peak interest before launching your product.


I heard this analogy when I was at Amazon: the internet is like a party where your worst fear is not that nobody will show up, but that EVERYBODY will show up.

They used it as a marketing pitch for AWS/EC2


Hard to do when you don't know how many visitors you'll have to handle...


Apparently this is done with Ruby/Rails... any chance this is going to find its way onto Github?


Dude, this is awesome! Thanks kulkarnic.

Username response was here: http://www.etcml.com/jobs/8188

The only thing I would add is an overall score for how positive/negative/neutral a text is.
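One simple way such an overall score could be computed, assuming you already have one positive/negative/neutral label per sentence or tweet. The label strings and the helper name `overall_sentiment` are hypothetical, not etcml's actual output format.

```python
from collections import Counter


def overall_sentiment(labels):
    """Summarize per-item labels ("positive", "negative", "neutral").

    Hypothetical helper: returns (majority_label, net_score), where
    net_score = (#positive - #negative) / #labels and lies in [-1, 1].
    """
    if not labels:
        return "neutral", 0.0
    counts = Counter(labels)
    net = (counts["positive"] - counts["negative"]) / len(labels)
    majority, _ = counts.most_common(1)[0]
    return majority, net


print(overall_sentiment(["positive", "positive", "neutral", "negative"]))
# ('positive', 0.25)
```

The net score is the more informative of the two: a text with equal numbers of positive and negative sentences comes out near 0 even if its majority label is one of them.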


It has a hard time with phrases like "I want to do X so bad"...

http://www.etcml.com/jobs/8354
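A toy illustration of why such phrases are tricky, using a deliberately naive bag-of-words lexicon scorer. This is an assumption for illustration; nothing in this thread says etcml works this way. In "so bad" the word "bad" intensifies a desire, but a model that scores words in isolation only sees a negative word. The lexicon and `lexicon_score` are hypothetical.

```python
import re

# Hypothetical toy lexicon for illustration only.
LEXICON = {"good": 1, "great": 1, "love": 1, "bad": -1, "awful": -1, "hate": -1}


def lexicon_score(text):
    """Sum per-word polarities, ignoring word order and context entirely."""
    words = re.findall(r"[a-z']+", text.lower())
    return sum(LEXICON.get(w, 0) for w in words)


print(lexicon_score("I want to see that movie so bad"))  # -1: "bad" counted as negative
print(lexicon_score("That movie was great"))             # 1
```

Handling this correctly needs some notion of how words combine (e.g. "so bad" after "want"), which is exactly what a pure word-count approach lacks.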



