Stanford Scientists Put Free Text-Analysis Tool on the Web

peterldowns · on Feb 6, 2014

The tool itself is hosted here: http://www.etcml.com/

The paper is here: http://hci.stanford.edu/publications/paper.php?id=279

kulkarnic · on Feb 6, 2014

Um, hey, I'm the author of the paper. I'll check this thread again in a few hours, in case you have questions about it.

davidw · on Feb 6, 2014

You want to win some brownie points and have a bit of fun, you could run PG's essays through that thing and see what it makes of them:

http://paulgraham.com/articles.html

stevenrace · on Feb 6, 2014

Neat project.

Will there be an API available? Or will I have to get creative with Form POSTs :).

kulkarnic · on Feb 8, 2014

There might eventually be an API available. Right now, we're focused on getting the actual grading interactions for the peers right. Richard's etcml already has an API.

hoprocker · on Feb 7, 2014

+1 on that!

mathattack · on Feb 6, 2014

Did our publicity bring down the app? I can't pull it up.

Great idea though!

peterldowns · on Feb 6, 2014

That's pretty cool — I had the exact same goals (help professors grade essays faster) with my bookshrink project [0]. Of course, it was the first python code I ever wrote and it uses an extremely naïve algorithm, but the results aren't too bad if you want to try it yourself [1].

[0]: https://github.com/peterldowns/bookshrink

[1]: http://www.bookshrink.com/

yid · on Feb 6, 2014

> it uses an extremely naïve algorithm, but the results aren't too bad if you want to try it yourself

Don't be so hard on yourself. I review papers for CS conferences and just the fact that you used TF-IDF weighting puts you well above the average.
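
For anyone who hasn't met TF-IDF, here is a minimal sketch of ranking sentences with it, treating each sentence as its own "document". It is an illustration only and not bookshrink's actual code; the function name `tfidf_rank` and the tokenizer regex are assumptions made up for this example.

```python
import math
import re
from collections import Counter


def tfidf_rank(sentences):
    """Return sentences ordered by summed TF-IDF weight, highest first."""
    # Hypothetical tokenizer: lowercase runs of letters and apostrophes.
    tokenized = [re.findall(r"[a-z']+", s.lower()) for s in sentences]
    n = len(tokenized)
    # Document frequency: in how many sentences does each word appear?
    df = Counter(word for words in tokenized for word in set(words))
    scored = []
    for sentence, words in zip(sentences, tokenized):
        if not words:
            continue  # nothing to score
        tf = Counter(words)
        # Term frequency times inverse document frequency, summed per sentence.
        score = sum(
            (count / len(words)) * math.log(n / df[word])
            for word, count in tf.items()
        )
        scored.append((score, sentence))
    return [s for _, s in sorted(scored, key=lambda pair: pair[0], reverse=True)]
```

Words that appear in every sentence get a weight of zero (log of 1), so sentences made of rarer words float to the top.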

e12e · on Feb 7, 2014

Regarding stuff like:

    """ SPLIT INPUT INTO SENTENCES"""

    god_awful_regex = r'''(?<!\d)(?<![A-Z]\.)(?<!\.[a-z]\.)(?<!\.\.\.)(?<!etc\.)(?<![Mm]r\.)(?<![Pp]rof\.)(?<![Dd]r\.)(?<![Mm]rs\.)(?<![Mm]s\.)(?<![Mm]z\.)(?<![Mm]me\.)(?:(?<=[.!?])|(?<=[.!?]['"]))[\s]+?(?=[\S])'''

Be advised that one of the nice things about python reg-exes is that they allow in-line comments (and naming of groups), if compiled with the verbose-flag:

    import re

    """ SPLIT INPUT INTO SENTENCES"""
    verbose_regex = r'''(?<!\d)  # I can't actually tell
      (?<![A-Z]\.)(?<!\.[a-z]\.) # what you're doing here...
      (?<!\.\.\.)(?<!etc\.)      # Is this one big group, or is
      (?<![Mm]r\.)(?<![Pp]rof\.) # it several groups, with
      (?<![Dd]r\.)(?<![Mm]rs\.)  # different prefixes?
      (?<![Mm]s\.)(?<![Mm]z\.)(?<![Mm]me\.) # Clearly, it's got something to do
      (?:(?<=[.!?])|(?<=[.!?]['"]))[\s]+?(?=[\S])''' # with
                 # not matching the dot at the end of Dr.
                 # or as part of an ellipsis as the end of a
                 # sentence? But my point was that if such a
                 # regex is built-up and tested with comments
                 # it can be quite readable

    god_awful_regex = re.compile(verbose_regex,
                                 re.VERBOSE)
    # continue here...

http://www.diveintopython.net/regular_expressions/verbose.ht...
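
To show the verbose flag end to end, here is a small runnable sketch. It uses a deliberately simplified pattern (only two title exceptions), not the full bookshrink regex; the names `SENTENCE_BREAK` and `split_sentences` are made up for this example.

```python
import re

# Simplified sentence-break pattern, written with re.VERBOSE so that
# whitespace is ignored and each piece can carry its own comment.
SENTENCE_BREAK = re.compile(
    r"""
    (?<![Mm]r\.)   # not directly after the title Mr.
    (?<![Dd]r\.)   # not directly after the title Dr.
    (?<=[.!?])     # directly after sentence-ending punctuation
    \s+            # the whitespace that gets consumed by the split
    (?=\S)         # followed by the start of the next sentence
    """,
    re.VERBOSE,
)


def split_sentences(text):
    """Split text on whitespace that follows sentence-ending punctuation."""
    return SENTENCE_BREAK.split(text)


print(split_sentences("Dr. Smith arrived. He sat down! Why?"))
```

Each lookbehind sits on its own line, so adding another exception (say, for "Prof.") is a one-line change with its own comment.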

peterldowns · on Feb 7, 2014

Yeah, I've since learned that feature of Python :) Like I said, this is some of the first Python code I ever wrote; it definitely does not reflect my current knowledge.

e12e · on Feb 7, 2014

Oh, to be clear, it wasn't meant as critique -- I just saw the aptly named variable, and thought it might be useful to highlight this aspect of python to others that might be new to the language. It's one of the few "special" parts of python I have personal experience with, as I worked on a small program that dealt with parsing emails, and being able to fully comment the reg-exp over several lines as I tweaked both it and the groups was very helpful :-)

peterldowns · on Feb 8, 2014

You're absolutely right, and what I forgot to say in my earlier comment was "thank you!"

magicseth · on Feb 6, 2014

You might enjoy my tldr.js [0]: you select the text you want to summarize and run the bookmarklet. It could use a little help in the parser department, but it was fun to write.

[0]: https://github.com/bumptech/tl-dr.js/blob/master/tldr.js

peterldowns · on Feb 7, 2014

I did enjoy this, it looks like we took a very similar approach! Great idea to make it accessible as a bookmarklet and to let it run easily on certain sites like Wikipedia.

rahimnathwani · on Feb 6, 2014

Other good text analysis tools from Stanford: http://nlp.stanford.edu/software/index.shtml

jonsy0999 · on Feb 6, 2014

I did a short blog post on this service here: http://bicortex.com/twitter-data-sentiment-analysis-using-et...

It's pretty good for what it does.

e12e · on Feb 7, 2014

While the tool was still up, I did a quick search for #NSA. If I understood it correctly (that the search field is tuned to give sensible sentiment analysis for tweets with the given hash-tag), it didn't do a very good job. I can't verify now (as the site is down), but it seemed to get about 50% wrong on that one...

Perhaps it does better with different hash-tags? (There's a lot of bitter irony associated with #nsa, possibly more than average for other tags.)

spdegabrielle · on Feb 6, 2014

Don't forget http://gate.ac.uk

mrgrieves · on Feb 6, 2014

I'm looking forward to the day that this technology is used for censorship; since everything political will need to sound positive, satire will rise again!

ropz · on Feb 7, 2014

"Censorship is the mother of metaphor" - Borges

infocollector · on Feb 7, 2014

I don't see any source that I can download and run. Is this a web service? Aren't there other web services already that do this as a service?

j2kun · on Feb 6, 2014

Link appears to be down... :(

prht · on Feb 6, 2014

yep...

visarga · on Feb 6, 2014

Worst possible time - and no Google cache. By tomorrow, 90% of the people who are interested will have forgotten the site, which is a pity given the interest it drew today. It seems to be an offshoot from Andrew Ng's team, which is trustworthy.

BlackDeath3 · on Feb 6, 2014

Lesson learned - ensure that your infrastructure can handle your times of peak interest before launching your product.

j2kun · on Feb 9, 2014

I heard this analogy when I was at Amazon: the internet is like a party where your worst fear is not that nobody will show up, but that EVERYBODY will show up.

They used it as a marketing pitch for AWS/EC2.

stavros · on Feb 7, 2014

Hard to do when you don't know how many visitors you'll have to handle...

dexterchief · on Feb 6, 2014

Apparently this is done with Ruby/Rails... any chance this is going to find its way onto Github?

mceoin · on Feb 7, 2014

Dude, this is awesome! Thanks kulkarnic.

Username response was here: http://www.etcml.com/jobs/8188

The only thing I would add is an overall score for how positive/negative/neutral a text is.

joeguilmette · on Feb 7, 2014

It has a hard time with phrases like "I want to do X so bad"...

http://www.etcml.com/jobs/8354
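
One plausible reason, and it is only a guess since nothing in this thread says how etcML scores text: any approach that leans on individual word polarity will read "bad" as negative even when it is used as an intensifier. A toy sketch of that failure mode, with invented word lists and a hypothetical `naive_sentiment` function:

```python
# Toy word lists, invented for illustration; not taken from any real tool.
POSITIVE = {"good", "great", "love", "awesome"}
NEGATIVE = {"bad", "awful", "hate", "terrible"}


def naive_sentiment(text):
    """Positive-word count minus negative-word count; below zero reads as negative."""
    words = [w.strip(".,!?\"'") for w in text.lower().split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


# "so bad" is an intensifier here, but the word counter cannot tell.
print(naive_sentiment("I want to do X so bad"))
```

A score below zero for that sentence is exactly the kind of misreading described above.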