Guess what? Automated news doesn't quite work

FiReaNG3L · on Dec 3, 2008

My experience building http://esciencenews.com, which is automated news for science, tells me otherwise.

About the 'instantly obsolete' news: just put the latest story of the cluster at the top. We see that a lot with space shuttle missions; see http://esciencenews.com/sources/space.com/2008/11/12/space.s...

In the cluster ('More sources from other sites') you can see the mission's coverage from start to finish.
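As a rough illustration of the "latest at the top" idea, here is a minimal Python sketch. The story titles, dates, and the `order_cluster` helper are hypothetical and are not taken from the site's actual code; the only assumption is that each story in a cluster carries a publication timestamp.

```python
from datetime import datetime


def order_cluster(stories):
    """Return the stories of one cluster sorted newest first."""
    return sorted(stories, key=lambda s: s["published"], reverse=True)


# Hypothetical cluster covering one shuttle mission.
cluster = [
    {"title": "Shuttle launches", "published": datetime(2008, 11, 14)},
    {"title": "Shuttle docks with station", "published": datetime(2008, 11, 16)},
    {"title": "Shuttle lands", "published": datetime(2008, 11, 30)},
]

ordered = order_cluster(cluster)
lead = ordered[0]           # newest story becomes the headline
more_sources = ordered[1:]  # the rest is the mission's earlier coverage
```

With this ordering the headline is always the most recent item, and the older stories remain available underneath it.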

As for bad grouping, just accept that you won't be able to find ALL stories belonging to a cluster, and set your cutoff to minimize false positives while keeping as many of the stories that belong to a cluster as possible.
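One way to picture the cutoff trade-off, as a sketch only: assume you have a small hand-labeled set of story pairs, each with a similarity score and a flag saying whether the two stories really belong together. The `pick_cutoff` function, the scores, and the labels below are all hypothetical; the thread does not say how the site actually chooses its cutoff.

```python
def pick_cutoff(scored_pairs, max_false_positives=0):
    """Pick the lowest similarity cutoff that admits at most
    max_false_positives wrongly grouped pairs.

    scored_pairs: list of (similarity, same_story) tuples, hand-labeled.
    Returns (cutoff, recall), or None if no cutoff qualifies.
    """
    total_true = sum(1 for _, same in scored_pairs if same)
    # Try candidate cutoffs from lowest to highest. Raising the cutoff can
    # only drop pairs, so the first cutoff that meets the false-positive
    # cap is also the one that keeps the most true pairs.
    for cutoff in sorted({score for score, _ in scored_pairs}):
        accepted = [same for score, same in scored_pairs if score >= cutoff]
        false_pos = sum(1 for same in accepted if not same)
        if false_pos <= max_false_positives:
            recall = sum(accepted) / total_true if total_true else 0.0
            return cutoff, recall
    return None


# Hypothetical labeled pairs: (similarity score, truly the same story?)
pairs = [
    (0.95, True), (0.90, True), (0.80, False),
    (0.75, True), (0.60, False), (0.40, False),
]

strict = pick_cutoff(pairs, max_false_positives=0)
lenient = pick_cutoff(pairs, max_false_positives=1)
```

In this toy data the strict setting misses one true pair in exchange for zero bad groupings, while tolerating a single false positive recovers every true pair.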

brandnewlow · on Dec 4, 2008

Hey FiReaNG3L, I've been scratching my head about this for a while. Over on eScienceNews, some of your news items are full stories. Some are just summaries. When I follow your source links below the full stories I find sites offering summary-only RSS feeds. How are you getting the full versions of those stories? Are you scraping the HTML from the feed item URL?

FiReaNG3L · on Dec 4, 2008

The full versions are press releases we get from universities and the like. The summaries with links to full stories are copyrighted (or not, as some site editors do the same press release cut and paste as our automated service does).

brandnewlow · on Dec 4, 2008

What's the workflow look like on those releases? Do they e-mail them to you and get "sucked in?" Do they have a special feed? Do they fill in a form?

FiReaNG3L · on Dec 4, 2008

We scrape them, and the universities and other organizations are pretty happy about it. We often get emails saying thanks, links from professors, emails asking how many views they received, etc. These would get no exposure at all without sites like e! science news.

codeismightier · on Dec 3, 2008

Cool. One feature that I wish that you would add is RSS feeds for each sub-topic. Other than that it looks great.

FiReaNG3L · on Dec 3, 2008

There already is: http://esciencenews.com/syndication

I just noticed the topic pages still link to the main RSS though, will change that as soon as I get back from the lab!

aaronblohowiak · on Dec 3, 2008

Your site is very well done. I will be returning soon.

bbgm · on Dec 4, 2008

First thing I thought about when I saw Gabe's post was your site and how well it works, even though it is fully automated. Your last point is probably the most important one, and in my book that's an acceptable trade off.

One question. What's the speed of updates? I know Techmeme wants to be the place where you go immediately when a news story breaks. Would that be a problem (only computational I suppose)?

FiReaNG3L · on Dec 4, 2008

Updates are every 5 minutes, which is plenty. Faster than that I don't see the point.

shafqat · on Dec 4, 2008

How do you do the clustering? Open source tool/software? Nice site!

FiReaNG3L · on Dec 4, 2008

http://search.cpan.org/~mdehoon/Algorithm-Cluster/perl/Clust...

Full write-up on Drupal.org, the awesome CMS we used to build this: http://drupal.org/node/261340

waleedka · on Dec 3, 2008

And Google's recent introduction of voting on search results tells me that they might've reached a similar conclusion.

henning · on Dec 3, 2008

According to grumpy SEO people, PageRank and automated algorithms play a far smaller role. Manual human review by Google employees is often the only time a search engine spammer gets taken out.

thorax · on Dec 3, 2008

Surely I'm not the only one who liked TechMeme exactly as it was and never saw a major issue with it. I think it's actually great just as it is.

Who runs into those issues where articles are bad? If I compare it to how Google News started out, this is lots better-- I have TechMeme bookmarked and only visit Google News when I'm searching for news.

GavinB · on Dec 4, 2008

On the other hand, you can get hilarious mixed messages like this set of headlines on the front page right now: http://i35.tinypic.com/1z3cv1h.jpg

CNET: They're selling out? BusinessWeek: No one wants them any more? Apple 2.0: It's definitely one or the other!

ilamont · on Dec 3, 2008

I've been using Techmeme and other algorithmic editors for years, and I think TM is the best of the bunch. The Google Blog search competitor that was released a few months ago doesn't come close -- it's filled with spam and scraped content.

I'm curious to see how the human editor improves Techmeme.

bprater · on Dec 3, 2008

I never realized that TechMeme was completely algorithmic. I assumed some of it was, but I thought editors were helping out as well. Color me very impressed!

zena · on Dec 3, 2008

I have been using Techmeme for quite some time and like it a lot. Actually, I was inspired by Techmeme's success to start our site: buzzup.com. Our content comes from three sources: users, a human editor, and an automated program.

brandnewlow · on Dec 3, 2008

One could always set up a hot-or-not style site where people indicate if two news items are similar or not and have that input feed in somehow.
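A minimal sketch of how such pairwise votes might be tallied, assuming each vote is a triple of two item IDs plus a similar/not-similar answer. The `tally_votes` helper, the item IDs, and the `min_votes` threshold are hypothetical.

```python
from collections import defaultdict


def tally_votes(votes, min_votes=3):
    """Turn raw (item_a, item_b, is_similar) votes into majority labels.

    Pairs with fewer than min_votes votes, or with a tie, are left out.
    Returns a dict mapping frozenset({item_a, item_b}) to True or False.
    """
    counts = defaultdict(lambda: [0, 0])  # pair -> [similar, not similar]
    for a, b, is_similar in votes:
        # frozenset makes (a, b) and (b, a) count as the same pair.
        counts[frozenset((a, b))][0 if is_similar else 1] += 1

    labels = {}
    for pair, (yes, no) in counts.items():
        if yes + no >= min_votes and yes != no:
            labels[pair] = yes > no
    return labels


# Hypothetical votes on pairs of news items.
votes = [
    ("story-1", "story-2", True),
    ("story-2", "story-1", True),
    ("story-1", "story-2", False),
    ("story-1", "story-3", False),
    ("story-1", "story-3", False),
    ("story-3", "story-1", False),
    ("story-2", "story-3", True),  # only one vote: ignored
]

labels = tally_votes(votes)
```

The resulting labels could then be used to check or tune whatever automated grouping the site already does.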

seaucre · on Dec 3, 2008

Or how about just automation + up and down voting?

crsmith · on Dec 4, 2008

I thought Techcrunch made a good point that even though it might only be a small change in what appears as news, it's a huge change fundamentally. I prefer Techmeme to stay fully automated.

ig1 · on Dec 3, 2008

Counter-argument: Google News.

catone · on Dec 4, 2008

Counter-counter-argument: http://articles.latimes.com/2008/sep/09/business/fi-moneyblo...

Google News has had issues, too (see link above in which Google News published a 6-year-old story this past September). It's the nature of the beast: machines make mistakes. But people make mistakes, too. Screw-ups are unavoidable.

bprater · on Dec 3, 2008

Sure, but when you want to know what every other techhead is looking at, TechMeme is the only place to go.

AndrewWarner · on Dec 3, 2008

That's why I don't read Techmeme. I don't want to know what every other techhead is looking at. I want to know what the mainstream techhead is missing.