Guess what? Automated news doesn't quite work (techmeme.com)
29 points by raghus on Dec 3, 2008 | 26 comments



My experience building http://esciencenews.com, which is automated news for science, tells me otherwise.

About the 'instantly obsolete' news: just put the latest story of the cluster at the top. We see that a lot with space shuttle missions: see http://esciencenews.com/sources/space.com/2008/11/12/space.s...

In the cluster ('More sources from other sites') you can see the mission's coverage from start to finish.
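
A minimal sketch of the "latest on top" idea, assuming each story in a cluster carries a publication timestamp. The record layout, field names, and sample titles are made up for illustration; the thread does not describe how the site actually stores stories.

```python
from datetime import datetime

# Hypothetical story records; titles and timestamps are placeholders.
cluster = [
    {"title": "Launch", "published": datetime(2008, 11, 1, 9, 0)},
    {"title": "Docking", "published": datetime(2008, 11, 3, 14, 30)},
    {"title": "Landing", "published": datetime(2008, 11, 10, 18, 15)},
]

def order_cluster(stories):
    """Return (lead, rest): the newest story, then the others newest to oldest."""
    ordered = sorted(stories, key=lambda s: s["published"], reverse=True)
    return ordered[0], ordered[1:]

lead, more_sources = order_cluster(cluster)
print(lead["title"])                        # headline shown at the top
print([s["title"] for s in more_sources])   # listed under "more sources"
```

The older stories stay attached to the cluster, so the full timeline remains readable even though only the newest item leads.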

As for bad grouping, just accept that you won't be able to find ALL the stories belonging to a cluster, and set your cutoff to allow as few false positives as possible while still keeping most of the stories that belong to the cluster.
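
One way to picture that trade-off, as a rough sketch rather than the site's actual procedure: take some story pairs that a person has labelled as "same story" or "different story", then pick the lowest similarity cutoff at which the share of wrongly accepted pairs stays within a budget. The function name, the `max_false_share` parameter, and the sample scores are all assumptions.

```python
def pick_cutoff(scored_pairs, max_false_share=0.05):
    """scored_pairs: list of (similarity, same_story) tuples, same_story a bool.

    Return the lowest cutoff at which the fraction of accepted pairs that
    are false positives is within max_false_share, or None if none qualifies.
    The lowest qualifying cutoff keeps the most true pairs.
    """
    for cutoff in sorted({sim for sim, _ in scored_pairs}):
        accepted = [same for sim, same in scored_pairs if sim >= cutoff]
        false_pos = sum(1 for same in accepted if not same)
        if false_pos / len(accepted) <= max_false_share:
            return cutoff
    return None

# Hypothetical hand-labelled pairs: (similarity score, really the same story?)
pairs = [(0.9, True), (0.8, True), (0.7, True), (0.6, False),
         (0.5, True), (0.4, False), (0.3, False)]

loose = pick_cutoff(pairs, max_false_share=0.25)
strict = pick_cutoff(pairs, max_false_share=0.0)
print(loose, strict)
```

A stricter budget pushes the cutoff up and drops some genuine matches, which is exactly the loss the comment says you have to accept.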


Hey FiReaNG3L, I've been scratching my head about this for a while. Over on eScienceNews, some of your news items are full stories. Some are just summaries. When I follow your source links below the full stories I find sites offering summary-only RSS feeds. How are you getting the full versions of those stories? Are you scraping the HTML from the feed item URL?


The full versions are press releases we get from universities and the like. The summaries that link to full stories are copyrighted articles (or not, since some site editors cut and paste the same press releases that our automated service does).


What's the workflow look like on those releases? Do they e-mail them to you and get "sucked in?" Do they have a special feed? Do they fill in a form?


We scrape them, and the universities and other organizations are pretty happy about it. We often get emails saying thanks, links from professors, emails asking how many views they received, etc. These releases would get no exposure at all without sites like e! science news.


Cool. One feature I wish you would add is an RSS feed for each sub-topic. Other than that, it looks great.


There already is: http://esciencenews.com/syndication

I just noticed the topic pages still link to the main RSS feed though; I will change that as soon as I get back from the lab!


Your site is very well done. I will be returning soon.


First thing I thought about when I saw Gabe's post was your site and how well it works, even though it is fully automated. Your last point is probably the most important one, and in my book that's an acceptable trade off.

One question: what's the speed of updates? I know Techmeme wants to be the place you go immediately when a news story breaks. Would that be a problem (only a computational one, I suppose)?


Updates are every 5 minutes, which is plenty. I don't see the point of going faster than that.


How do you do the clustering? Open source tool/software? Nice site!


http://search.cpan.org/~mdehoon/Algorithm-Cluster/perl/Clust...

Full write-up on Drupal.org, the awesome CMS we used to build this: http://drupal.org/node/261340
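
For readers who want a feel for cutoff-based grouping without the Perl module, here is a small pure-Python sketch. It is not the Algorithm::Cluster API and not necessarily what the site runs: the word-overlap (Jaccard) similarity, the single-linkage rule, the `cutoff` value, and the sample headlines are all assumptions for illustration.

```python
def jaccard(a, b):
    """Word-overlap similarity between two headlines, from 0.0 to 1.0."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 0.0

def group_headlines(headlines, cutoff=0.5):
    """Single-linkage grouping: link any two headlines whose similarity
    reaches the cutoff, then return the connected groups."""
    parent = list(range(len(headlines)))

    def find(i):
        # Follow parent links to the group's root, shortening the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(headlines)):
        for j in range(i + 1, len(headlines)):
            if jaccard(headlines[i], headlines[j]) >= cutoff:
                parent[find(i)] = find(j)

    groups = {}
    for i, headline in enumerate(headlines):
        groups.setdefault(find(i), []).append(headline)
    return list(groups.values())

# Made-up headlines, purely for demonstration.
headlines = [
    "shuttle endeavour launches tonight",
    "shuttle endeavour launches",
    "new frog species found",
    "new frog species found in peru",
]
clusters = group_headlines(headlines, cutoff=0.5)
print(clusters)
```

Raising `cutoff` splits groups apart and lowering it merges them, so it plays the same role as the cutoff discussed earlier in the thread.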


And Google's recent introduction of voting on search results tells me that they might've reached a similar conclusion.


According to grumpy SEO people, PageRank and automated algorithms play a far smaller role. Manual human review by Google employees is often the only time a search engine spammer gets taken out.


Surely I'm not the only one who liked TechMeme exactly as it was and never saw a major issue with it. I think it's actually great just as it is.

Who runs into those issues where articles are bad? Compared to how Google News started out, this is a lot better. I have TechMeme bookmarked and only visit Google News when I'm searching for news.


On the other hand, you can get hilarious mixed messages like this set of headlines on the front page right now: http://i35.tinypic.com/1z3cv1h.jpg

CNET: They're selling out? Business week: No one wants them any more? Apple 2.0: It's definitely one or the other!


I've been using Techmeme and other algorithmic editors for years, and I think TM is the best of the bunch. The Google Blog search competitor that was released a few months ago doesn't come close -- it's filled with spam and scraped content.

I'm curious to see how the human editor improves Techmeme.


I never realized that TechMeme was completely algorithmic. I assumed some of it was, but I thought editors were helping out as well. Color me very impressed!


I have been using Techmeme for quite some time and like it a lot. Actually, I was inspired by Techmeme's success to start our site: buzzup.com. Our content comes from three sources: users, a human editor, and an automatic program.


One could always set up a hot-or-not style site where people indicate if two news items are similar or not and have that input feed in somehow.


Or how about just automation + up and down voting?


I thought Techcrunch made a good point that even though it might only be a small change in what appears as news, it's a huge change fundamentally. I prefer Techmeme to stay fully automated.


Counter-argument: Google News.


Counter-counter-argument: http://articles.latimes.com/2008/sep/09/business/fi-moneyblo...

Google News has had issues, too (see the link above: Google News published a 6-year-old story this past September). It's the nature of the beast: machines make mistakes. But people make mistakes, too. Screw-ups are unavoidable.


Sure, but when you want to know what every other techhead is looking at, TechMeme is the only place to go.


That's why I don't read Techmeme. I don't want to know what every other techhead is looking at. I want to know what the mainstream techhead is missing.



