Hey FiReaNG3L, I've been scratching my head about this for a while. Over on eScienceNews, some of your news items are full stories and some are just summaries. When I follow the source links below the full stories, I find sites offering summary-only RSS feeds. How are you getting the full versions of those stories? Are you scraping the HTML from the feed item URL?
The full versions are press releases we get from universities and the like. The summaries with links to full stories are copyrighted (or not, since some site editors do the same press-release cut-and-paste that our automated service does).
We scrape them, and the universities and other organizations are pretty happy about it: we often get emails saying thanks, links from professors, emails asking how many views they received, etc. These releases would get no exposure at all without sites like e! Science News.
The first thing I thought of when I saw Gabe's post was your site and how well it works, even though it is fully automated. Your last point is probably the most important one, and in my book that's an acceptable trade-off.
One question: how quickly does it update? I know Techmeme wants to be the place you go the moment a news story breaks. Would that be a problem (only computationally, I suppose)?
According to grumpy SEO people, PageRank and automated algorithms play a far smaller role. Manual human review by Google employees is often the only time a search engine spammer gets taken out.
Surely I'm not the only one who liked TechMeme exactly as it was and never saw a major issue with it. I think it's actually great just as it is.
Who actually runs into these problems with bad articles? Compared to how Google News started out, this is a lot better; I have TechMeme bookmarked and only visit Google News when I'm searching for news.
I've been using Techmeme and other algorithmic editors for years, and I think TM is the best of the bunch. The Google Blog Search competitor that was released a few months ago doesn't come close; it's filled with spam and scraped content.
I'm curious to see how the human editor improves Techmeme.
I never realized that TechMeme was completely algorithmic. I assumed some of it was, but I thought editors were helping out as well. Color me very impressed!
I have been using Techmeme for quite some time and like it a lot. In fact, Techmeme's success inspired me to start our site, buzzup.com. Our content comes from three sources: users, a human editor, and an automatic program.
I thought TechCrunch made a good point: even though it might be only a small change in what appears as news, it's a huge change fundamentally. I prefer that Techmeme stay fully automated.
Google News has had issues, too (see the link above, in which Google News published a six-year-old story this past September). It's the nature of the beast: machines make mistakes. But people make mistakes, too. Screw-ups are unavoidable.
About the 'instantly obsolete' news: just put the latest story of the cluster at the top. We see that a lot with shuttle missions to space; see http://esciencenews.com/sources/space.com/2008/11/12/space.s...
In the cluster ('More sources from other sites') you can see the mission's coverage from start to finish.
As for bad grouping, just accept that you won't be able to find ALL the stories belonging to a cluster, and set your cutoff to minimize false positives while keeping as many of the cluster's real stories as possible.
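To make the two suggestions above concrete, here is a minimal sketch, not e! Science News's or Techmeme's actual method. It assumes stories are compared by Jaccard similarity of their title words and grouped greedily against a single `cutoff`; the `Story` type, `cluster_stories`, and the sample headlines and dates are all hypothetical. A higher cutoff admits fewer false positives but splits off more stories that really belong together, which is the trade-off being described. Sorting each cluster newest-first keeps the headline from going stale.

```python
from dataclasses import dataclass
from datetime import datetime
import re


@dataclass
class Story:
    # Hypothetical minimal story record: just a title and a publish time.
    title: str
    published: datetime


def tokens(text: str) -> set[str]:
    # Lowercased word set; a crude stand-in for real text features.
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    # Shared words divided by total distinct words (0.0 for two empty sets).
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster_stories(stories: list[Story], cutoff: float) -> list[list[Story]]:
    """Put each story into the first cluster that has a member with
    similarity >= cutoff; otherwise start a new cluster."""
    clusters: list[list[Story]] = []
    for story in stories:
        words = tokens(story.title)
        for cluster in clusters:
            if any(jaccard(words, tokens(m.title)) >= cutoff for m in cluster):
                cluster.append(story)
                break
        else:
            clusters.append([story])
    # Newest story first inside each cluster, so the lead item is the latest.
    for cluster in clusters:
        cluster.sort(key=lambda s: s.published, reverse=True)
    # Order clusters by their newest story as well.
    clusters.sort(key=lambda c: c[0].published, reverse=True)
    return clusters


if __name__ == "__main__":
    # Hypothetical sample data for illustration only.
    sample = [
        Story("Shuttle launches toward space station", datetime(2008, 11, 14)),
        Story("New exoplanet imaged directly", datetime(2008, 11, 13)),
        Story("Shuttle docks with space station", datetime(2008, 11, 16)),
    ]
    for cutoff in (0.4, 0.5):
        groups = cluster_stories(sample, cutoff)
        print(cutoff, [[s.title for s in g] for g in groups])
```

With these sample titles, the two shuttle headlines share 3 of 7 distinct words (about 0.43), so a cutoff of 0.4 groups them with the docking story on top, while 0.5 splits them into separate clusters.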