I think you're right; there's immense potential value in news aggregation that goes beyond "this got the most eyeballs / articles".
What do you think of the possibility of ranking articles by some metric? If there are 30 or 300 articles, I don't want to do the work of figuring out which version I want (which may be something like longest, or shortest, or without embedded video).
I've given this a lot of thought. Even to the extent of trying to understand what an alternative Twitter would look like if it were possible to somehow classify a tweet by its legality in a region, and thus give the user the option to see only tweets which are 'legal' in their jurisdiction. This is a tough nut to crack.
> If there are 30 or 300 articles, I don't want to do the work of figuring out which version I want (which may be something like longest, or shortest, or without embedded video).
It is one thing to have to trawl through 30 articles and figure out which one is both the most accurate in describing the truth and covers all aspects of the situation. It is another thing to filter articles out for having certain attributes.
The first is more difficult. The second is easier and a good feature to add. As the articles are added to the database it's possible to obtain the content and parse it for references to those particular attributes. I'm actually starting work on one at the moment: adverts.
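To make the "parse it for attributes" idea concrete, here is a rough sketch using only Python's standard library. Everything named here (`ArticleAttributes`, `extract_attributes`, `pick_versions`) is made up for illustration, and treating every `<iframe>` as an embedded video is a crude assumption, not how a real site should be classified.

```python
from dataclasses import dataclass
from html.parser import HTMLParser


class _AttributeScanner(HTMLParser):
    """Counts visible words and notes whether a video-like embed appears."""

    VIDEO_TAGS = {"video", "iframe"}  # assumption: any iframe counts as a video
    SKIP_TAGS = {"script", "style"}   # text inside these is not article text

    def __init__(self):
        super().__init__()
        self.word_count = 0
        self.has_video = False
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.VIDEO_TAGS:
            self.has_video = True
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.word_count += len(data.split())


@dataclass
class ArticleAttributes:
    url: str
    word_count: int
    has_video: bool


def extract_attributes(url, html):
    """Run once when an article is added, then store the result with it."""
    scanner = _AttributeScanner()
    scanner.feed(html)
    scanner.close()
    return ArticleAttributes(url, scanner.word_count, scanner.has_video)


def pick_versions(articles, allow_video=True, longest_first=True):
    """Filter on the stored attributes, then order by length."""
    kept = [a for a in articles if allow_video or not a.has_video]
    return sorted(kept, key=lambda a: a.word_count, reverse=longest_first)
```

With the attributes stored per article, "longest", "shortest" and "without embedded video" become cheap queries rather than something the reader has to work out by opening 30 tabs.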
We know that the current model on the internet is to get eyes on adverts. Not articles. Adverts. So when we see an outrageous headline like "Billie Eilish shaves off own hair in outburst!", we're going to open that page to see what's going on. And surprise, surprise: ads all over the place. I'll be using the number of ads as a metric to warn the user about the reliability of the information on the page: "This page has more than X adverts. Please be aware that the page may have been designed to obtain your attention and the information within may not be accurate".
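A minimal sketch of the ad-count warning, assuming ads can be spotted by the host that a `<script>`, `<iframe>` or `<img>` loads from. The hosts in `AD_HOSTS` and the threshold value are placeholders I've invented; a real version would presumably lean on a maintained filter list, and would miss ads served from the site's own domain.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical ad hosts, for illustration only.
AD_HOSTS = {"ads.example.com", "tracker.example.net"}
AD_WARNING_THRESHOLD = 5  # the "X" in the warning; placeholder value


class _AdCounter(HTMLParser):
    """Counts elements whose src points at a known ad host."""

    def __init__(self):
        super().__init__()
        self.ad_count = 0

    def handle_starttag(self, tag, attrs):
        if tag not in ("script", "iframe", "img"):
            return
        src = dict(attrs).get("src") or ""
        host = urlparse(src).hostname or ""
        if host in AD_HOSTS:
            self.ad_count += 1


def count_ads(html):
    counter = _AdCounter()
    counter.feed(html)
    counter.close()
    return counter.ad_count


def ad_warning(html, threshold=AD_WARNING_THRESHOLD):
    """Return the warning text if the page exceeds the threshold, else None."""
    if count_ads(html) > threshold:
        return (
            f"This page has more than {threshold} adverts. Please be aware "
            "that the page may have been designed to obtain your attention "
            "and the information within may not be accurate."
        )
    return None
```

The count is kept separate from the warning so the raw number can also be stored as a ranking signal alongside the other attributes.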
As I said in the original comment, I'm going to expand this in such a way that it is beneficial to users to identify "healthy" articles where "healthy" means: few ads, representative of reality (true) and with multiple aspects of the news event.
There's probably something to be said for human editorial in ranking sources. There are a finite number of actual news sources with actual reporting, and it occurs to me that once your automated filter detects possible crap quality, it could kick the source up to a human for review and subsequent permanent deranking.
This is one of Google's problems: they are monomaniacal about algorithmic ranking. And that's got them very far on the open web, where the number of sources is too wide and varied for human review. But while news has the appearance of being a wide domain, the majority of what's out there is regurgitated content on sites that could safely be branded with a permanent mark of distrust.
Meanwhile metrics like # of ads could incorrectly derank sites that are just run under a poor business model, but do employ good journalists / analysts / whatever.
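Roughly what I have in mind, as a sketch and nothing more: the automated filter only flags a source, a human makes the lasting call, and a human "trusted" verdict protects a good-but-ad-heavy site from being flagged again. The class and method names (`SourceReview`, `flag`, `decide`) are invented for this example.

```python
from enum import Enum


class Verdict(Enum):
    PENDING = "pending"    # flagged by the filter, awaiting a human
    TRUSTED = "trusted"    # human cleared it, e.g. good reporting, bad ad model
    DERANKED = "deranked"  # human confirmed it, permanently deranked


class SourceReview:
    """Tracks per-domain verdicts; human decisions always win over flags."""

    def __init__(self):
        self._verdicts = {}  # domain -> Verdict

    def flag(self, domain):
        """Called by the automated filter; never overwrites a human decision."""
        self._verdicts.setdefault(domain, Verdict.PENDING)

    def pending(self):
        """Domains waiting for a human reviewer."""
        return sorted(d for d, v in self._verdicts.items() if v is Verdict.PENDING)

    def decide(self, domain, verdict):
        """Record the human reviewer's decision."""
        if verdict is Verdict.PENDING:
            raise ValueError("a human decision must be TRUSTED or DERANKED")
        self._verdicts[domain] = verdict

    def is_deranked(self, domain):
        return self._verdicts.get(domain) is Verdict.DERANKED
```

Since the number of real news sources is finite, the pending queue should shrink over time instead of growing with every new article.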