Very cool. Here's a similar ranking that takes average scores into account. It surfaces a lot of the smaller domains that get a lot of HN love: https://hnleaderboard.com
This sort of thing is an excellent seed for host authority if you're making a search engine. Often the trick is to find places where humans have curated 'good' web sites out of the sea of all possible web sites. HN is a good source of data because you get two human signals: upvotes and comment points. Between the two you can create a rank for both how non-spammy a site is and how controversial it is.
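A minimal sketch of that two-signal idea, assuming upvotes as the quality signal and the comments-to-upvotes ratio as a rough controversy proxy. The exact combination here is my own illustrative assumption, not the commenter's actual formula:

```python
import math

def quality_rank(upvotes):
    # log-damped so a single viral post doesn't dominate the domain's rank
    return math.log(1 + upvotes)

def controversy(upvotes, comment_points):
    # lots of discussion relative to votes suggests a contested topic
    return comment_points / (1 + upvotes)

# made-up example posts: (name, upvotes, comment points)
posts = [("calm", 300, 60), ("heated", 120, 900)]
for name, up, cp in posts:
    print(name, round(quality_rank(up), 2), round(controversy(up, cp), 2))
```

With these numbers the "heated" post scores lower on quality but far higher on controversy, which is the separation the comment is describing.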
Thanks for that reminder (I was going to say: “I wish I could sort this by average or something instead”).
Neither are perfect though. A lot of the best articles come from the BBC. Yet because so many people submit (and resubmit) BBC articles, they have an awful per-posting score.
I think you’d want to pre-filter by submissions that actually got some traction (made it to the front page?) and then look at the score distribution.
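That pre-filter is easy to sketch. The front-page proxy of 20 points and the sample data are assumptions, purely for illustration:

```python
# Keep only submissions that got real traction, then look at the
# per-domain score distribution instead of the raw per-posting mean.
submissions = [
    ("bbc.com", 3), ("bbc.com", 250), ("bbc.com", 1), ("bbc.com", 90),
    ("danluu.com", 310), ("danluu.com", 180),
]

FRONT_PAGE_CUTOFF = 20  # assumed threshold, purely illustrative

by_domain = {}
for domain, score in submissions:
    if score >= FRONT_PAGE_CUTOFF:
        by_domain.setdefault(domain, []).append(score)

for domain, scores in sorted(by_domain.items()):
    print(domain, sorted(scores))
```

After the filter, bbc.com's dead-on-arrival resubmissions no longer drag its distribution down.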
Is this just cumulative? It seems to me that ordering by average score would be better; is GitHub simply pushed to the top by being low-score, high-volume?
This seems like a frequent problem. What is the most robust formula for sorting by average while boosting by frequency? Something like avg(ratings) * log(len(ratings))? Maybe that curve needs to be tweaked based on the use case?
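Here's a quick sketch of that formula next to a Bayesian average (shrinking toward a global mean), which is another common answer to the same problem. The prior weight and global mean are made-up numbers, not anything the site uses:

```python
import math

def log_boosted(ratings):
    """The avg * log(n) idea; log(1 + n) avoids log(1) = 0 for single-item lists."""
    n = len(ratings)
    return (sum(ratings) / n) * math.log(1 + n)

def bayesian_average(ratings, prior_mean, prior_weight=10):
    """Shrink the item's mean toward a global prior; small-n items stay near the prior."""
    n = len(ratings)
    return (prior_weight * prior_mean + sum(ratings)) / (prior_weight + n)

items = {
    "one_hit":  [500],                # single huge score
    "steady":   [120, 90, 150, 110],  # consistently good
    "firehose": [10] * 200,           # high volume, low scores
}
GLOBAL_MEAN = 30  # assumed site-wide average, purely illustrative

for name, scores in items.items():
    print(name, round(log_boosted(scores), 1),
          round(bayesian_average(scores, GLOBAL_MEAN), 1))
```

The Bayesian version punishes the one-hit wonder harder; the log boost rewards the firehose more. Which curve is right really does depend on the use case, as the comment suggests.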
I wish sites like Amazon had something like this, since sorting by average rating is completely useless if you have a long tail.
To me, the interesting number is average points per submission. It's surprising to see how badly medium.com, forbes.com, and theregister.com do by that metric, and how well stripe.com, ifixit.com, and blog.rust-lang.org do...
Shout out to danluu.com — literally the top average score for the last year and in the top few for last three years (per https://hnleaderboard.com/ ). Probably my favorite blogger and it seems popular with HN as well.
Hello - I created the site. I would encourage you to take a look at the 'links' section, which is something you probably have not seen before. It aggregates comment URLs and ranks them by count for a variety of sites - including XKCD!
Very interesting and well done. Something seems off for the 'ask' page though. "Top Ask HN stories from the past year" but most only have a few points.
Well done! Did you consider using other metrics (like median instead of mean) or other techniques (e.g. removing outliers, calculating confidence intervals, considering the standard deviation, etc.)? I'm aware that things can become complex (to visualize and interpret) and maybe not many people would be interested in something more complex than the mean. :)
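One of those alternatives is easy to sketch: rank by a lower confidence bound on the mean rather than the raw mean, so small or high-variance samples get penalized. The z value and data are my own assumptions, and HN scores aren't normally distributed, so treat this as a rough heuristic rather than rigorous statistics:

```python
import statistics

def lower_bound(scores, z=1.96):
    """mean - z * standard error: a conservative estimate of the true mean."""
    n = len(scores)
    if n < 2:
        return 0.0  # not enough data to estimate spread
    mean = statistics.mean(scores)
    se = statistics.stdev(scores) / n ** 0.5
    return mean - z * se

domains = {
    "lucky_once": [400, 5],  # huge variance, tiny sample
    "consistent": [80, 95, 70, 85, 90],
}
for name, scores in domains.items():
    print(name, round(statistics.mean(scores), 1), round(lower_bound(scores), 1))
```

The lucky one-off beats the consistent domain on raw mean but loses badly on the lower bound, which is the behavior the comment is asking about.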
Does this have any meaning to anyone? As someone who enjoys submitting articles and trying to find patterns in their success or failure, I don't see much significance in where a story comes from, as opposed to a catchy title or subject. I'm often surprised by what catches and what flops, for what that's worth.
Hello. I created the site. I think it can be useful to browse domains for certain sites. For example, Github.com can be used to browse popular projects. aws.amazon.com can be browsed to keep up with large AWS announcements, and www.reddit.com can be used to see top posts on reddit.
Why did I create this site? Because it can be very difficult to keep up with tech news. There are great new stories on HN every single day. I simply can't keep up. The solution? A way to browse top stories from the past week/month/year or all time. I am letting the wisdom of the crowds decide for me what is most important, but it is nice to be able to take a vacation without worrying I will miss something big. I still read HN and other tech sites almost daily, but this helps me review the more popular stories which can be both interesting and fun.
I find it surprising that nytimes takes 2nd overall and top in the news category. It suggests nytimes is doing something better than the others, and that the news industry is very motivated to figure out how to make digital work.
I am surprised by Wikipedia's performance. How does an encyclopedia (even an excellent and ever-evolving one) beat out the Washington Post for topical news?
The wikipedia links submitted that make the front pages are almost automatic reads for me. They aren't news so much as obscurities, and are usually pretty fascinating.
I agree that they can be good. It’s just that whatever someone will surface in 2020 was probably sitting there in 2012. Why didn’t we look at it in 2012? Surely there is a listing of obscure topics. But somehow it becomes a must read in the HN context. (I do it too)
In terms of product organisations, the top ones on the list are Mozilla, AWS, Cloudflare, then Apple ... wouldn't have guessed that (apart from maybe Apple).
It oughtn't be. You might not like their opinion pages -- yes, they're typically left-wing -- but The Guardian's reporting is absolutely top notch; amongst the best in the world.
No. They constantly blur editorial features and news. The most shared posts are ludicrous hysterical nutters that they give a platform to because it generates clicks.
How do you differentiate between opinionated/politically biased pieces and general reporting? What are the criteria for evaluating good (i.e. trustworthy) reporting? Curious to know.