Ask PG: Statistics
14 points by tel on Feb 23, 2008 | 20 comments
There have been a few posts (one being mine) concerning fears about "fluff" posts taking over. Proposed solutions range from allowing downmods and adjusting weighting algorithms to simply blacklisting Reddit posts.

One thing I'd like to see before considering adjustments like this would be more detailed statistics about how people vote, how leaders vote, how karma is distributed, &c.

I feel like interesting data sets would be:

   Across all users
     Karma
     Number of votes
     Number of submissions
     Times downmodded
     ...

   Across all posts
     Karma
     Comments
     Flag for whether it's considered "fluffy" (mod's discretion)
     Flag for Reddit submission
     ...
Really, it could be a pretty big problem. It does seem like a potential playground of hacks and at least some of those statistics are certainly not difficult to mine (probably Arc one-liners).

So, if it seems interesting and there's not any issue with privacy (strip out usernames and it'd be hard to correlate anything beyond the leaderboard) is there any chance of seeing a hn-stats tarball?
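As a rough illustration of the per-user data set described above, here is a minimal Python sketch. It assumes a hypothetical list of user records; the field names (`name`, `karma`, `votes`, `submissions`) and the sample values are made up for the example and are not HN's actual schema or data. It drops the username field and summarizes one column.

```python
from statistics import mean, median

# Hypothetical per-user records; field names and values are assumptions,
# not the real HN data model.
users = [
    {"name": "alice", "karma": 120, "votes": 40, "submissions": 5},
    {"name": "bob", "karma": 15, "votes": 3, "submissions": 1},
    {"name": "carol", "karma": 900, "votes": 210, "submissions": 30},
    {"name": "dave", "karma": 2, "votes": 0, "submissions": 0},
]


def strip_usernames(records):
    """Return copies of the records without the identifying 'name' field."""
    return [{k: v for k, v in r.items() if k != "name"} for r in records]


def summarize(records, field):
    """Compute simple summary statistics for one numeric field."""
    values = sorted(r[field] for r in records)
    return {
        "min": values[0],
        "median": median(values),
        "mean": mean(values),
        "max": values[-1],
    }


anonymous = strip_usernames(users)
karma_summary = summarize(anonymous, "karma")
print(karma_summary)
```

Dropping the name column is only a first step; it does not by itself guarantee that records cannot be re-identified.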




Anyone so concerned with stats that does not work for HN is just looking for more places to waste their time instead of working on their hack/startup. This isn't your community; it's a website. You may think of it as your community but where will you be two years from now? Still wasting time on this site asking for stuff like this? No. Google "This too shall pass" for it is the lesson of life. Really - the post today, "Do It Fucking Now", applies here as well lol. Figure out what's important and do that; having stats on this site is not important in your or my life.


Your objection here doesn't make any sense. Of course people could be spending more time doing work and less time on news.yc. But asking for stats isn't any better or worse than any other use of this site.

Obviously this site isn't a community to you, but it is to some people. It's also entertainment. I'd rather have stats on this website than watch whatever is on network television right now.


Some of us aren't interested in the numbers but we are interested in the patterns behind them, especially if they could be used to identify destructive trends.


Organizations that think like this:

DHS

FBI

CIA

DOD

KGB

Thought Police

Please don't add YC.News to this list. It's one of the few places I enjoy on the net at the moment (and of those few, this is my favorite). I don't like the notion of having some sort of policing activity to identify "destructive trends" as you call them.

Sorry for putting it so bluntly.


Having observed Digg and Reddit devolve, solving this problem has nothing to do with conspiracy theories. This is more about addressing Eternal September concerns (brilliant reference by someone yesterday). History will repeat itself until a clever, llama-free community (ahem) addresses these problems intelligently. I can't think of a better place for it.


I'm sorry that you think I was subscribing to some sort of conspiracy theory in listing those organizations. The point I was trying to make is that what has been termed "fluff" is now "destructive trends", and worse, it was suggested that exposing others' behaviors (or rather, revealing stats as illustrated in the above description) would somehow lead to a (final?) solution.

The degree to which the original "problem" is becoming viewed as cancerous is alarming because I feel that News.YC is a place for diverse thinking and sharing interesting content. Digg/Reddit have devolved because they are more homogeneous than they are diverse (hence the mob-like mentality).

The very diversity that makes an online community interesting should be embraced instead of eradicated, but that's just my opinion. I know that eventually as News.YC becomes more popular this community will homogenize and follow the same fate as Digg/Reddit.

It is inevitable...


It's not inevitable.

I'm not sure how long you've been lurking; there is a shared desire to perpetuate the level of discourse in a way that benefits the community. We've also got the "don't say something stupid, the natives are very smart / clever / motivated and deserve your respect" factor, the "ask yc news/ask pg and almost always get intelligent actionable feedback from experienced hackers/entrepreneurs" factor, and the "if I have an idea and demonstrate talent I might get noticed / funded, so I better not say something really stupid" factor.

Releasing data is a '?' - if it can be suggested that it may be a means to arrive at some sort of method for automatically/technically ensuring the community doesn't move toward the same kind of mob mentality, that's fantastic, let the heavyweight theory guys find something cool to do with it. If that fails, there's still moderation, and a strong desire to perpetuate the high signal to noise ratio here.


"I know that eventually as News.YC becomes more popular this community will homogenize"

Suppose you wanted to convince someone who doesn't "know" that. Don't you think the right data could prove your point?


I don't feel the need to convince anybody of that which they do not know. Also, I don't feel that having data of users is more advantageous than proving my point; there are also other pieces of info I can dig up (if you're interested) that may or may not strengthen my argument. I am VERY open to the idea that I might be wrong, which indeed might be the case. However, I will always maintain that what most concerns me about the nature of this discussion is NOT the efforts put forward to solve a perceived problem but the way in which a threat is interpreted and what we are willing to give up in exchange for dealing with the perceived threat.

Btw, I really don't want to come off as being antagonistic here. I'm sticking to my view until I see a benefit to what everybody has been advocating. In the meantime, feel free to downmod me if you disagree. I have no problem paying a penalty for disagreeing, if that is the cost associated with expressing my views, regardless of how well they are accepted.

(thanks for reading this long-ass post if you got this far!!)


In general, Paul, is there a reason not to open up a machine-readable HN platform for people to tinker on top of (other than the time it'll take to code)?


Exposing voting history strikes me as an obviously bad thing to do given that votes have always been anonymous. Even if you tried to scrub personal information from the data, it'd be fairly easy to match anonymous IDs to HN nicks.


Full voting history, probably, yes. Number of votes (some relation to activity, perhaps) not so much.

Anything that's obviously going to make it easy to reverse engineer anonymous things isn't a good idea, but that doesn't mean you can't still find interesting gems.


I agree. Aggregate stats such as whether a small group does most of the voting, whether the same people seem to vote for the highest ranked items, etc. would be pretty interesting while not giving away too much information.

I am also curious whether a completely open system would work. Has that ever been tried? On a system the size of Reddit or Digg it would be intriguing to see how groups cluster.
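One of the aggregate stats mentioned above, whether a small group does most of the voting, could be sketched like this in Python. The function name `top_share` and the sample vote counts are hypothetical and for illustration only; only per-user vote totals are needed, not who voted for what.

```python
def top_share(vote_counts, fraction=0.1):
    """Share of all votes cast by the most active `fraction` of voters."""
    if not vote_counts:
        return 0.0
    ranked = sorted(vote_counts, reverse=True)
    # Always include at least one voter in the "top" group.
    k = max(1, int(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:k]) / total if total else 0.0


# Made-up vote totals for ten users (not real HN data).
votes_per_user = [500, 300, 50, 40, 30, 20, 20, 20, 10, 10]
share = top_share(votes_per_user, fraction=0.2)
print(f"Top 20% of voters cast {share:.0%} of votes")
```

A value near 1.0 would suggest that voting is concentrated in a few accounts, while a value near the chosen fraction would suggest it is spread evenly.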


I'm most in favor of the flag option. I understand how users with a high degree of karma can prove themselves worthy by virtue of what karma even means, but combining flags and the number of times downmodded creates a more level playing field for the process.


The original question aside (for the record, I'd find it interesting as well), does anyone else think this comment thread is kinda nasty? Ugh.


I have the following question for those that are advocating a tech/statistical/mathematical solution: How many of you are individuals who are in the process of creating a startup in the hopes of getting YC funding?

The reason I'm asking this question is that I had the impression that the articles here are more reflective of the VC/hacker roots of YCombinator (silly me!). I also feel that somehow a lot of the up-votes for this perspective are reflective of the notion that if you complain to PG enough, he'll swoop in and fix things so that all is well. Hint: if you continue to have this attitude, then you will see your chances of getting YC funding (or even of having them bat an eyelash at your application) decrease rapidly because your objection to "fluff" is so strong; do you really think anybody in their right mind would fund somebody who can't handle the "fluff"?!?!?!?!

I recognize that there are many who have been suggesting solutions that PG implement (and to his great credit he has pleased many of you by expanding the leaders list to 100 now!); but I also want to know if your energies would be better suited in actually finding articles that you like and submitting those (in other words, being competitive), and perhaps even EARNING karma points (which translates into recognition as well).

Not everything is solvable in code and not everything that you do not like is a "problem". I would argue that the diversity (which can be read as the mix of hard-core geekstuff along with fluff) is reflective of the VC/hacker roots of Hacker News. Or in other words: break out of the box and have a multifaceted view of the world.


I'm probably going to regret responding to this, but here goes...

Many of us are also very invested in this site, and are interested in keeping it close to its roots. That's why there's been a lot of alarm regarding the content on the site. I'm sure that some of it will blow over. Some of it will cause changes.

I don't really think that people are trying to get PG to automagically fix the problem for us. I think that we're trying to have an open discussion about solving the Eternal September problem with social sites. This is a recurring problem that occurs on pretty much any site that has a user-contributed component and gets popular.

And, regarding earning karma points: If you take a look at the people frequently involved in the discussion on fluff, trolls and quality slides, I think that you'll find that many of the people that are concerned about this are actually quite active on the leader board. I think it's because we're on the leader board, that we're so invested in keeping Hacker News on the straight and narrow. We've invested a lot of time and effort into the site, and are willing to continue that investment.


First of all: great response!!

"...a lot of alarm regarding the content on the site."

The fact that people are finding "fluff" to be "alarming" is what is concerning me more because this is the attitude that leads to mob mentality very quickly. Everybody is free to think and act as they please. So if this means combatting "fluff", then by all means do that.

On the other hand, I will always be speaking out for diversity. I'm sure we can all agree that great ideas don't grow out of homogeneity and that it requires a mix of information from different sources.

Fear mentalities are much more dangerous than "fluff" because they can seep into other aspects of your life...

Btw, can anybody explain to me why I'm being downmodded so much? I really don't care about the karma itself (except for real-life Karma of course!), but I would like for somebody to explain to me what is wrong with my arguments regarding this emerging "War On Fluff" (sorry, I couldn't resist!!). I will also add that I hope when I ask for feedback on my startup/concept in the coming weeks people are just as sincere about giving me some feedback!

;)


It's because you imply (intentionally or no) that everyone who believes that some form of content control should be implemented has intentions similar to the DOD, CIA, KGB, etc.

In retrospect, ultimate control of the site belongs to pg and if fluff does start flooding the site, he could just reincarnate the site elsewhere with a different policy backend. However, there's no harm in trying to do it right the first time. :)


Interesting....

The reference to that list of covert organizations was to imply the fact that there is a lot of snooping going around into people's activities in real life and that I like the fact that I can hang out with some people online without the need to worry about what other people think my intentions are. Or more directly: I don't like to have to worry about my participation on News.YC being accepted/rejected on criteria other than being voted up or down.

You disagree with me on something? Fine, down vote me or respond, but don't start trying to make predictions on what I might do next. There is nothing friendly about trying to anticipate what my next move might be so that you can identify something "alarming" about what I'm doing. I think the aptly termed "fluff" is harmless, and I've been arguing all along that it contributes to diversity, which is key for idea generation/exchange. Homogenize the community, and it begins to lose value.

I MIGHT submit a series of articles that others think are "fluff", but does this mean that now I go on some sort of "fluff watch list" and need to be monitored, or perhaps "dealt with" because my articles/views/ideas are not in line with the majority members?

Apologies to everybody if I am still coming across as implying some sort of conspiratorial tone (in which case you may down mod me or explain to me where I err in my argument), but I still stand by my conviction that monitoring others to "weed out" certain "alarming trends" is far worse than the pain suffered by "fluff".

I will add that fluff != spam. What others judge to be "fluff" is just that - a judgement call. I'm simply speaking out against judgmental attitudes en masse (or if you prefer the shortened terminology - mobs).

I appreciate the response!!



