A Wave of P.R. Data (niemanlab.org)
39 points by samclemens on Dec 15, 2014 | 8 comments



This is an issue I've been having with the dataisbeautiful subreddit on Reddit. I submit a lot of original charts and data analyses there (http://www.reddit.com/user/minimaxir/submitted/), but the submissions that trend on the subreddit are usually either political or reminiscent of the TIL subreddit, just with pretty pictures. Which is annoying, but eh.


At least that community is pretty good at calling out the B.S. in the comments. I can't say that they prevent it from rising to the front page of reddit, but they usually point out the flaws in the assumptions being made.

Sometimes they even go so far as to be pedantic about graph or color choice, but I digress.


In the PR industry, this is called "mediagenic research" and it's been around for decades. The idea is that if you want earned media coverage (what people in marketing/communications call the news), you need to have a story. For times when you don't have a great story, finding some interesting data gives you an excuse to make one up.

Part of the reason it's successful is that people are genuinely interested in it, and will read the articles. They don't particularly care whether the methodology is sound when it's used to determine which city has the best sex, or which presidential candidate would perform best in an alien invasion (both real examples). They just like the story.

Before stories were going viral on the internet, marketers did the same thing through local newspapers and radio. That's why so many studies rank cities or metro areas - because local papers want to report a local story.

Most of this is pretty harmless. Nobody is making public policy decisions based on the data, the people reading it think it's funny or interesting, and it gives time-starved media companies simple content to push out.


I used to work on a team that collected data for PR purposes for the company. The data is often heavily skewed, because it only covers what is available to the company. This is far from a proper sample of the population. For example, instead of '24% of people do x', the claim should be '24% of people that use our site do x'.

Zero statistical methodology is ever used to make sure the data is accurate for the broader population.
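To make that gap concrete, here is a minimal Python sketch. Every number in it is made up for illustration (the group sizes and the counts of people doing x are assumptions, not figures from any real study), and `share_doing_x` is a hypothetical helper name:

```python
def share_doing_x(groups):
    """Share of people doing x across groups given as (size, count_doing_x)."""
    total_people = sum(size for size, _ in groups)
    total_doing_x = sum(count for _, count in groups)
    return total_doing_x / total_people


# Hypothetical numbers, for illustration only.
site_users = (10_000, 2_400)        # 24% of the company's own users do x
everyone_else = (990_000, 49_500)   # assumed 5% of non-users do x

# What the press release measures: only the people the company can see.
site_only = share_doing_x([site_users])

# What the headline implies: the whole population.
population = share_doing_x([site_users, everyone_else])

print(f"site users only:  {site_only:.1%}")
print(f"whole population: {population:.1%}")
```

Under these assumed numbers the site-only figure is several times the population figure, which is the kind of difference that gets lost when '24% of people that use our site' is shortened to '24% of people'.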

Also, just because some data is associated with someone with a PhD doesn't mean it is any more accurate. Many PhDs and professors are put on the part-time payrolls of these companies to give the BS data an air of accuracy and authority.


"There are three kinds of lies: lies, damned lies, and statistics."

I love that quote, and its attribution to a 19th-century British prime minister (1) illustrates how long this "trend" has been around.

So when the author of the article states

"Nobody can say exactly when the trend first started, but in 2014 we saw the first major outbreaks of bogus data distributed by private companies just so it would go viral online"

I think that's a bit hyperbolic.

Old hat, fun to write about though.


Sure, statistics in marketing has been around for hundreds of years. I think his key point was alluded to with "just so it would go viral online" as being the new and growing theme.

The trick is getting the public at large to spread the message, and the goal is not the message itself but rather a higher pagerank and more traffic. Before, the statistic itself had to be meaningful, e.g. "users of our weight loss supplement lost 43 pounds in the first month". Now, the statistic just has to be "shocking"; it doesn't have to actually sell anything.


(1) Benjamin Disraeli.


I hoped the post was going to reveal - tada! - an AI algorithm that would detect and filter out these stories.

That would make reading the web a better experience.



