> prefer to rate sites by their "percentage of total Internet traffic"
I can't help but wonder how a business model as ridiculous as Alexa's works in reality. Their data source isn't even statistically sound for measuring total internet traffic, and yet they manage to get attention from everyone.
I almost feel that I would be laughed at if I had this startup idea and went to raise funding.
It's weird, but as a stats guy, this makes total sense to me. They are sampling a small (hopefully random) group of internet users in order to draw conclusions about how the population at large behaves. It's no different than a political poll. Just like political pollsters, they express their results in relative terms to sidestep the much trickier issue of guessing N, the population size. Hence pollsters do not try to estimate voter turnout, and Alexa does not try to estimate how many people are online.
FWIW, the firms who are giving absolute viewer totals are (I'm pretty sure) just taking this a step further by estimating the total number of people online and multiplying. This is a way harder estimation problem, which explains why the resulting numbers are all over the map. The Alexa data is not any more worthless, just less ambitious.
It's different from a political poll because the sample is not random. It's like looking at the usage patterns of Indian internet users and predicting Reddit's traffic, when most of Reddit's traffic comes from the US, EU, and Canada.
And that's exactly what's happening: Alexa relies on people who install the Alexa toolbar, and from what I remember reading, most Alexa users are from Southeast Asia, or more precisely, from India.
They have rankings by nation in addition to the overalls. Their base is far from a scientific sample, but I think they give better ballpark figures than some give them credit for.
Their "what's hot" section often has stuff about wordpress and the like, so there are obviously many webmasters. There is plenty of American Idol and all that, also. I have no idea, but I'd guess they advertise the toolbar in different places and maybe allow it to be bundled with screensavers or whatever.
Which is fine, but if you have to do that for all the traffic-analysis firms, you add lots of JavaScript beacons to your page, slowing it down a lot.
They need to start working with companies like Google to use Analytics as a clearing house.
Then you get issues with what people instrument and how. For example, if you instrument a carousel, you'll record a lot of page "loads" that don't correspond to real page loads or ad impressions. See the sketch below.
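(A minimal sketch of the kind of instrumentation that inflates counts. The carousel object and its event are invented for the example; _trackPageview with a virtual path is a real old-style GA call.)

    // Hypothetical carousel instrumentation: every slide change is reported
    // as a "pageview", so one real page load can count as five or ten.
    carousel.on('slidechange', function (index) {
      _gaq.push(['_trackPageview', '/carousel/slide-' + index]);
    });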
What's most interesting is seeing how the stats differ when you remove their tracking code from your "quantified" site.
I decided with my business to consolidate on Google Analytics, so I removed their code. It resulted in a HUGE drop in their traffic reports, about 30%: they went from showing 440K unique visitors to 305K right after the change, while Google Analytics (and our revenue) showed the exact opposite.
By comparison, Google's tracking code is unfortunately massive. It sets a ton of cookies, which are then sent with every single request to your server, wasting bandwidth. It's worse if you have it on subdomains, since then it'll set even more cookies and send even more crap to your server on every single request.
For a website with a large number of reloads, or AJAX or Comet, this can be a complete killer. Here's roughly what rides along.
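(The cookie names below are the real ga.js-era ones; the values and domain are made up for illustration.)

    // GA (ga.js era) sets first-party cookies on your domain, so the browser
    // attaches something like this to every single request, static assets too:
    //
    //   Cookie: __utma=123456.789.1285000000.1285000000.1285000000.1;
    //           __utmb=123456.3.10.1285000000; __utmc=123456;
    //           __utmz=123456.1285000000.1.1.utmcsr=(direct)|utmccn=(direct)
    //
    // Scoping the cookies to the top-level domain (a real ga.js call) makes
    // them ride along on every subdomain's requests as well:
    _gaq.push(['_setDomainName', '.example.com']); // placeholder domain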
Quantcast has no asynchronous tracking. If their servers are running slow at all, their JavaScript will block your site from loading, even if you have it at the bottom of the page.
I know this because it happened to us. We contacted them about it, but they have no solution; they don't seem to think it's a problem, since "most of the time" their servers are up and quite fast. Marketing requires the tracking, so our solution is to host their JavaScript on our own server and run a nightly cron job to re-download the code from their servers, in the hope that we never fall so far behind that the tracking stops working. So far, so good. The sketch below shows the idea.
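(A rough sketch of that nightly refresh as a Node script you'd call from cron. The tag URL and file path are assumptions; check your own Quantcast embed code.)

    // nightly-refresh.js -- re-fetch the Quantcast tag so our local copy
    // never falls too far behind theirs.
    var http = require('http');
    var fs = require('fs');

    http.get('http://edge.quantserve.com/quant.js', function (res) {
      if (res.statusCode !== 200) return; // keep yesterday's copy on failure
      res.pipe(fs.createWriteStream('/var/www/static/quant.js')); // assumed path
    }).on('error', function () {
      // network error: keep serving yesterday's copy
    });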
Google, on the other hand, offers an asynchronous, non-blocking solution. Much better.
Async tracking helps if the download is large: a small stub of JS loads, execution continues, and the large payload downloads asynchronously.
But the tracking code is small. It's not going to make much of a difference IMHO (and anecdotally, in my experience, it hasn't).
Async or sync, you're still going to have a DNS lookup and an initial GET unless it's cached. Both can block your site from loading...
Google Analytics doesn't load any stub for async tracking: it inlines an array of "events to send" and then loads the entire GA script asynchronously. Once the script arrives, it runs through the list of things to track. The snippet below shows the shape of it.
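(From memory, the canonical async snippet looks like this; the account ID is a placeholder.)

    // Inline queue: tracking calls are recorded immediately and replayed
    // once ga.js finishes downloading.
    var _gaq = _gaq || [];
    _gaq.push(['_setAccount', 'UA-XXXXXX-X']); // placeholder account ID
    _gaq.push(['_trackPageview']);

    // Fetch ga.js without blocking the rest of the page.
    (function () {
      var ga = document.createElement('script');
      ga.type = 'text/javascript';
      ga.async = true;
      ga.src = ('https:' == document.location.protocol
          ? 'https://ssl' : 'http://www') + '.google-analytics.com/ga.js';
      var s = document.getElementsByTagName('script')[0];
      s.parentNode.insertBefore(ga, s);
    })();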
If you let them, they can directly measure the traffic of your website. If you sign up for their service, Quantcast's numbers will be very close to your own Google Analytics numbers.
But, if you don't sign up, they'll still "measure" your traffic and come up with results that are no better than the other web traffic rankers. I don't know why anybody takes sites like these seriously anymore. Everybody I've met who had a website with decent traffic has said that these sites are always way off. I'd imagine that Alexa's main reason for reporting a ridiculous statistic like % of global/country traffic is that it makes it harder for website owners to realize how wrong they are.
And how do they actually measure those numbers without (i) guessing, (ii) having some JS on the page, or (iii) using voodoo black magic?
I think it's ridiculous to believe ANY company that claims it can measure a site's traffic accurately without the DIRECT INVOLVEMENT of the site's owners. How are they doing it? Through the ether? Come on! Data, people. Data. Google's figured that out; maybe other companies should too.
They get clickstream data from the big ISPs. In theory, if the samples are large enough and their models are appropriately calibrated, this ought to be a reasonable way to do it.
Wild guess: maybe they look at the increase in IP IDs? On a lot of TCP/IP stacks the IP ID field isn't randomized; it's a global counter that increments with each datagram sent, so it can be measured from the outside simply by comparing two successive datagrams from the server. Then make up a reasonable estimate of how many IP datagrams are sent in an average visit, and do the same probe against all the frontend servers. Congratulations, you now have a reasonably credible bullshit number to chart. Back-of-envelope below.
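(Every number here is invented for illustration; the point is just the shape of the estimate.)

    // Two probes of a server whose IP ID counter increments per datagram.
    // (The ID field is 16-bit and wraps, so the probes must be close together.)
    var id1 = 41200, id2 = 53900;  // IDs observed 60 seconds apart (made up)
    var dt = 60;                   // seconds between probes
    var datagramsPerVisit = 50;    // assumed datagrams per average visit

    var visitsPerSec = (id2 - id1) / dt / datagramsPerVisit;
    var visitsPerMonth = visitsPerSec * 60 * 60 * 24 * 30;
    console.log(Math.round(visitsPerMonth)); // ~11 million: chartable "data"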
Looks like Reddit's trying to sell brand advertising based on their comScore numbers: they're running comScore's direct-measurement JavaScript on their site, and they didn't bother to mention comScore in this post.
The point of this blog post is probably to lump the Nielsen figures the agencies are using in with a bunch of services the agencies know are jokes (Alexa? Get serious...). That way they can discredit Nielsen when they send this post to the agencies, along with their much better comScore numbers.
In the comments they said that they started working with comScore about 5 months ago and are still waiting for a report; apparently comScore mangled their first 3 months of data.
I suspect that the vast majority of Redditors use Ad-block or an equivalent. Reddit might get millions of page views, but if that doesn't translate into ad impressions, then they are effectively invisible to the money people.
Even in the post they once again pander to the privacy freak-out crowd by noting that Google Analytics includes some JS.
Someone commented that it was nice that they mentioned this, and the author responded with "We know our customers well!"
Sorry, but that's ridiculous. Reddit is hard to monetize because it's full of anti-consumerist, ad-blocking, JS-blocking privacy freaks... and they seem to be encouraging this in their users.
Apart from the hardcore privacy crowd, you would think the best way to advertise to these people is through text-based targeted advertising. You have a mountain of comment data there to run algorithms across to decide which ads are most effective for each user.
That'd work well, as long as you make it seem like they aren't adverts. Some people just hate ads and will not click on them, even if they're the most useful, tailored special offer you can imagine.
Look, I didn't want to be blunt for obvious reasons, but now that you ask, I will give you a hint.
The reddit crowd is not exactly advertising-friendly. What we got was direct human traffic, and 92% of them were blocking JavaScript. The only bots we saw were the first 10 hits, most of them familiar from our Twitter links; shave 2% off for the bots, and that still leaves 90% of the traffic worthless.
The reddit staff are aware of this; why else would they thank you for not using NoScript or ad-block whenever you visit the site with JS enabled?
This logic would also apply to other tech blogs and tech sites generally, with their tech-savvy users. Most of them, including Wired, Ars, Digg, and Slashdot, are doing just fine.
Digg has their own marketing/advertising team, unlike reddit. I think this makes the difference.
Digg is also smart about actually selling and placing ads compared to reddit. I've been checking out digg for the first time in years the last few days, and I was somewhat surprised to find that I actually kind of like the way digg does interstitial ads.
Reddit has one header ad and a couple of sidebar ads that are self-advertising more often than not.
(I will note for the record that this ad[1] spotted earlier today got a good laugh out of me. Somebody has used reddit's self-serve advertising to its fullest potential).
And not all forms of ads are blocked by AdBlock-type plugins/addons by default either, just the most onerous ones, plus whatever people can be bothered to add manually.
Even if most Reddit users did block ads, and as noted in another comment it's actually about 30%, reddit has other ways of displaying ads. Their "upcoming stories" box occasionally displays "sponsored links". There are other things that reddit could do to get a sponsor's message to adblock users that would be similarly non-invasive (contests, etc.), too.
Reddit could offer up an analysis of their own server logs to give raw numbers, with some reasoned math to translate that into what they consider unique visitors. Not independently verifiable, but it would be food for thought; a rough sketch of the first step is below. (We're already trusting the Google Analytics screenshot to be essentially unmodified.)
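(A minimal sketch in Node, assuming a combined-format access log at a made-up path. One IP per day is a crude proxy for a unique, undercounting NAT users and overcounting dynamic IPs, which is where the "reasoned math" would come in.)

    // Count distinct client IPs per day from an access log.
    var fs = require('fs');
    var readline = require('readline');

    var seen = {};
    var rl = readline.createInterface({
      input: fs.createReadStream('access.log') // placeholder path
    });
    rl.on('line', function (line) {
      var ip = line.split(' ')[0];
      var m = line.match(/\[(\d{2}\/\w{3}\/\d{4})/); // e.g. [21/Sep/2010
      if (ip && m) seen[m[1] + '|' + ip] = true;
    });
    rl.on('close', function () {
      console.log(Object.keys(seen).length + ' IP-days (summed daily "uniques")');
    });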
It isn't a direct comparison, since Google Trends tracks daily actives while the Analytics data they posted is monthly, but you can roughly estimate 375,000 uniques/day × 30 days ≈ 11.25M visitors, which is within reason for the monthly data they present via Analytics.
Note too that though both these sources are from Google, they are from completely separate, ring-fenced portions of Google's corpus of site data. Analytics data isn't used in Trends and vice versa.
The Alexa data implies 1.4 trillion global page views last month (430 million reddit views / 0.03% global share), or about 1k monthly pageviews per user across 1.47 billion global users. If Alexa is off by a factor of ten, like some of the other companies are, that number would have to be 10k (too high) or 100 (too low?). Maybe Alexa's the most accurate of the bunch.
These numbers are easily gamed. The trick is to add some invisible iframes that load an empty page on your domain. You can fire them after the page has loaded so they don't slow your site down. I know it works for Compete, and I assume it works on other services too. Sketch below.
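(Roughly like this; the empty page's path is a placeholder, and whether a given service counts iframe loads is the part you'd have to test.)

    // After the real page finishes loading, quietly load extra "pages".
    window.addEventListener('load', function () {
      for (var i = 0; i < 3; i++) {
        var f = document.createElement('iframe');
        f.style.display = 'none';      // invisible to the user
        f.src = '/empty.html?n=' + i;  // empty page on your own domain
        document.body.appendChild(f);  // each load can register as a view
      }
    });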
But basically, CPM means cost per mille (thousand impressions). I believe 1 cent per pageview, i.e. a $10 CPM, is absurdly high for most CPM banner ads; the real amount is closer to $0.003 per view, i.e. a $3 CPM. It all depends on how valuable a demographic (or mix of demographics) you have in the eyes of the advertiser, though.
The blog post is about more than just revealing traffic numbers, so I'm not sure why you put up a custom title; also, why would you be too lazy to click the link? It compares direct traffic to the pseudo-analytics that not just every non-tech idiot but also potential advertisers and deal-makers seem to rely on.
This is important stuff people. Stop quoting Compete, Quantcast and Alexa.
The degree of suck has nothing to do with the size of the site and everything to do with the propensity of the site's audience to participate in market research programs for a pittance.
It probably isn't, however. Alexa, for example, relies at least partly on the Alexa toolbar to get its figures. I'll venture a guess that most people with the Alexa toolbar installed aren't terribly web-savvy, while reddit's audience, for example, is.
You are right, but they are still useful if comparing two sites targeting the same demographic. Reddit vs Techcrunch is OK, so is Facebook vs Twitter. Reddit vs Twitter probably not so much.
If you quantify your site, your raw PV numbers are free to you and to everyone else. I believe, but am not sure, that the privacy controls are granular enough to let you display e.g. visitors but not pageviews to the world at large.
I call bullshit on the Alexa complaint! We have the same number of uniques (though half the pageviews), and Alexa ranks our site much worse, around 3,000. Reddit's Alexa numbers are overly favorable (Alexa rank 147).