As a Google user, I value it because it finds things. Some of those things cost me money. To make this visceral: what differentiates Google finding me an expensive camera that I can buy from B&H that matches my search criteria and helps me get my work done from Google finding me an expensive article from the Wall Street Journal that matches my search criteria and helps me get my work done? If Google wants to optimize for "lowest price" then it should make that a non-default criterion, as otherwise they are just helping me find cheap, low-quality crap: if that is providing you value, you probably have a broken definition of "value" :/.
> To make this visceral: what differentiates Google finding me an expensive camera that I can buy from B&H that matches my search criteria and helps me get my work done from Google finding me an expensive article from the Wall Street Journal that matches my search criteria and helps me get my work done?
Likely, what percentage of users come back to search and keep trying other results. From a simplistic point of view, the value proposition to users can be derived from what percentage had their query answered by that source. If WSJ used to give some number of Google visitors a page that caused them to stop searching, and now gives them a page that does not, they are providing a less useful resource overall for Google users.
WSJ is providing objectively worse results on average for Google users than they were previously. It makes sense that would cause their ranking to drop.
You nailed it. It's in Google's interest to provide the best user experience, and a large part of that is optimizing for the "best" result. Whether a user returns to the search results and whether they execute a subsequent search are two of the many signals used to determine this.
It's a great metric for a lot of things, but definitely not for determining the value of a piece of journalistic writing.
Here we're saying that the article SHOULD be free in order to have value. So basically we're saying: it's OK to attribute no value to it if Google thinks so.
I'm not a fan of hard paywalls, but this is definitely a fallacy in the definition of value, and Google shouldn't have it so easy.
Google isn't determining the value of paywalled articles, the general population is. Google is just facilitating the desires of users. The obstructive thing for Google to do would be to put WSJ near the top because Google thinks it's valuable, even though the users find it useless.
The problem with your assumption there is that both of those things cost money. The price ratio of a cheap but crappy camera to an expensive one might be $250 to $1,000, about 4x: a fairly low, finite, and often measurable premium for the better camera. That ratio might be acceptable for someone who wants a high-quality camera.
With news content it's free vs. a few hundred dollars a year, more if you want multiple paid sources. Unless you find that all of those free sources, including BBC, AP, etc., are so bad that they provide negative value and just waste your time, it's in your best interest (an infinite ratio) to try to get by with the free ones; the information in most news articles is by and large equivalent. If Google is able to raise the higher-quality free sources to the top of results, that provides much better value to users than suggesting paid sources. Most users, on finding a paid source, will just click the back button and pick another source, the paid source having wasted their time completely.
> Unless you find that all of those free sources, including BBC, AP, etc., are so bad that they provide negative value and just waste your time, it's in your best interest (an infinite ratio) to try to get by with the free ones
So, like the GP said, that's "a broken definition of 'value'" on its face!
Wouldn't you at least concede that if some sources do in fact provide negative value, it might be preferable to connect the sources that consistently provide positive value directly with even some small reward (like a subscription, or maybe a per-article fee)?
How do content creators make a return on their investment? Is it meant to be indirect, through advertising, or is there some other way I haven't thought of that will recoup their costs?
This is all nonsense anyway and I don't believe any of it. Information has to be free, and it's either all free or none of it is. The paywall is wrong; if it seeks to prevent us from sharing information then it will fail, and Google, a part of the system of freedom, is properly set up to route around the damage of censorship inflicted by asserters of copyrights. Paywalls that don't provide free information should be moved down or delisted in the rankings, which is what I think should happen.
If growth in your business model depends on broadening the subscribership, and thus the reach, of your own information, but also on limiting the proliferation of that same information, then it is wrong too. I don't know what this means for news media companies that have to turn a profit for their shareholders; I guess I can safely say their concerns are not my concerns.
Now if you'll excuse me, I'm going to go watch another episode of Black Mirror.
I think that content creators should get paid, but I don't want them to try to put DRM chips in our brains to make sure we're all paying for whatever information we consume.
I don't think it's right that they carve out this valley between their legally afforded copyright protections and my fair-use rights, meant to ensure I can never exercise those fair-use rights even after their copyrights are long expired (as if it were even possible for a copyright to expire anymore).
Yes, I am a little off-topic from the WSJ paywall, but it's all one discussion. How do content creators get paid in my ideal version of reality? At the pleasure of content consumers. What can content creators do when that's demonstrated not to be working? IDK, not restrictive digital rights management schemes, tho.
There are no easy answers that would satisfy me as either a content consumer or creator.
Comparing the economics of a pure bit-based product (media) with a pure atom-based product (camera) is not the most convincing way to make an argument in my opinion.
You can get most of what WSJ writes somewhere else for free. You'll probably say that's not true because the value you get from WSJ is not news but their commentary, but even in this case that particular "value" is very subjective and for most people it's not valuable enough.
I for one don't care about whatever elite content they write. And I definitely don't care for a mere website wasting my time by making me click through just to find out I can't read it, over and over again.
That's probably the biggest concern for the WSJ and other news sources that might follow their lead. Most news coverage--excluding content like features, initial exclusives, and commentary--can be considered a substitute good. You'll find it on other sites. Unless you've built a sort of loyalty or trust with a reader, they'll just go elsewhere.
But to build that loyalty, you need to get readers to come to your site in the first place. Policies that harm your performance in search results would seem to be contraindicated.
I don't know if you have heard of the "strawman fallacy", but you and the other guy below saying "So you believe that there is such a thing as a news report without commentary?" have fallen into that trap.
It's not even funny how you go from my argument to "This makes it not valuable" to "A rejection of the concept of IP with different words".
Here, let me spell it out for you by copying and pasting the same comment:
> You can get most of what WSJ writes somewhere else for free. You'll probably say that's not true because the value you get from WSJ is not news but their commentary, but even in this case that particular "value" is very subjective and for most people it's not valuable enough.
- It is a fact that you can get most of what WSJ writes somewhere else for free.
- And I said the "value" is very subjective, and it's not "valuable enough" for most people. Yet you go on and say I said "it's not valuable", then somehow go from that to accusing me of rejecting the concept of IP. I don't even know where to start.
The strawman fallacy is when you misattribute a position to someone.
"Strawman" not a catchall for "I don't see all the steps in your argument for how my position X implied Y." I'd be glad to spell those out more explicitly.
>...It is a fact that you can get most of what WSJ writes somewhere else for free.
It is a fact because people copy it. Therefore, you're saying it's not valuable because people can get it elsewhere because it was copied. So, the copy-ability made it valueless, exactly as I inferred from your argument (rather than misattributing it).
(late edit: Also, the fact that you insist on "atom-based" and "bit-based" being incomparable doesn't help your case that you're not rejecting IP: "Comparing the economics of a pure bit-based product (media) with a pure atom-based product (camera)".)
Okay, perhaps that is not what you meant and therefore it was a strawman -- but that's the only self-consistent, plausible reading I saw.
The only other meaning is that,
"You can get all the interesting facts contained in this story from different sources, without just copy-pasting."
Is that what you meant? If so, it's implausible on its face: why are people trying to circumvent the paywall if they didn't want the WSJ's article specifically? Why can't they just go somewhere else? Why do they load HN discussion with complaints about a paywall rather than "here's the free, independent version that's just as good"?
>And I said the "value" is very subjective, and it's not "valuable enough" for most people. Yet you go on and say I said "it's not valuable". Then somehow go from that to accusing me of rejecting the concept of IP. I don't even know where to start.
I said that because you dismissed the value on the grounds of it being "subjective", which was close enough in this context to saying "oh, I can't quite put a hard value on it, so I don't have to care about this journalism going away". As above, why don't people just find another non-copied source? Because they non-subjectively do want to look at this specific source.
With that said, I do agree that it may not be obvious how your position is tantamount to rejecting IP. But I was deriving that as an implication, not misattributing anything to you. Whether or not you recognize this position as implicitly rejecting of all IP, there is certainly a clear logical chain for how it has such an implication.
Edit: I know we're not supposed to talk about downvotes, could they at least wait the 60 seconds necessary to read this?
> why are people trying to circumvent the paywall if they didn't want the WSJ's article specifically? Why can't they just go somewhere else? Why do they load HN discussion with complaints about a paywall rather than "here's the free, independent version that's just as good"?
Because people come to HN, click on a link, see it's paywalled and leave a comment complaining about it, then move on to a different post. Most people who complain about the paywall likely aren't invested enough in the headline to find another source for the same information.
Personally I mostly skim HN for news. If an article is paywalled and isn't profoundly interesting to me, I can spend the time it would cost me to look up alternate sources for the same story just reading something else instead. I don't read WSJ articles because they're better, I read them because they're there. In fact, I actually prefer other news outlets.
You're misattributing motives to what boils down to mere laziness. And even in doing that you don't actually have a case because many news stories have alternative sources in the HN comments, especially when they're paywalled or too superficial.
So, yes, you're misattributing motives, which is literally how you just defined strawmen.
> If so, it's implausible on its face: why are people trying to circumvent the paywall if they didn't want the WSJ's article specifically? Why can't they just go somewhere else? Why do they load HN discussion with complaints about a paywall rather than "here's the free, independent version that's just as good"?
Social objects. The actual value of a particular article isn't the article itself, but the fact that you and other people in the thread have read the same article. In HN discussions, people will go around paywalls not because they can't find another source, but because they want to read the same source as everyone else.
It's not just that the product itself is easy to copy; it's that the facts themselves are not copyrightable, only the writing is. Meaning a source without that restriction provides the value without the cost.
With the exception of more creative editorial pieces, most news falls into this bucket.
You're still saying that the work involved in collecting those facts -- journalism -- is without value, and that if no one bothers to do this professionally because their work will just be instacopied, so be it.
You can be the person who just complains about how people don't appreciate journalism, or you can acknowledge the reality and think in more productive ways. The ones who make a difference are generally the ones who think more productively instead of just blaming the people who "don't get it".
If traditional media journalism were truly appreciated, then they wouldn't even have this problem. And honestly, they have lost a lot of respect from people in recent years, because they have been doing a lot of things that undermine their own journalistic principles in order to make more money.
The fact that someone prefers to cheap out and read the copy-pasted version doesn't mean they don't appreciate the paywalled version; it just means they want it for free. For all we know, it was high-quality journalism.
Why is it Google's responsibility to administer all this? They have a published algorithm that is intended to maximize fair play on the web. If you don't like that algorithm, find a way to buy off the other search engines for paid placement.
If you think that bias is a bad idea, then Google's current approach seems okay. They don't bias against WSJ, but they don't bias for WSJ either. The Googlebot has less material to work with, so there is less content compared to other sources, so it gets a worse ranking. No bias.
They could apply a subtle background (like they do for sponsored results) to results that are behind a paywall, behind a registration-wall, or otherwise inaccessible.
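Purely as illustration, here's roughly what that could look like as a client-side userscript (only Google could do it properly server-side; the domain list and the `div.g` result-container selector are assumptions that would need hand-maintaining):

```javascript
// ==UserScript==
// @name  Tint paywalled search results
// @match https://www.google.com/search*
// ==/UserScript==

// Hypothetical, hand-maintained list of hard-paywalled domains.
const PAYWALLED = ['wsj.com', 'ft.com'];

for (const a of document.querySelectorAll('a[href]')) {
  if (PAYWALLED.some(d => a.hostname === d || a.hostname.endsWith('.' + d))) {
    const result = a.closest('div.g'); // Google's result container; class may change
    if (result) result.style.background = '#fff8e1'; // subtle tint, like sponsored results
  }
}
```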
I assume the rule that a site must show searchers what it shows Googlebot derives from that contract with searchers being a key component of the search user experience. I think Google has become enough of a monopoly that much of its behavior is negative for users as a whole, but not this. Imagine a search engine where cloaking is allowed...
You hit upon the main difference between the digital economy and the real world economy. People keep trying to equate the two.
Google itself is fueled by online ad revenue. Facebook has ads too, and it's trying to explore every angle (video, etc.) to get people to click.
Both companies know about your likes and interests when you are logged in (Facebook because you declared them, Google because of your past searches) and those of your friends (GMail contacts and GPlus / Hangouts connections in Google's case).
Facebook is more blatant about trying to monetize ads and still does worse, because people use Google for search, and that's when their intent lines up with actually buying.
Thing is, ads are a commodity in a race to the bottom. You can see every company struggling with this, not just the Wall Street Journal but Google too. Ad rates keep going down. They try new things. Then the ad revenues drop again.
In the real world, you buy a product or a service and you pay for it. Prices there don't go down nearly as much (although automation and outsourcing may make some impact on local wages). If you get paid a commission PER SALE, you've got a sustainable business.
David Heinemeier Hansson has been saying for years that you should be building SaaS and charging for it, instead of making everything free and slapping ads on it.
Another acquaintance of mine, Albert Wenger from USV, has a whole book about how digital goods are essentially free to copy perfectly, leading to a post-scarcity world in digital goods, with only artificial constraints such as copyright enforcement propping up their price. The book, World After Capital (worldaftercapital.com), describes humanity's progression from hunter-gatherer societies through today and explores what the economy will look like later.
So, if you're in the business of creating content, you're now competing in a GLOBAL market. WSJ and other publishers got to enjoy the distribution of their newspapers for decades with copyright protections, as did the music industry. But at the end of the day, when the means of copying information make it nearly free, you've got to figure out other business models. The freemium one is a good trade-off. But really, why would I pay for WSJ after my free 3 articles a month are up, when there are so many other outlets reporting the same things? Because I like or want to support the WSJ specifically. Their brand. I did pay for a NYT subscription, after all.
You know what I really value far more than WSJ and NYT? Wikipedia.
And how much does it cost to run? Not much (aside from hosting), because people collectively build on each other's work and editors check new contributions to achieve a result that is quite good. People cooperate more than they compete, because the rules of the game are such.
I see such things as superior to capitalist competition. Look at open source (Linux, Firefox) vs. closed source (Windows, IE). The former runs on toasters; the latter doesn't, and is less stable over time. The former overtakes it in quality.
Why not, as a society, embrace WikiNews and other such sites? Why do we even need the old model of journalism? Because it will appeal to certain niches, just like newspapers used to circulate in small communities and weren't sold all over the world the next day. Now they'll be sold all over the world, but their readership will shrink.
The information economy is, and has always been, different. Digital ads for digital websites are far different from commission sales of real-world items. 3D printing may one day bring the two worlds closer in some areas, but we are all still a long way away from that.
Because information wants to be free and you can't copy a camera? Surely most people would not buy an expensive camera from the place where it costs the most, even if the service might be better? You would probably compare prices, and unless you have a strong alternative argument (moral?), you'd go for one of the cheaper places.
I'm not sure we should put the morality in Google's hands. If you want top journalism, you ought to pay for it, but it makes sense search engines provide the quickest path to information, and that's usually not a paywall.
Very well said. I have this image of an alternate universe where people are whining that when they google for that camera, it should give them the name of a fence (as in, a seller of stolen goods) that offers home delivery.
The whole point of a search engine is that O(1) queries search across O(n) sources; if I have to individually query all O(n) sources, that defeats the purpose.
That's great if you want to pay for good journalism. That doesn't excuse publishers from showing you one page if you are Google, and another if you are a non-logged-in user.
If publishers want to attract people to get them to subscribe, they need to find other ways that don't violate Google's search policies that have been in place since seemingly forever around cloaking[1].
It would be nice to have that option. But such a small percentage of the web as a whole chooses to pay for such journalism that it is likely hard to justify implementing that feature. You can search WSJ.com by using the site:wsj.com argument after your keywords, but that isn't really what you're asking for here.
Generally speaking, paid journalism is not a sustainable business model for most entities. WSJ might be an exception, but they won't be getting any special Google treatment, nor should they.
Personally, I believe that the business model of most journalism sites will switch to a combination of sponsored content and the sale of Facebook and Google custom audiences. With retargeting, it's possible for an advertiser, say Microsoft, to tell a site like WSJ "we want to be able to advertise on Facebook/Google to people who read to the bottom of this article you wrote about cloud services" and pay WSJ for being able to use that custom audience. This kind of retargeting is already possible on Facebook and Google, but currently limited to people that have visited your own site(s). Having a custom audience marketplace would be amazing for many advertisers and deliver badly needed revenue to publications.
Because a) Google is better at indexing a given source, and b) every user shouldn't have to build their own search aggregator across the n sites they have privileged access to; that's the role of a search engine (except that it covers all sources, not just a few).
Maybe there's an opportunity for Google to partner with subscription sites to index their content for users who own a subscription -- but what should Google do when there's a ~0% chance a site contains useful information for a non-subscriber?
Because Google can only index what they can see. As noted in the article, the Googlebot only gets to see the first few paragraphs of WSJ articles, so those pages are less likely to rank on searches.
Indexing more of the content (which would be possible by providing the full content to web crawlers) seems to violate Google's cloaking guideline as well: https://support.google.com/webmasters/answer/66355
It's not excluded. They present less content to Google users and are ranked based on the actual presented content.
If News Corp wants search ads for its paid service, it can buy them like anyone else instead of expecting Google to treat them specially, which is what they are really asking for.
Probably not; unfortunately, we don't seem to have a good digital marketplace for journalism.
That could be interesting, though; a storefront with a model somewhere between Steam/Amazon and Netflix/Spotify. Somewhere to both collate the offerings which you can purchase, and highlight content from those outfits at the same time.
Ultimately I don't think a market exists for it anymore. Information like that is just assumed to be free on the internet. I don't particularly care whether it's AP, WSJ, BBC, or CNN; as long as it's reasonably informative, I'd rather just read a few articles from different sources on topics that interest me than pay a single penny for content. As long as free sources of reasonably good content exist, they'll continue to dominate the market.
The market is properly weighted by dollars, not by number of people. 100,000 people each willing to pay $200 a year for a subscription ($20 million) are a larger part of the market than 5 million people who collectively click on 20 million ads in a year at an average of $0.001 per click ($20,000).
If people are willing to pay you a premium, you don't need to capture "a majority of the news consuming market", you just need enough of those people to be profitable. Ben Thompson at Stratechery says he has a little over 2,000 paying subscribers- a tiny fraction of paying WSJ subscribers, never mind the people reading free content- and he's doing fine.
In fact, that's a known winning strategy in a lot of industries, not just journalism: let the suckers rip each other to pieces in the race to the bottom, while you deliver enough value to have fat margins. That's what Apple does, for example.
I think it is relevant: the WSJ has a lot of digital subscribers, and they're profitable, I think in large part due to their paywall. The FT is also profitable and employs aggressive paywalling. I've noticed smaller local publications tightening up their paywalls as well.
I'd expect to see more paywalls going forward, not fewer. I think there's increasing recognition that the economics are better.
I wish I could use OAuth or similar to give Google permission to index the papers and magazines I subscribe to. If I haven't OAuth'd and the paper requires a login, rank it lower. If I have, rank it higher. Google always talks about wanting to personalize the search experience. This seems like a good way to accomplish it.
The answer is yes. If everyone went to this site with the intention of paying for good journalism, Google would rank it higher. Now it might penalize the site if the first thing that it displays is something that looks like an interstitial popup rather than content, based on the assumption that this type of experience is a poor-performing feature on other sites.
Google is built to optimize relevance against what people are searching for at a heuristic level (increasing the utility of their search engine based on each immediate choice people make, as opposed to a model like Facebook that tries to increase overall relevance of experience to get more time spent).
Most people who end up on WSJ are searching for quick, accurate, free information. The landing page (a full article) provided that, albeit in an unsustainable business model.
The vast majority of internet users are not looking to subscribe — which has become the main function of the landing page now. That means that the site is, on average, less relevant at a heuristic level.
It would be fair if that were what WSJ was asking for. I agree that if I already pay for a subscription, the results should be ranked accordingly for me. There is probably a simple, technical way to implement that.
But AIUI this is not what WSJ is requesting here. They want free ads for their product ranked high in the search results.
I subscribe to The Economist and WSJ and have pretty much stopped visiting Google News, except when there is important breaking news. Google News really isn't for someone who subscribes to news journals. Apple News is much better for this purpose.
Coincidentally I just ended my WSJ subscription because they were publishing fewer and fewer articles. It was down to maybe 5-6 a day that were even worth reading.
I highly recommend you try the FT if you are a former WSJ user. Great depth of content, updated throughout the day via their website and their app. I have been a subscriber for ~12 years or so.
I use Blendle, which shows me excerpts of articles from the Wall Street Journal, The Economist, the NY Times, etc., etc., and for a small fee I can read any article with no advertisement. I really like the service, and I like pay-as-you-go things on the web, so I can support stuff I find interesting.
Sure, if Google and certain users are willing to give up their history and privacy for it. I doubt many people would like that. I wouldn't like giving up my privacy to that severe a degree.
WSJ is slanted. I dunno about recently, but I feel like anything Murdoch touches gets kinda ruined.
If you want to pay for good journalism, then ranking WSJ higher hasn't made sense since several years before the News Corp. takeover.
More to the point, though, if you have an affinity for WSJ and similar content, yes, Google will probably pick that up over time, though getting a big enough boost to outweigh the cloaking penalty completely may be difficult.
OTOH, if the WSJ is useful to you even with a hard paywall (that is, if you are a paying subscriber), you'll presumably have it bookmarked and it will be one of your go-to direct sources for news; you won't need discovery through Google to find content there very often.
I don't know about Google search being as great and flawless as you're implying here. I run a website dedicated to one particular topic, and Google won't list it at all ("no more results" after ~198 hits of outdated and superficial content), even though the site is registered via Google Search Console. On other topics, I get pointless "123,000 matches in 0.23 ms" results. Not impressed at all; a naive keyword count as the ranking criterion would do better.
- There are other websites with way more authority talking about SGML. (Like the W3C website).
- You have few or no sites or forums linking to your content.
- Your page titles are uninformative. The biggest offender is probably the homepage, with a page title of "index". But even for your reference page, it is just "Syntax Reference" (way too general), and Google actually uses your page headings to repair this to "SGML Syntax Reference". Try inverse breadcrumb style: "SGML Syntax Reference | Docs | SGML.js". BTW: you ranked 2nd for "SGML Syntax Reference".
- Suspicion: content that is not visible (like that in the content slider) is ranked lower than always-visible content. The headless Chrome crawler can detect this. Add the slider content as regular text to your homepage, and also try to expand the content there. Include links to your latest blog posts.
- I prefer hierarchical headings, not just sections with <h1> for everything. This is because hierarchical headings cannot hurt, while non-hierarchical headings could.
- Finally, SGML being a standard, there are simply a lot of competitors for this keyword. These competitors are not commercial competitors, but authoritative websites with lots of informative content. Exactly the sites that Google likes to rank high. If you want to rank for SGML, you may be fighting an uphill battle.
I'm aware of some of the issues you mentioned, but don't you think my site, with the depth of information provided, deserves at least a mention among the other ~200 ones? I'll try and fix the heading issues first, then see if search results improve.
Your site shows up fine if I search for SGML Syntax Reference or SGML syntax. Meanwhile, it doesn't appear to generally be about learning basic SGML concepts, so it seems quite reasonable that it doesn't show up when just searching for SGML. And why blame Google when Bing does the same?
It seems like your complaint is that adding one page of reference info wasn't enough to serve as an ad for your business. It doesn't seem like a valid complaint to me.
While your point still stands, don't you think there are things you could do to help your site index better on that topic? It has no metadata like keywords and descriptions, no robots.txt/sitemap.xml, the links to the /docs pages are hidden under an overlay that requires JavaScript to show, that URL has 10 different <h1> tags, etc.
Google's job, IMHO, is to find relevant information, not judge it based on cost. How often do we get second-hand news where an article is simply parroting what another site wrote? I'd rather get information as close to the source as possible in most cases. Perhaps that means paying for it, but that choice should be mine to make. Put the most relevant result at the top, and if people want to read a free second-hand ripoff of some news, they can select a lower-ranked search result.
Now how would Google find the relevant info (and index it) while at the same time a 'regular' user is restricted by the paywall? That might be a solution you can implement technically, but will it also work for your users?
If users don't have access to info because it is restricted to paying users, google won't have access either.
It is technically possible via web cloaking. It is the same technique that tricked Googlebot into thinking you were a legitimate content site while serving users pharma ads for big blue pills.
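To make the mechanics concrete, here's a minimal sketch of user-agent cloaking in Node + Express. `fullArticleHtml` and `teaserHtml` are hypothetical helpers, and doing this on a real site is exactly what Google's guidelines prohibit:

```javascript
const express = require('express');
const app = express();

app.get('/article/:id', (req, res) => {
  const ua = req.get('User-Agent') || '';
  if (/Googlebot/i.test(ua)) {
    // The crawler gets the full article and indexes all of it...
    res.send(fullArticleHtml(req.params.id)); // hypothetical helper
  } else {
    // ...while everyone else gets the first paragraph plus a paywall prompt.
    res.send(teaserHtml(req.params.id)); // hypothetical helper
  }
});

app.listen(3000);
```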
I guess what this one in fact states is that what they were doing was okay.
In the video the guy explains that while you are not allowed to treat Googlebot's requests in a way no other people are treated, it is okay to differentiate between "boxes" of users. In their example the box is a country (the USA), but if you define a "Google user country" and let all users coming from Google into it, it is OK to bundle the Googlebot with those. Grey area for sure, but might makes right.
"So geolocation, that is, looking at the IP address and reacting to that-- is totally, fine, as long as you're not reacting specifically to the IP of just Googlebot, just that very narrow range".
Also, they will crawl you from an unusual IP using a user-agent that doesn't say it's Google. And when that happens, and you deny access to undercover-Googlebot, but allow Googlebot in full uniform, you'll be penalized for cloaking.
> but gbot and anyone coming from the google homepage.
This is how they were originally handling it, before February. It would display the article if you visited the link from Google, or set your referer so it looked like you did (this is why HN has the "web" link under articles), even if you weren't a subscriber. It's allowed, because regular users coming from Google do see the same thing as Googlebot.
WSJ has since changed that, so only subscribers can view articles, and you no longer get a "free click" (as Google calls it) coming from Google Search. They now show a short snippet and are following the guidelines to be labeled a "subscription" service by Google Search. This caused their rankings to drop below free news sources, though. But it's not nearly as bad as if they had cloaked Google.
In the video, he stresses pretty hard that all forms of cloaking are disallowed, even if not malicious or deceiving. Unless the Googlebot has a paid subscription to WSJ that's still cloaking, as you're showing Googlebot a different page than a regular user.
Google's rules and help documents are spread all over, but here's some from Google News about subscriptions:
"If you prefer this option, please display a snippet of your article that is at least 80 words long and includes either an excerpt or a summary of the specific article. Since we do not permit "cloaking" -- the practice of showing Googlebot a full version of your article while showing users the subscription or registration version -- we will only crawl and display your content based on the article snippets you provide."
The whole purpose of making a webcrawler for a search engine is that you are crawling the content that a user will see upon clicking a link in search results.
Googlebot does not crawl the web separately for each user with a copy of that user's credentials borrowed from their browser. Aside from the privacy and security issues, it would require Google to multiply its search resources by the number of users.
That would require Google to be in the business of knowing every site that you have a membership in.
It is not possible for Google to get access to cookies for other sites, anyway. This is a pretty fundamentally important security restriction that browsers implement to protect you from nefarious sites. So it isn't possible for Google to know which sites you have paid accounts with unless you explicitly tell it.
> Even if it's behind a paywall, there is benefit in crawling it.
How would that work? Would Google create or be given a login for WSJ? That would benefit WSJ as an incumbent news provider at the expense of startups too new or small to get special treatment from Google.
How would a new website indicate to all search crawlers (so Google doesn't benefit at the expense of other search engines) how to get access to its pay-for content, in such a way that end users cannot also pretend to be search crawlers and get access to the same content?
Possible, but it seems unlikely. To set that up, the WSJ website would have to allow Googlebot access while denying others. Any filtering based on the URL or HTTP headers would be discovered and abused. An approach based on a security token or IP filter could work, but would be unmanageable on Google's side because of the scale of their spidering operation. It would be much more effective for them to use their position to force the WSJ to be an open website, or to accept that paywalled content does not get indexed.
It might start out as a tiny percentage, but all it takes is one person setting it up and letting the world know about it. Then pretty soon the WSJ is faced with millions of people getting free content again. They're complaining about people accessing their articles via a Google search and then clearing cookies to reset their counters. That's hardly mainstream; browsers have been burying the clear-cookies functionality deeper and deeper over the years because it's seen as an advanced-user-only kind of thing. And yet the WSJ has millions of people doing it, enough to make an impact on their bottom line.
Google doesn't post a public list of IP addresses for webmasters to whitelist. This is because these IP address ranges can change, causing problems for any webmasters who have hard-coded them, so you must run a DNS lookup as described next.
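That lookup is a reverse-then-forward check. A minimal sketch of it in Node (the sample IP is just a commonly cited crawler address, for illustration):

```javascript
const dns = require('dns').promises;

// Verify a claimed Googlebot the way Google documents it: the reverse DNS of
// the IP must resolve to a googlebot.com or google.com hostname, and the
// forward lookup of that hostname must point back at the same IP, so a
// spoofed PTR record fails the check.
async function isGooglebot(ip) {
  const hostnames = await dns.reverse(ip).catch(() => []);
  const host = hostnames.find(h => /\.google(bot)?\.com$/.test(h));
  if (!host) return false;
  const addresses = await dns.resolve(host).catch(() => []);
  return addresses.includes(ip);
}

isGooglebot('66.249.66.1').then(ok => console.log(ok)); // sample crawler IP
```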
DNS lookups are far more expensive to perform than an IP filter, and couldn't be done in realtime. So WSJ would have to set up a system where they regularly find all Googlebot referers in their logs that were rejected, do DNS lookups on the IPs, and add any that were valid to a whitelist so that they won't get rejected again. This will cause new Googlebot IPs to get rejected until the whitelist is updated, hurting indexing and ranking. The WSJ would also have to go through their whitelist regularly and do DNS lookups to verify that all of those IPs are still valid Googlebot IPs, and remove any that aren't valid anymore. That opens a window for invalid IPs to continue to get access, which may or may not be a problem depending on how often IPs change and where they get reassigned to.
The IP whitelist would need to be distributed to WSJ's webserver farms and used to update firewall rules, in an automated way that may or may not integrate with how that stuff is currently managed. (Generally, those rules would be tightly controlled in a big org like the WSJ.) The HTTP access log gathering from the farms and their analysis would also need to be automated, which again might be a management issue if the logs contain anything sensitive. (Like, I don't know, records of particular individuals reading particular stories which certain government agencies might be interested in acquiring without the hassle of a warrant.)
So yeah, there's a way to find out if an IP belongs to Googlebot. That's a long way from a manageable filtering solution at the WSJ's scale, even if Google wouldn't penalize them for doing it, which they would.
There used to be a lot of sites that did that, and the result was that people would set their user-agents to match google's and get all the paywalled content for free.
Not sure why you're being downvoted... It says in the article:
After the Journal’s free articles went behind a paywall, Google’s bot only saw the first few paragraphs and started ranking them lower, limiting the Journal’s viewership.
I find the turn of phrase interesting: "Google's" users.
Yes, of course technically that is an accurate description.
But although I am a user of Google, I don't like the idea of them thinking of me as their asset, even though I obviously am.
And, irrationally, because Google feels like such an omnipresent utility, my intuitive expectation is that they index the web in a non-preferential way.
Which still doesn't make any sense, because the entire utility of their search results is because they weight what they index.
There's some cognitive dissonance here, but I can't put my finger on exactly what it is.
The possessive in English doesn't necessarily indicate actual or presumed ownership. It can also indicate looser affiliations and even relationships where the subject is not the controlling party. It's my shovel and my horse, sure, but it's also my country, my boss, and my God.
How do you plan on balancing this overly strict definition of value with a sustainable model of funding (critically necessary) investigative journalism? All this helps is the proliferation of ideology-reinforcing propaganda and clickbait. Are we going to shut out real, substantial content from the online space?
The commoditization of search can't come soon enough. Google shouldn't be able to monopolize this space.
That's WSJ's job to figure out, not search providers'. From Google's perspective, a search which leads users to a page that doesn't load, which the user immediately abandons, is a bad result. WSJ isn't entitled to traffic because they're an old organization.
It's not about "entitlement"; it's about the structural incentives put in place by the online media playing field.
To go out in the streets and talk to people, dig deep into obfuscated government archives, and actually make sense of the world takes real work. It's often work that results in content that isn't always just what their readers want to see.
None of that influences the way Google's spiders see their site; bad results are bad results. WSJ could leave their content freely available and sell banner ads, vs. locking themselves into the subscription model they're familiar with. It's not Google's fault WSJ doesn't want to change.
Or they could seek patronage without a paywall, like the Guardian. If they want subscriber exclusivity, they shouldn't be surprised that search engines serving people that are mostly not WSJ subscribers rank their content based on what is visible to non-subscribers.
> How do you plan on balancing this overly strict definition of value with a sustainable model of funding (critically necessary) investigative journalism?
Allow people interested in paying for content to pay for content, and then use the discovery mechanisms the content owner makes available for that content.
If the content owner wants to promote content that isn't available to the general public via search engines, they can buy search ads like anyone else.
They don't need a subsidy at Google's (and, in terms of time, Google users') expense by way of organic search results for content that most search users will not be able to access when they click the result.
> The commoditization of search can't come soon enough. Google shouldn't be able to monopolize this space.
Commoditization of search (which means heavy competition focused on minimizing costs) isn't going to make it any more likely that competing public search providers will subsidize paywalled content that their users can't use.
So the implication then, is that real & substantial content comes from those who get paid, and ideology-reinforcing propaganda and clickbait come from those who don't?
Are you sure it's true?
Are you sure it's not the opposite?
Could either or both be true?
Substantial content takes more work to produce than clickbait.
We must critically examine the structural incentives we put in place that support shallow outrage porn over deep, well-sourced, investigative journalism.
The problem is that that's false. Ads (and the clickbait they encourage) certainly are a monetization strategy, but to suggest it's either free rubbish or paywalled quality is patently false. Setting aside completely state-sponsored media (NPR, BBC, etc.), groups like News Deeply [0] offer a counterexample to your assumption.
Killing search isn't going to add more "real", "substantial" content online.
NPR isn't anywhere close to "completely state sponsored media"; NPR gets basically no direct government funding, what it does get is indirect through its member stations, who get less from government than from corporate sponsors, and less from sponsors than listener donations.
Please don't frame your response as a logical argument while ignoring the ambiguity introduced by your use of "that" to refer to my entire statement.
Without good search, there's a discovery problem. How will people find quality content easily? Where'd you get "killing search?" Search clearly provides value.
If the monetization strategy is, as WSJ, to make their content inaccessible, it is them shutting themselves out of online spaces, not Google. Walled gardens break the web.
Because every Google user lacks a WSJ subscription and will only be satisfied with news articles they can read immediately for free?
From the fact that I'm searching for, e.g., a black & white old Western movie, it doesn't follow that I only want movies I can view on the internet right this second. Edit: I could very well be satisfied with "here are the names of popular ones, but you'd have to go to their publishers since they're long out of print."
Which is the reason Google still shows WSJ in its search results. So you can decide to go there even though it is a subscription website. But Google only works with what its bot can see. What you suggest amounts to boosting WSJ's search position despite only seeing a small part of their content, because "we know they are good". Doesn't sound like a great idea.
If you click a search result and end up seeing something completely different than what you expected based on the search result snippet, it shouldn't matter if you're the WSJ or a scam site trying to hack your Google rank. It's deceiving the user and inflating your search ranking at the expense of more deserving listings.
This. I installed the personal blacklist extension for Chrome just to blacklist Quora results because of this. I wish Google would actually punish them more. I click through on something which looks relevant, and the result is a useless login screen. LinkedIn is the same.
If you change your browser's UserAgent string to Googlebot, then your client will be treated as a first-class citizen, by many of these sites. Google always wins, so let's all be Google.
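For the curious, this is all that amounts to outside the browser too. A sketch using Node 18+'s built-in fetch, with a placeholder URL; whether a site actually serves different content this way is entirely up to the site:

```javascript
// The user-agent string Googlebot sends, here from a non-Google IP.
const GOOGLEBOT_UA =
  'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

fetch('https://example.com/article', {
  headers: { 'User-Agent': GOOGLEBOT_UA },
})
  .then(res => res.text())
  .then(html => console.log(html.slice(0, 500))); // peek at what "Googlebot" was served
```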
It's extremely rare to be IP-blocked by any website just for using Google's user agent from a non-Google IP range. IPs get reused and you can switch to a new one easily, so blocking on that is really not common or good practice.
> IPs get reused and you can switch to a new one easily, so blocking on that is really not common or good practice.
On the flip side, some people can't change their IP addresses easily, and getting IP banned (even if rare because of the reasons you stated) is actually a major hassle when it actually happens for those people. :/
Is that really a thing? That must be such a hazard for their developers. I usually have a test for sites that I work on that scrapes a few URLs as Googlebot, to verify that they are getting an optimized view (no JS, structural-only CSS).
Yes. Googlebot only crawls from legit addresses (even when their developers are trying new things) so it's an easy scraper/scammer signal to key off of.
No. It allows Google bots to see full articles, but shows only the first paragraph or so to non-subscribers. Even if they're coming from Google search results.
However, I don't see cache links on Google :(
Edit: Oops, I'm wrong. The article does say that the Google bot only sees the first paragraph or so.
"The reason: Google search results are based on an algorithm that scans the internet for free content. After the Journal’s free articles went behind a paywall, Google’s bot only saw the first few paragraphs and started ranking them lower, limiting the Journal’s viewership."
I call that enacting the nuclear option. It's almost guaranteed to win the war with ad-tech! It should be enacted for sites with run-away ad engines that spin up your CPU fans and make scrolling laggy.
Of course, the problem with nuclear is collateral damage. Drop the bomb and ads don't work, but neither does a lot of other stuff. E.g., the site shows a blank screen, images are invisible or blurry, drop-down menus don't drop. And, of course, the deal-breaker: videos don't play.
The remedy for killing JavaScript is more JavaScript (and CSS). But supplied inside a Chrome extension targeted at the offending site. An injected stylesheet makes `<body>` visible again, hides assorted useless junk, and styles injected UI elements. Your content scripts load the missing images, drop the menus down, and play the unplayable videos in button-activated pop-over windows displayed at superior resolution.
Of course, the problem is, there are a lot of sites out there, and they change unpredictably, requiring your extension library to change in response. That argues for crowd-sourcing the extension library, but the crowd needs to be proficient in HTML, JavaScript, and CSS and know the ins and outs of browser extensions and care and have time.
You can completely change how a site presents. E.g., change a slide-show in a static slide window that barely moves (due to the background ad-tech load) into a set of `divs` that roll upwards as your finger swipes.
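A minimal sketch of such a site-targeted content script; the selectors and the `data-src` lazy-loading convention are assumptions about one hypothetical offending site:

```javascript
// manifest.json (excerpt) would target just the one site:
//   "content_scripts": [{ "matches": ["https://annoying-news-site.example/*"],
//                         "js": ["fix.js"] }]

// fix.js
// Undo the blank-screen trick: some sites hide <body> until their JS runs.
document.body.style.visibility = 'visible';
document.body.style.opacity = '1';

// Load the images their lazy-loader never got around to.
for (const img of document.querySelectorAll('img[data-src]')) {
  img.src = img.dataset.src;
}

// Hide assorted useless junk via an injected stylesheet.
const style = document.createElement('style');
style.textContent = '.ad-slot, .newsletter-nag { display: none !important; }';
document.head.append(style);
```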
It's a hobby at best. Disabling ad-tech components by origin is the practical option.
Call me Dr. Strangelove, then. I usually browse with JS off, enabling it on occasion. And there are some whitelisted sites.
I used to play around with filtering sites to make them less antisocial, but find that slog less entertaining these days. So now when confronted with a site that's useless without JS, eh, there's almost always another site out there that doesn't mind the terms I demand for my attention.
Do you need an account to actually make Pinterest useful?
Sometimes I do a Google Image search because I found something interesting but don't know what it is, so I'm hoping to land on a page that describes what I was looking at. Pinterest shows up as a result, but with no backstory, nor does it lead me back to a source, so it's worthless as a search result. It's the ExpertsExchange of image searches.
Endy's advice of using `-inurl:pinterest` seems invaluable and I'll be adding that for all image searches in the future.
Would kill for a Google Image Search extension that forwards my browser to the original source page rather than the Pinterest landing page. Any recommendations?
Frankly I'd love Google to turn Google Images into much more of a true Pinterest competitor. Image search volume is pretty large, and they are already testing monetizing it. Now they just need to add in features that let you store and organize that information (and build a better profile of you in the process while also crowdsourcing tagging).
I certainly recognize the irony of suggesting that Google might help improve competition.
That said, there aren't many players in a position to provide actual competition to Pinterest like Google can. The obvious concern is where you draw the line at anti-competitive if Google tried to do the equivalent of what they did with Yelp ratings.
Bing is a lot better than Google for searching images. It also has better tools built in to the results page (similar images, different sizes, multiple source pages).
Could you PLEASE PLEASE PLEASE link to this extension so I can install it too? I used to have one but lost it, and I couldn't find one that worked last time I checked. (Would love one for both Chrome and FF if it isn't cross-platform.)
Would also love to have one that rewrites URLs in search results to avoid the frequent 5-second pause when Google's redirector gets its head stuck up its ass or whatever the problem is.
Note that it doesn't block some spam domains that sneakily use certain special characters in their domain names. Unfortunately, Google hasn't fixed this issue in forever.
Quora is one of the biggest offenders of growth tactics at the cost of user experience. They make it so easy to create an account without intending to that I probably have 15 of them.
This 2013 article sums up nicely why quora is frustrating to be directed to while seeking enlightenment, and as others have pointed out they use some unpleasant anti-patterns to lure in (potential) users.
You're right, this is wrong in our current paradigm. But our current paradigm is wrong, too.
I want a world that supports business models other than web advertising. It negatively impacts journalistic integrity and freedom while further exacerbating the race to the bottom search engines and other content aggregators create.
WSJ is responding poorly to a bad situation. I suspect it'll cost them.
Out of curiosity, do you think journalistic integrity was also impacted when newspapers used traditional paper advertisements as their source of revenue?
Eg, have newspapers _ever_ had integrity, and if they _did_, what's different?
> Eg, have newspapers _ever_ had integrity, and if they _did_, what's different?
They did, somewhat, before the massive corporate consolidation starting in, IIRC, the late 1970s, when newsrooms started getting axed and the major dailies progressively became skins over wire services and lightly rewritten press releases.
The internet often gets the blame, but it only provided actual competition decades after the terminal decline in quality and subscribership of American newspapers began.
It actually was internet competition (both wire services becoming directly available to readers and the loss of advertising) that got some of them talking about building up newsrooms, rebooting investigative journalism, and relying more on subscription income. (Paid subscriptions never paid the bills before; they were pursued as a key metric advertisers used to determine how much it was worth to advertise in a paper.)
That's a very interesting thing to try to quantify. I would say they had _more_ integrity, but risks always existed. Reuters used to have a corporate structure that would prevent it from capture, and even that didn't guarantee impartiality.
Multiple revenue streams (sales, classifieds) would make them less beholden to advertisers.
Of course they would still have to write material that sold!
I don't like to use John Oliver as a source, but there's some decent content in this:
This actually sounds like a really interesting/profitable problem to solve. I'm sure there are tons of sites that want to be pay sites, but also want articles to show up in google search results. But users don't want to get inaccessible results. Google (or whomever) needs some solution to handle non-free results intelligently. Allow users to filter out non-free results, to configure which journals they do have subscriptions to, etc. Even an easy way to make micropayments.
If Steve Jobs was still alive, I'd bet Apple would be working on a competing search engine with some of these features.
I agree. Advertising is not a great business model on its own, particularly with the proliferation of adblockers. If there was a way to get people supporting more paid content then IMO the quality of content could also improve. But who knows how all this will go in the end. There's always been a lot of high quality free content on the Internet as well, just because people are willing to share their knowledge with each other.
krschultz's comment is really relevant as well.[1] In a complete system, search should actually know about what you subscribe to already, and not penalise those results for you.
Again, Steve Jobs. Under Steve Jobs, I think Apple would have figured out a way to make the user experience terrific, without the endgame of "Apple hoards every piece of your existence" like Google.
Steve Jobs reinvented PCs and reinvented mobile. I think "the next Steve Jobs" could do the same thing for search. I'm less and less certain of Google's monopoly on that space going forward. It's still built around Web 1.0 tech, with hacks into Web 2.0, but there's a Web 3.0 it's not ready for.
It's probably inevitable. TV was free airwaves for so long. Now it's cable subscriptions. (And there's still advertising). Internet is already controlled by your ISP, so I have a hard time seeing that being "open" for much longer. Browsers, content providers, etc will all be regulated / commercialized / sandboxed.
Free zero-revenue startup idea: there'll be an IP-over-ham-radio or something to preserve "internet classic". (Largest use will be bitcoin-for-pornography).
I wish I could just subscribe to news like I subscribe to Hulu. I'd happily pay $15/month ($22 ad-free), and then the papers I get access to can figure out a fair way to divvy it up.
There's got to be a startup idea in there somewhere.
I'd happily pay, but not a flat fee. I'd maintain an account with some third party, with a ~$50 balance. Browsing ad-driven sites, the agent would bid whatever it took to get all the ad slots, up to a limit of $0.10 or whatever. Over that, I'd get a quote and could accept or decline. Browsing paywalled sites, I'd just get the quote.
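As a toy sketch of that agent's logic (everything here is hypothetical; no such marketplace exists today):

```javascript
// Browser-side agent holding a prepaid balance, auto-paying under a cap
// and asking the user above it.
class AttentionAgent {
  constructor(balance, autoCap = 0.10) {
    this.balance = balance; // e.g. the ~$50 prepaid account
    this.cap = autoCap;     // bids at or under this go through silently
  }
  // A site quotes `price` to clear its ad slots or open its paywall.
  handleQuote(site, price) {
    const accepted =
      price <= this.balance &&
      (price <= this.cap || confirm(`${site} asks $${price.toFixed(2)}. Accept?`));
    if (accepted) this.balance -= price;
    return accepted;
  }
}

const agent = new AttentionAgent(50);
agent.handleQuote('example-news.example', 0.05); // auto-paid
agent.handleQuote('example-news.example', 0.50); // prompts the user
```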
So the company that needs this is Google? I doubt they care whether WSJ paywall results are missing or not. Content in many industries is a commodity, and plentiful.
If the future involves a domino effect where major content providers start following WSJ's lead, then yes, it's an indication that the current model isn't working and Google needs to either evolve or be replaced.
They are known to crawl using human-like user agents instead of the typical Googlebot one precisely to counter this (weak) effort at playing the system. I'm surprised WSJ is surprised by the outcome here.
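For illustration, here's a crude way any crawler could check for user-agent cloaking; this is not a description of Google's actual pipeline, and the UA strings and the 50% threshold are arbitrary:

```typescript
// Naive cloaking check: does the page serve much less text to a "human"
// user agent than to a bot user agent?
async function looksCloaked(url: string): Promise<boolean> {
  const fetchVisibleText = async (userAgent: string): Promise<string> => {
    const res = await fetch(url, { headers: { "User-Agent": userAgent } });
    const html = await res.text();
    return html.replace(/<[^>]*>/g, " "); // crude tag stripping
  };

  const asBot = await fetchVisibleText(
    "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)");
  const asHuman = await fetchVisibleText(
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36");

  // If the "human" copy is drastically shorter, content is being withheld.
  return asHuman.length < asBot.length * 0.5;
}
```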
> Webspam pages try to get better placement in Google's search results by using various tricks such as hidden text, doorway pages, cloaking, or sneaky redirects. These techniques attempt to compromise the quality of our results and degrade the search experience for everyone.
Google does not penalize Facebook for web spam. Facebook, to my knowledge, does not do:
- hidden text,
- doorway pages,
- cloaking, or
- sneaky redirects
They just show a big popover nagging you to log in. But you can click it away.
If certain Facebook content pages rank low, or do not rank at all, it is because Facebook actively blocks Googlebot from accessing the content, not because Facebook is trying to deceive Google (or the user).
Though Facebook does not need Google, it could get quite a lot more visitors if it lowered the wall of its garden a bit. As is, Facebook is an inaccessible social echo chamber, and I don't lose any sleep over this.
But many Google users do see the content. I wish Google knew that I paid for subscriptions to WaPo, WSJ, & NYT and ranked things accordingly. I never want to open an ad supported story when the same thing is covered by a publisher I subscribe to.
I think it's not far-fetched to imagine that someone with a paid subscription uses it habitually, as opposed to just having access whenever they click on a link via Google.
Right, but let's say I search for an old topic (not current news). It won't be on the home page of the WSJ, NYTimes, or WaPo. But I'd like Google to surface it for me, since I'm a subscriber and pay to have access to journalism like that.
100% agree. I've actually reported WSJ to Google several times for cloaking (that's what Google calls this bait-and-switch). The penalty for this is supposed to be deindexing of the entire site from all Google results, but apparently WSJ gets a pass on that. At least they're being punished to some degree.
For the record, if anybody needs a draggable WSJ paywall bypass bookmarklet, I put one up here:
Since WSJ objectively is a scam site in the relevant dimension, offering different content to Googlebot than to search users, it absolutely should be treated exactly like other scam sites.
> Different content to some search users. All WSJ subscribers see the same thing as the Googlebot.
WSJ subscribers, online and print combined, are in the low single digit millions. Google search monthly unique users are about three orders of magnitude greater. WSJ online subscribers are close enough to 0% of Google's users as to make no difference.
The vast majority of Google searches are certainly not for things that would lead users to the WSJ anyway.
If we take just the population of the US in 2017 (326.5M), and assume every American searches via Google for their news, we're looking at give or take 1% of the US with the WSJ subscriber estimate you provided (~3M).
We can refine these numbers further...
19.4% of the US population is less than or equal to 14 years of age (18 would be better, but couldn't find) - so that gives 263M potential American news readers
That puts WSJ subscribers at over 1% of potential American news readers (3M / 263M), and presumably a higher share of those who actually google for news.
So that's not a tiny number (yes the number could be adjusted for worldwide English speaking news googlers - but I think I've made my point).
How many of these subscribers are wealthy and coveted by advertisers?
What if more news publishers follow WSJ and you happen to be a subscriber of that content?
With the amount of information that Google has on its users, I don't see why it can't adjust search results based on whether or not you subscribe - and bring value to whichever side of the paywall you reside on.
I think this is not what WSJ is after. I think they are trying to pressure Google to find a more suitable policy (suitable to WSJ) by putting Google users behind paywall while allowing free visits from social media. They are probably laying the groundwork for future battles, sorry, I meant talks.
There's an interesting difference between overlay paywalls like Wired uses, and content-not-loaded walls like WSJ uses. In the Wired case, the text is sent to you but they try to stop you from reading it. In the WSJ case, they don't even send you the text of the page you supposedly clicked on.
Since we're in the second case, this isn't even a decision by Google. The WSJ actually isn't sending you the data in the search result snippet, so the crawler rightly says "wow, nothing useful here". The complex, ideal solution might let me tell Google "search as though I'm a WSJ member", but short of that they're accurately assessing what content is actually available.
They'd probably be happy to do that, but my understanding is that Google generally frowns strongly upon sites that display different content (or no content) to users when they tell Google those pages have content.
I could imagine Google adding some sort of "content is locked behind a paywall" indicator on search results, but if I'm searching for something on the web, a link to blocked content is not very helpful most of the time.
Yep - imagine a world where most of the top search results are subscription-only. It's terrible UX if a Google searcher has to go past page 1 to actually reach accessible content. Google definitely doesn't want that, and it's also anti-open-web.
If Google is going to accommodate pay-walled results like that, you can expect this indicator won't just show/hide 3 high-quality pay-walled results, but also 30 shitty ones that want to get in on the potential pay-wall business.
+1, this is already pretty bad for a lot of searches with so many sites erecting paywalls, pushing required logins, or anti-ad-block.
Not sure if they track this but whatever the Googlebot's view of the site's content, if it's constantly bouncing users back to the search page it should get hit with a hard penalty.
My thoughts exactly. If WSJ wants to increase monetization through paywalls, it has every right to do so, but it should suffer in the SERPs accordingly, as a paywalled article is generally not what users are looking for.
WSJ also still serves articles to folks coming from Facebook. There's a Chrome extension to redirect all WSJ URLs through Facebook to bypass the paywall; it works for now.
Clicking the bookmarklet when you're on a WSJ article will shunt the url through Facebook's redirect service, which will allow you to view the article.
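For the curious, the bookmarklet body amounts to a one-liner. The l.php endpoint below is from memory (Facebook's link shim), and Facebook may interpose a confirmation page, so treat this as a sketch of the trick rather than a guaranteed bypass:

```typescript
// Route the current article through Facebook's link-redirect endpoint so the
// request arrives at WSJ with a facebook.com referrer.
(function redirectViaFacebook(): void {
  window.location.href =
    "https://www.facebook.com/l.php?u=" +
    encodeURIComponent(window.location.href);
})();
// To use it as a bookmarklet, prefix the minified one-liner with "javascript:".
```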
Watching this very closely. I pay for a WSJ subscription because I think their content is better than most, and also because I get sent a lot of links to their content.
Something about this latter point feels like the argument people make about using Office because people still send them Excel and Word docs.
Similar to how software companies release free software to augment what makes them money, Bloomberg is able to spend a lot of money on producing content that is sponsored by their terminal subscriptions.
The WSJ might be in a unique situation: their primary audience will pay, often because companies foot the bill for employees. Perhaps they can be one of the few news-producing companies that doesn't depend on Google for traffic, since their core audience loads up the front page multiple times a day just to see what's there.
I wouldn't be surprised if they did a deal with Bloomberg to provide their content on terminals to further strengthen their ties to their core audience.
That's a really good point. Maybe the future of news survival is to pair it with a company that makes money, to support the journalism. Then again, I feel like that's how we got CNN, MSNBC, FoxNews, etc. Maybe not a great idea. I really like Bloomberg and, to some extent, WSJ. I hope they can maintain their integrity.
> Maybe the future of news survival is to pair it with a company that makes money, to support the journalism
To be honest, that's the past and present of most news as well.
Newspapers would never have survived on advertising alone. The cover price usually paid for printing and delivery, and classifieds made up the bulk of the revenue. Now that they're decoupled, and both news and classifieds are pretty much free, it's no wonder that news is struggling.
Hard news has (almost) never been a wildly profitable endeavor in and of itself.
The future of news survival is independent journalists being funded by people who care to have an unbiased investigative news media.
No idea how we get there, though. You basically need to persuade those who understand how critical journalism is to freedom to care to fund it, and to have a centralized platform to fund journalism that itself is not corruptible by monied interests trying to push propaganda.
Eh if we had benevolent entities funding journalism at a loss we'd all be far, far better for it.
We got Fox/CNN etc because journalism in service to advertisers = clickbait.
At this point, I think we can codify this: Journalism + Ad-Supported-Model = Clickbait.
Good journalism coming from firms which advertise is by accident. Like a broken clock being right twice a day.
I love subscription journalism because you're not playing the clickbait game.
There is no future for journalism in ad supported models. Plenty of future for infotainment, heck, infotainment masquerading as journalism might just take it all over, but journalism will certainly not be a part of that fold.
Organizations like WSJ should really try to market corporate-level packages where specific IP ranges are cleared for free use. Keep the price low enough that it doesn't require more than one tier of approval, and they could see some real money.
I am just surprised that sites don't do this already. It's par for the course with a lot of software, so why not news or similar?
> The Journal decided to stop letting people read articles free from Google after discovering nearly 1 million people each month were abusing the three-article limit. They would copy and paste Journal headlines into Google and read the articles for free, then clear their cookies to reset the meter and read more, Watford said.
After the harder paywall, what's the best guess for what percentage of those google-copy-pasters will convert[1] to subscribers paying $278 or $296.94 or $308.91 per year? My guess is less than 1/10th of 1%. I assume the vast majority of the 1 million are casual readers who don't have $300 of discretionary income to splurge on a subscription. If they can't read for free with a workaround, they'll do without.
In related trivia, I just read that The Economist's strategy is to allow the google-copy-pasters.
I'm not judging either company as right. It's interesting they go about it differently.
I think that's the main thing that is different from my normal relationship with a newspaper. If I see an article that catches my fancy, I buy that newspaper (or the "Sunday version"); if I don't, I don't. While the universe would prefer I pay for a subscription, that's just not how I have ever interacted with a news publishing outfit.
I think part of my issue with the WSJ complaint here is that they're trying to go back to the subscriber-driven model, but I have never worked with them that way. That's what makes the teaser rate even more frustrating: I just want to pay, not make a financial decision or commit to a subscription. Can I just pay $5 for a week and not have a dog-and-pony show around it? I'm not interested enough in any particular publication to make that kind of commitment.
The hardship for publishers won't end until two things happen, in my opinion: 1) they are willing to charge a reasonable price per article (no $300/year subscription) and 2) they are able to charge per article, in a cost-effective manner (i.e. "micropayments").
Why doesn't this exist yet? It seems so obvious, so many people get news from Facebook, twitter, Reddit, (and to a lesser extent HN) on an article by article basis. But all news sites delude themselves into thinking they can make these people loyal customers of their content, to the tune of paying $10-30 per month. I would happily pay $0.50 or something per article (maybe price adjusted by word count, within reason) to be able to get news from a wide variety of sources and engage with comments on the social/aggregator sites.
Is it just too big of a chicken-and-egg problem to coordinate a single broker for this across many sites? Unfortunately a broker is needed, because I don't think there exists an external payment method with fees low enough to make a lot of sub-$1 payments viable. That used to be touted as a use for Bitcoin a few years ago, but now BTC fees are higher than credit card fees, at something like $1-$3.
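The broker's whole trick would be aggregation: collect many sub-$1 unlocks against a prepaid balance and settle with publishers in bulk, so card fees are paid once per top-up instead of once per article. A napkin sketch, with every name and number hypothetical:

```typescript
// Hypothetical cross-publisher micropayment broker.
interface ArticlePrice {
  publisher: string; // e.g. "wsj.com"
  articleId: string;
  cents: number;     // e.g. 50 for the $0.50-per-article idea above
}

class MicropaymentBroker {
  private owedToPublisher = new Map<string, number>(); // cents, settled in bulk

  constructor(private userBalanceCents: number) {} // funded by one card charge

  /** Debit the user's prepaid balance and credit the publisher's ledger. */
  unlock(price: ArticlePrice): boolean {
    if (price.cents > this.userBalanceCents) return false;
    this.userBalanceCents -= price.cents;
    this.owedToPublisher.set(
      price.publisher,
      (this.owedToPublisher.get(price.publisher) ?? 0) + price.cents,
    );
    return true; // article unlocked; broker pays publishers periodically
  }
}
```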
This is the tradeoff WSJ opted for, and it really doesn't make sense for it to work both ways. Web crawlers can't index content behind a paywall, end of story.
Where WSJ has a big advantage is that they already have brand-name recognition. From the stats in the article it appears this was a profitable choice for them; however, a lesser-known content publisher seeking to monetize using paywalls is bound to have a much more significant challenge attracting users without decent search result rankings.
I reluctantly recognize that in order to have an unbiased press, we need news readers to pay the costs of operating a newspaper. I'm happy to pay for my news, because I want news to be truth, not propaganda.
Maybe the moral of the story is that Google News isn't the best source of news -- just the cheapest.
Though I do think it would make sense for historical articles (i.e. articles older than some arbitrary time period) to be removed from the paywall, so they are searchable and can be referenced.
I have to disagree. People who are passionate about a subject will report on it whether they're paid to or not. I would argue that being motivated by profit only makes you more susceptible to bias.
> People who are passionate about a subject will report on it whether they're paid to or not
The Theranos story is an example of why we need institutions like the Wall Street Journal. You had a passionate reporter. But there was so much more needed to bring the story to light.
"When the Journal confronted [Theranos] with its findings based on the reporting thus far and sought its input, Theranos waged an aggressive campaign to discredit Mr. Carreyrou and his sources. Sunny Balwani, Theranos’s president, flew to the Phoenix area, where Theranos was offering its blood tests in Walgreens stores. There, some doctors who had talked with Carreyrou said Mr. Balwani pressured them to recant their statements. The famed litigator David Boies came twice to the Journal’s newsroom in midtown Manhattan to discuss the issues and tried to get the story killed.
The Journal’s top editors and lawyers, who had closely monitored Mr. Carreyrou’s reporting from an early stage, unflinchingly stood by him" [1].
Switching to a model where news is done philanthropically means turning off public oversight of anyone who can afford to file a federal lawsuit.
I don't see why reporting the news is different from other jobs that people have a passion for and that provide value to society. Should scientists not be paid because there are people who are curious about the natural world who would do it for free?
> Should scientists not be paid because there are people who are curious about that natural world who would do it for free?
I guess I'll come to the rescue of that strawman you're torching... Scientists are free to work for money or passion, as people are free to fund research or not. Whatever the balance of pay vs passion, science will continue to happen.
I might have been too hasty. The first couple times I read it, I thought your original comment was saying that paying for news wasn't worth it because journalists who are willing to work for free are less biased. Rereading it, it sounds like your point was more that paying for news doesn't in and of itself lead to less biased coverage.
I'm not convinced it's possible for anyone to be free of bias and I don't claim that unpaid reporters are unbiased reporters. The onus will always be on the reader to draw their own conclusions and verify the integrity of the news they consume.
All right, why is someone who cares passionately about something less susceptible to bias than someone who's just reporting for a living? I would be inclined to believe the opposite.
There is a difference between news and opinion. Something like Stratechery is pure opinion; but it's opinion that I value because I feel it provides a lot of good insights.
It's fine to crowdsource opinion. It's not ok to crowdsource news -- trust is a fundamental part of the equation.
People who are passionate about a subject are usually passionate about a certain view on a subject. Do you really think political activists make good reporters?
I totally agree. People are more likely to pay to read the news they are biased to. So all forms of media have a high incentive to bias their content towards their consumer's point of view.
Given the human bias for sources that confirm rather than challenge preconceived notions, I'd imagine that market segment is quite small. Even among people that say they want such a service!
And on top of that, there's a difference between news and data. "xx people were run over by a truck in an attack group Y has claimed responsibility for" is a (poor) headline, not an article. Once you expand out beyond a reading of simple facts, bias creeps in everywhere from choice of sources to choice of words to quantity of coverage.
Worse, simple facts are rarely interesting or actionable. People want analysis.
>I reluctantly recognize that in order to have an unbiased press, we need news readers to pay the costs of operating a newspaper.
I find that having readers pay for operations does not inherently make the press unbiased - judging from American and British media (read: English-language media that I can read and judge), I do not find this to be the case.
I agree, and earlier this year I decided to subscribe to a newspaper for the first time in many, many years. I'm worried that subscribers can't stop the propaganda though.
There are so many 'news' sources nowadays, with broad national and international reach, that individuals can't possibly support all of the 'good' ones. At best they can subscribe to one or two. So all of the good sources have to compete for limited subscribers, while the bad ones are happy to give their propaganda away for free. That lets the propaganda sources flourish and multiply, until there are far more of them. They're also better at playing the attention-grabbing game, because they have none of the self-imposed limitations that the good news sources have.
The problem with a subscription model, at least the straightforward one that newspapers use today, is that it encourages political echo chambers. If someone mainly reads, say, the New York Times (which I do), and tends to agree with Times editorials, they'd probably benefit from also reading the WSJ on occasion (both the news and the opinion section), if only to understand the "other side"'s point of view. But people already have a strong cognitive bias towards consuming media they agree with, rather than media that challenges their beliefs and makes them uncomfortable. So it'd be hard enough to attract such a person to the WSJ if it were free. Ask them to fork over hundreds of dollars, and only those with the greatest desire for neutrality (or just a ton of money to spare) will accept.
In my ideal world, I'd be able to pay for one subscription that gives me access to all the professional news outlets, and they'd split the fee based on how much time I spent reading articles on each site. (Time, not clicks, to avoid creating an incentive for clickbait headlines.) Or maybe some sort of micropayments system that's a similar idea but for the whole Web. But I don't see either happening anytime soon...
You have an interesting point of view on political echo chambers. Today's rhetoric is that the NYT and WSJ are both "MSM" and therefore on the same side with the same point of view. You'd have to watch (not read) Fox, Russia Today, One America News, Breitbart, etc to get the other other side.
What's really disheartening is that this isn't just a problem with the American people; our president also falls into the trap of only consuming media he agrees with, and regurgitating it without challenge, consideration, or counterpoint to offset the source's bias. There's no easy solution for this; even if Trump leaves early his supporters and the system that got him elected will still be there to put someone like him (biased and manipulable) back into power.
If you change the way that you consume news, this is less of a problem.
Stop visiting the aggregators like Google News. Stop believing anything you read on Facebook that is not from a publication that you consider reputable.
There will always be people who are easily influenced because they refuse to take the time to understand the provenance of the information they consume. This has always been true. But if you truly consider yourself "an independent thinker" (which is why so many Americans claim to be "independents") you have to consider the trustworthiness of the outlets feeding you information. You can't outsource that decision making to anyone else like Google or Facebook.
Being paid doesn't make journalists free of incentives: it just changes where those incentives lie. The New York Times, for example, is busy blowing up a hundred years of reputation to try to pretend the Republican and Democratic parties are equally deserving of criticism because its readers want to feel intellectually superior to "partisans".
Paid news will deliver what its users want, and it is dangerous to pretend that that is "objective" or "true" just because that's what the PR says.
While the conservative media pundits like to rail against the NYT, it's actually a very reputable news source that fact-checks articles and issues retractions when it is wrong. Sure it's a bit left-leaning in the topics they report on, but they're usually not wrong.
Agreed. I tend to think of The New York Times and the Wall Street Journal as being mirror images of each other: they're both reputable and based in NYC [0], but the NYT has a liberal bent and the WSJ has a conservative one.
[0]: Being based in the largest city of the east coast gives them a different set of biases, which aren't entirely political.
> The Journal’s ad revenue wasn’t affected by its recent drop in Google traffic because social media visits grew 34 percent in that time, keeping overall web traffic flat, Watford said.
Their numbers agree with your hypothesis: people simply switched from using the (now closed) Google "payhole" to the social media "payhole."
Many people used Google specifically to get around the newspaper's paywall. I imagine fewer people will use social media to do that. As a result, the fraction of content requests that are honest referrals would be expected to be higher.
You can easily use Twitter search logged out or without an actual Twitter account: https://twitter.com/search-home. Not hard to see the same google copy and pasters will just move to Twitter search.
I guess it worked OK for them, given that they are still doing it. I can never open a Forbes link - not in a "disable your ad blocker" pop-up sort of way; it just redirects to the home page. Sometimes it works after a couple of tries. The upshot is that I never open Forbes links any more.
No, but I use Anti-Adblock Killer. Is this better? While it's a good exercise academically, in the long run I just don't visit sites that block ad blockers.
The userscript that accompanies it catches a bunch of things that AAK misses. Ctrl-Fing through the script, I see there's a section specifically for "forbes.com".
That's a great question. They're still a company I read articles from every now and then, so I personally assume they're doing well enough to still exist. How did it work out for them? Not trying to be a snarky bastard here - I genuinely want to know, because I don't know this story and am too lazy to research it right now. Given the tone of your comment, I imagine it'll be easier for you to provide me some starting material to give me a head start.
I'm not who you replied to, but I was genuinely curious too. This is the best evidence I can find [0][1]... It looks like it probably did negatively impact Forbes (and other websites that have blocked adblockers).
But if they're sticking with that behavior, maybe it's not so bad? I also saw that their physical magazine readership is up (while other competing magazines are down or flat).
That doesn't seem like as big a deal. Ad blocker users are not uncommon, but not the majority either. "Google users" is quite possibly the broadest demographic you could possibly lose.
I wouldn't be so sure. They probably rely significantly on enterprise/university/library subscribers. Academic journals haven't gone anywhere despite being paywalled forever. WSJ will likely switch to the journal bait-and-switch model: show the abstract in Google results, but require money to read the full article.
This is free market capitalism that Rupert Murdoch's company Fox News promotes.
Google News fundamentally improved the online content publication system by making it cheaper to its consumers.
Earlier, there were too many news companies whose content didn't necessarily have an edge over some other agency's, and they all charged money (which worked in an age before the internet). Google has fundamentally squeezed some of them, including WSJ.
If the differentiating quality was present, more people might have been willing to pay for WSJ.
When Henry Ford made cars, lots of businesses built around the inefficient means of transport (i.e. the horse) got squeezed: fodder suppliers, blacksmiths making horseshoes, and builders of horse carts.
Same here. The internet and Google have removed an inefficiency in the news system. Publishers need to adapt and provide a differentiating quality, or they will perish - the principle of free market capitalism.
If WSJ needs google to behave a certain way so that it can make more money, that is not free market capitalism.
This is great. For a while it seemed 80% of the top 5 results were WSJ; I got really tired of clicking to read a headline and getting a pop-up saying something like "Wow, you click us a lot, you must like us, how about you pay?" Well no, actually, you are just over-ranked on so many topics...
Yeah I got annoyed for the same reason and used the Personal Blocklist Chrome extension to block WSJ from showing up in my search results at all. And recently I had to add Forbes to the same list.
Not all sources are equal in terms of quality and information provided in the linked to article.
IMO, in certain areas (financial industry, investigative journalism, international business issues, etc.) WSJ consistently has better coverage than most other sources - so I'm willing to pay for it now that they've made it much harder for me to get to it.
I regularly read other sources, and if at some point I feel I'm not getting enough value from WSJ, I'll cancel.
This submission's headline left out the other part of the opening sentence:
> After blocking Google users from reading free articles in February, the Wall Street Journal’s subscription business soared
So their pageviews from free readers fell while paid subscriptions rose. I would think those subscriptions would be worth it, but as the article mentions, they still "...argue that Google’s policy is unfairly punishing them".
Personally, I would think the subscribers make it more than worth it. After all, I ended up adding components onto my NYTimes subscription and often forget that I pay monthly for it, because it isn't that expensive compared to, say, home internet and such.
Online adverts are a form of micropayment. They're currently the only way to charge sub-cent amounts from users, but they require the third party of an advertiser, which is a very expensive middleman.
If we can figure out how to do micropayments without users needing an account somewhere (like PayPal), we can get rid of ads. That is the ultimate solution to this long-winding conflict between users, advertisers and content creators.
> how to do micropayments without users needing an account somewhere
That doesn't seem possible. What does even a hand-wavy sketch of this look like to you? I'd guess it's possible without, e.g., a unique row for each user in a single (though perhaps distributed) database table, but I can't imagine it really being different from an "account" either, e.g. a Bitcoin "wallet".
This is going to sound cold, but we need to have an honest chat here.
I would like a way never to see another WSJ or Forbes article on the internet.
I understand their need for profit, and I applaud them trying various things. They might be the best thing since sliced bread. I applaud a free press. They have a solid reputation, and I wish them the best.
However, I'm never going to pay. Ever. So I don't need to see their ads, I don't need to follow-through on Google clicks. I just need them to disappear from my internet experience completely. If you're going to run a paywall, I would like to never be exposed to your brand name or the fact you have content online. There are simply too many of you and too little minutes of concentration that I have to offer.
If one day I see a physical copy of one of these publications and decide to subscribe? That might be a lot of fun. But for now, don't waste any of my brain cells showing me stuff I'm never going to consume. Worse yet, leading me down some garden path only to be ambushed at the end with a paywall. Life's short. Your publications are not an important part of mine. I think part of the problem here was conflating the fact that X number of people clicked through with the assumption that somehow these were paying customers just freeloading (see the title of the article).
Good. These articles should be discriminated against by Google and ranked on the second page at best. They are worthless. Without the subscription, the whole site is worthless. Google is just reflecting that. WSJ has no right to complain. If they don't like it, they should make their content not be worthless. Pretty damn simple solution but it seems they prefer to whine instead. If these idiots haven't yet understood the market for online content, it's pretty hopeless anyway. People don't want to pay money for content online, they will only pay with their privacy and personal data. There is enough content to fill up millions, maybe billions of lifetimes. What part of the supply / demand equation does the WSJ not get? Google is a search engine for finding information on the Internet, not their personal ad agency. When people search for news, they want to find news, not some paywall. How is it Google's duty to hurt their own business so they can show what are essentially foreign ads it makes no money on (nothing more, nothing less) to its users?
Yep, this is the same sort of bullshit that Sexpertsexchange (experts-exchange.com) used to do. They performed "cloaking" that let the search engine index the entire page of content, and then threw up the paywall when regular users visited the site. WSJ expecting to continue to rank well behind a paywall is really just more of the same thing.
I also thought this was pretty rich. First broadly attack Google's income, then complain about discrimination when Google ranks your paywalled content lower.
Though I trust Google to make this decision based on user satisfaction, not unrelated meta-politics: any website with a paywall should be (and probably is) painted with the same brush. WSJ would benefit from pushing the angle that "Google punished us for doing critical journalism".
I wish there was some kind of "account sharing" with the media companies. I have a Globe and Mail and a New York Times subscription because I want to support journalism.
But then there is The Guardian, The Toronto Star, WSJ, ...
I'd happily pay $200 for a mega-subscription that went to a bunch of them.
Maybe there's an opportunity for a coalition of sources (WSJ, NYT, Bloomberg, WP, et cetera) to get together with a search engine like DuckDuckGo and allow deeper search integration in some type of paid membership plan.
WSJ is discriminating against Google's user-base by blocking content that Google presumed the user would have access to. Yet WSJ thinks they're the ones being discriminated against. As WSJ is finding, Google will not even bother attempting to index, sort and recommend content that is unavailable to 99.999% of its user base. It's nice that they're supposedly better off, or at least the same off, without high Google results. Though I doubt this will stop them from endlessly attempting to game the system.
So if I understood this correctly their paid subscriptions increased 30% and their overall online traffic hasn't really been affected because people can still get the content through social media. If my understanding is correct then it seems this was a smart business move by the WSJ who is able to stand on their own reputation instead of relying on people finding their content through a third party.
I often read WSJ articles for free on the internet, which gives me a good opinion of their articles. When I travel by train or plane, I often buy a physical newspaper, and I may favour WSJ because of that good opinion.
IMHO, blocking "free" readers will not increase paying customers in the long run, and may also harm their physical newspaper.
Reputation takes a long time to earn and can be lost very quickly. RIP WSJ.
This sounds a lot like the Net Neutrality argument: All content should be treated as equal, whether it's pay or not, whether it's powered or supplied by the search engine/ISP or not.
Is there harm in ranking good content appropriately high (if relevant to the search) as long as it's clearly marked as costing something to access? Could Google just flag "paid content" with a $ or € sigil or whatnot?
Maybe but that means the search engine had access to the article's contents "for free" and therefore has a copy of that content in its index. If I am not also able to access that content the same way the search engine does (aka ... for free as an anonymous or not-logged-in user), that makes it a lot less appealing.
As a user, if I search for something, I want to get the information I was searching for, not an offer to buy that information. A link to a page where I can buy your product is not a search result; it's an ad. And it's totally fair (and possible) for WSJ to pay for their ads on Google's results page, just like everybody else does.
As a user, if I search for something, I want to be told where it can be found. How to retrieve it once I know its location is a separate and independent problem.
A search engine that excludes sources that will require pay to get the information is a much less useful search engine to me, because it has no idea if the information is important enough to me that I will pay to get it if no free source is available.
I wonder how careful they were to distinguish between someone who cleared their cookies and two people behind a NAT. It's possible that the number of people abusing their system was smaller than they think.
Why would I pay for their articles when they are no better than the free alternatives? This is the same as Quora vs StackOverflow. Not only is the latter vastly superior, it is also free.
> The Journal’s ad revenue wasn’t affected by its recent drop in Google traffic because social media visits grew 34 percent in that time, keeping overall web traffic flat, Watford said. The Journal lets readers get some articles for free via social media like Twitter and Facebook, which the paper views as a marketing tool.
Could you post the list (and if you're really feeling kind the plug in)?
I'd love to add this to my filtering layer. Between Quora and ExpertSexChange I really could do with some more blacklisting of sites that keep baiting me to click.
It seems like the source code link is broken. Mozilla used to host the code right on the add-ons site but I can't find the new link if there is one. I discovered recently it doesn't catch iframes, haven't had time to fix it.
Edit: I'm an ass, looks like the link to the source is only visible to me when logged in :/ It's GPL; they should make that public. If you want it, I'll find a place to post it. I hate having accounts, but I'm sure I can find a place to post it without one - worst case, pastebin.
Not mine, that only seems to work on Google search results which is rarely the problem I have. Reddit, HN, etc. often link to sites I already know I don't want to go to, my plug-in removes them from there. I copy the list to my RSS client so it filters them out there too.
A time machine is the only solution. As a web developer for a family-owned newspaper that recently removed our failure of a metered paywall, the only solution I've been able to come up with is a time machine that sends a killer robot back to 1995/1996 to remove the incompetents, industry-wide, who decided to give away their expensive-to-produce content for free.
What if sites with paywalls signaled this, including the subscription price, in an HTTP header (or whatever) to crawler bots?
Then Google users would be able to filter by "free results only" or "paid results, less than $x/month". Seems like a relevant search criterion, especially for industry-specific articles (like finance).
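No such convention exists today, but here's a sketch of what a paywall-signalling response might look like, with entirely made-up header names:

```typescript
// Toy server that advertises its paywall terms in (hypothetical) headers
// a crawler could read without indexing the full article body.
import { createServer } from "node:http";

createServer((req, res) => {
  res.writeHead(200, {
    "Content-Type": "text/html; charset=utf-8",
    "X-Paywall": "hard",                  // hypothetical: none | metered | hard
    "X-Paywall-Price": "USD 25.00/month", // hypothetical subscription price
  });
  res.end("<html><body><p>Article teaser only…</p></body></html>");
}).listen(8080);
```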
I think WSJ is wrong in saying that they are being punished by Google when their paywalled articles appear last in search results.
Google's algorithms need to crawl the entire content to ensure they provide the most relevant results. If they are unable to crawl the entire content, it becomes much harder to rank it the same way.
Changing the algorithms to rank simply on the title and the first few (free introductory) paragraphs of articles behind a paid subscription could lead to more bogus results, and Google knows that invalid search results in turn drive users away from its search engine.
My startup is about to launch a partnership with the Wall Street Journal that will give our users free, un-paywalled access to the WSJ site.
If you want to join our beta, which is ongoing, sign up at www.ReadAcrossTheAisle.com. The beta is free, and our app is free as well. Hopefully this is relevant/useful for folks in this thread.
Bloomberg News is subsidized by the terminal business. It's free and public so terminal users can send non-payers links. I guess if you want your news funded entirely by a small group of people in the financial sector, that's fine. But for those of us grateful for the work the Wall Street Journal did to uncover e.g. the Theranos scandal, paying for good journalism is the way to go.
This isn't an assessment of content quality, it's of a user's experience.
All Google needs to do is optimize for things like time on page, CTRs, etc to see users abandoning those with paywalls and not those without to decide one is a better experience overall than another.
Bloomberg.com is great at giving you rubbish quickly. The WSJ is better at giving you good information for a fee. Depends on what you want to optimize for, I suppose.
Are you really going to argue that Bloomberg, the one which is able to make Bloomberg Terminal[1] a good value proposition, is worse in terms of quality of information than the Washington Post?
> Are you really going to argue that Bloomberg, the one which is able to make Bloomberg Terminal[1] a good value proposition, is worse in terms of quality of information than the Washington Post?
No, because I don't read the Washington Post and I don't see why it's relevant.
But yes, I do claim that bloomberg.com is a poor source of news. Bloomberg.com is to the Bloomberg Terminal as the Amazon Fire Phone is to AWS. Sure, the same company makes both, but one is rubbish and the other is something professionals use all the time.
In my opinion, the best news sources I read regularly are ones I pay for directly, with content hidden behind a paywall. This kind of reporting and investigating costs money to produce and is worth paying money for (and not just advertising views).
This is a false dichotomy. Abandoning a website because of a paywall !== bad user experience. The UX can be fine, but the user may be poor. Many websites have restricted logins and perfectly acceptable UX.
Sounds like something that can be solved using special metadata for paywall sites, along with a process to approve those who claim to be paywall sites.
“You are definitely being discriminated against as a paid news site.”
Jesus fucking christ, what a sentence. 'Google-bot can't index our pay-walled articles... DISCRIMINATION!!!' fuck wsj, seriously
Google touts itself as a company that wants to catalog all the knowledge. How is it possible that WSJ content ranking falls after the introduction of the paywall? Google has the ability to crawl the behind-the-wall content, and yet it downgrades the WSJ content. In my opinion, this is an abuse of its dominant position in search.
> Google touts itself as a company that wants to catalog all the knowledge. How is it possible that WSJ content ranking falls after the introduction of the paywall?
Because ranking is about utility to search users, and content not accessible to general search users isn't useful to them. The knowledge (both the content itself and the metadata regarding its inaccessible status) is still cataloged; this is just making effective use of that metadata for the purposes of search UX.
Google may know you are subscribed but it doesn't know what the page looks like to you. It only knows its own version of the page.
What Google sees is that many people go to WSJ because their search query matches the content of the Googlebot version of the page, and then bounce back as they hit the paywall. Google may not even know about the paywall; it just sees that many users bounce back, infers they're unsatisfied, and deranks accordingly.
Now, there is still a workaround. If you are logged in, Google tracks your personal habits, and if it sees you staying on WSJ, it may uprank it specifically for you. Again, it is just a matter of staying on the site or not; Google doesn't really care about your subscription.
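A back-of-envelope version of that "bounce back to the results page" signal; this is not Google's actual formula, just the shape of it, with invented thresholds and weights:

```typescript
// Demote a result in proportion to how often users "pogo-stick" off it:
// click through, hit the paywall, and return to the results page quickly.
interface ClickEvent {
  resultUrl: string;
  dwellSeconds: number; // time before the user came back to the results page
}

function demotionFactor(clicks: ClickEvent[], quickReturnSecs = 10): number {
  if (clicks.length === 0) return 1; // no data, no adjustment
  const bounces = clicks.filter(c => c.dwellSeconds < quickReturnSecs).length;
  const bounceRate = bounces / clicks.length;
  // e.g. a 100% bounce rate multiplies the score by 0.4; 0% leaves it at 1.0
  return 1 - 0.6 * bounceRate;
}
```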
> Google has the ability to crawl behind the wall content
I thought they weren't. How does that work? Do they have a login for WSJ? That would benefit WSJ as an incumbent news provider at the expense of startups too new or small to get special treatment from Google.
You are no longer providing value to our users. You will be quickly replaced with something that provides more value to our users.