
Then search engine crawlers should get paywalled too.

The motivation is correct. You run a search on Google and get mostly paywalled content. I'm fine with news sites requiring subscriptions to view their articles, but they shouldn't also get the benefit of being listed at the top of search results for key terms.

Alternatively, search results should indicate whether content is paywalled, or offer options to filter paywalled content out.



I'm not following your logic here. The Economist wants to charge readers because they produce high quality content that's worth paying for (in their estimation). This seems entirely orthogonal to whether they blacklist a web crawler - a crawler they didn't even ask for, which would be all over their website whether they want it or not.

I think you're confused because the crawler and the browser both use the same channel and the same protocols to access the information (the website over HTTP). But that's just a detail. Google could send them a handwritten form to fill out with details of each of their articles, plus some thumbnail images, to be manually entered into a Google database for all we care.


> I think you're confused because the crawler and the browser both use the same channel and the same protocols to access the information (the website over HTTP). But that's just a detail. Google could send them a handwritten form to fill out with details of each of their articles, plus some thumbnail images, to be manually entered into a Google database for all we care.

I disagree; I care quite a bit about whether the Google results reflect the actual page I'm going to see, or merely what the page author claimed the page would be about. (Indeed, I'm old enough to remember that what originally set Google apart from competing search engines was that it ignored the meta keyword tags authors used to describe their pages, in favour of indexing the visible page content directly.)
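For anyone who never saw them, those tags were self-reported metadata in the page head, roughly like this (an illustrative snippet, not from any real site):

    <!-- Author-supplied metadata; early engines trusted it, Google indexed the visible body text instead -->
    <meta name="keywords" content="economics, finance, world news">
    <meta name="description" content="Authoritative analysis of world events">

Trivially stuffable with whatever terms you wanted to rank for, which is exactly why they stopped being trusted.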


And surely we'd want high-quality content returned from a search engine. If Google never returned results where the company wanted me to buy something, the results would be pretty sparse.


Seems like you should take that up with Google? Why should The Economist be obligated to serve no content to a crawler just because they want to charge readers a fair price for their content?


I think the person you're replying to is agreeing with you.


You're a couple of lines of robots.txt away from not having your site appear in the search engines most people use. Meanwhile, putting something up on the open internet includes the risk that people and robots will see it.
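For reference, those couple of lines look like this (a minimal sketch; Googlebot is Google's crawler token, and other engines have their own):

    # Ask Google's crawler not to fetch any page on this site
    User-agent: Googlebot
    Disallow: /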


IMO the issue is that the paywall is essentially "cloaking" by Google's webmaster standards: different content is served to the crawler (the actual text of the article, which gets indexed) than to the user (a paywall).

The content provider might not ask for the crawler, but they are certainly catering to it - and benefitting from it.
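Worth noting that Google carved out a sanctioned path for exactly this: publishers can mark paywalled sections with structured data so that serving full text to the crawler isn't treated as cloaking. A rough sketch of that JSON-LD (the cssSelector value here is a made-up example):

    {
      "@context": "https://schema.org",
      "@type": "NewsArticle",
      "isAccessibleForFree": "False",
      "hasPart": {
        "@type": "WebPageElement",
        "isAccessibleForFree": "False",
        "cssSelector": ".paywalled-section"
      }
    }

The selector tells Google which part of the page sits behind the paywall, so the index and the user-visible page can legitimately differ.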


I’m not confused and I don’t really understand why you think I am.

I believe the content that is indexed is the content you can see. Sites used to be penalised, heavily, for returning different content to Google. Hiding the paywall from Google falls into that bucket.

At a minimum, the search results should indicate whether a result is paywalled and provide tools to exclude that content from results.



