If you change your browser's User-Agent string to Googlebot's, many of these sites will treat your client as a first-class citizen. Google always wins, so let's all be Google.
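For what it's worth, you can see the difference without touching the browser at all: the same page fetched with and without Googlebot's user agent often comes back looking quite different. A rough Node sketch (the URL is a placeholder; the UA below is the classic Googlebot token):

```js
// Rough sketch (Node 18+, which ships a global fetch): request the same page
// with and without Googlebot's user agent and compare what comes back.
// The UA string is the classic Googlebot token; the URL is a placeholder.
const GOOGLEBOT_UA =
  'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';

(async () => {
  const url = 'https://example.com/some-article';
  const asBot = await (await fetch(url, { headers: { 'User-Agent': GOOGLEBOT_UA } })).text();
  const asMe = await (await fetch(url)).text();
  console.log('as Googlebot:', asBot.length, 'chars; as yourself:', asMe.length, 'chars');
})();
```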
It's extremely rare to be IP-blocked by any website just for using Google's user agent from an IP range that isn't Google's. IPs get reused and you can switch to a new one easily, so blocking on that basis is neither common nor good practice.
> IPs get reused and you can switch to a new one easily, so blocking on that basis is neither common nor good practice.
On the flip side, some people can't change their IP address easily, and getting IP-banned (even if rare, for the reasons you stated) is a major hassle for them when it does happen. :/
Is that really a thing? That must be such a hazard for their developers. For sites I work on, I usually have a test that scrapes a few URLs as Googlebot to verify they get an optimized view (no JS, structural-only CSS).
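Roughly along these lines, if anyone's curious; a Node sketch where the URLs, the "no script tags" check, and the size threshold are stand-ins for whatever "optimized" means for your site:

```js
// Hypothetical smoke test: fetch a few URLs as Googlebot and assert that the
// crawler gets the lean view (no <script> tags, modest payload). The URLs,
// the checks, and the size threshold are placeholders for a real test suite.
import assert from 'node:assert/strict';

const GOOGLEBOT_UA =
  'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)';
const URLS = ['https://example.com/', 'https://example.com/pricing'];

for (const url of URLS) {
  const res = await fetch(url, { headers: { 'User-Agent': GOOGLEBOT_UA } });
  const html = await res.text();
  assert.ok(!/<script\b/i.test(html), `${url} still serves JS to Googlebot`);
  assert.ok(html.length < 200_000, `${url} crawler view looks suspiciously heavy`);
}
console.log('Googlebot view looks optimized');
```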
Yes. Googlebot only crawls from legit Google addresses (even when their developers are trying new things), so it's an easy scraper/scammer signal to key off of.
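That check is cheap because Googlebot's crawl IPs reverse-resolve to googlebot.com or google.com hostnames, which you can then forward-confirm. A rough Node sketch of the idea (IPv4 only, minimal error handling):

```js
// Sketch of the standard Googlebot verification: reverse-DNS the client IP,
// check the hostname ends in googlebot.com or google.com, then confirm that
// hostname resolves back to the same IP. IPv4 only, minimal error handling.
import { promises as dns } from 'node:dns';

async function isRealGooglebot(ip) {
  try {
    const [host] = await dns.reverse(ip);
    if (!/\.(googlebot|google)\.com$/.test(host)) return false;
    const addrs = await dns.resolve4(host); // forward-confirm the PTR record
    return addrs.includes(ip);
  } catch {
    return false; // no PTR record, lookup failure, malformed input, etc.
  }
}

// e.g. if (!(await isRealGooglebot(req.socket.remoteAddress))) { /* treat as scraper */ }
```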
No. It lets Googlebot see full articles, but shows only the first paragraph or so to non-subscribers, even if they're coming from Google search results.
However, I don't see cache links on Google :(
Edit: Oops, I'm wrong. The article does say that Googlebot only sees the first paragraph or so.
"The reason: Google search results are based on an algorithm that scans the internet for free content. After the Journal’s free articles went behind a paywall, Google’s bot only saw the first few paragraphs and started ranking them lower, limiting the Journal’s viewership."
I call that the nuclear option. It's almost guaranteed to win the war with ad-tech! It should be deployed against sites with runaway ad engines that spin up your CPU fans and make scrolling laggy.
Of course, the problem with nuclear is collateral damage. Drop the bomb and ads don't work, but neither does a lot of other stuff. E.g., the site shows a blank screen, images are invisible or blurry, drop-down menus don't drop. And, of course, the deal-breaker: videos don't play.
The remedy for the fallout from killing JavaScript is more JavaScript (and CSS), supplied inside a Chrome extension targeted at the offending site. An injected stylesheet makes `<body>` visible again, hides assorted useless junk, and styles the injected UI elements. Your content scripts load the missing images, drop the menus down, and play the unplayable videos in button-activated pop-over windows at superior resolution.
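Concretely, each per-site fix-up boils down to a content script plus injected CSS along these lines. This is only a sketch: the class names are invented, and the script would be registered in the extension's manifest.json under `content_scripts` with a `matches` pattern for the offending site.

```js
// Hypothetical content script for one offending site (all selectors invented).
// Registered in manifest.json under "content_scripts" with a "matches" pattern
// limiting it to that site.

// Inject CSS: make <body> visible again, hide the junk, style our own UI.
const style = document.createElement('style');
style.textContent = `
  body { visibility: visible !important; }
  .ad-slot, .paywall-overlay, .newsletter-nag { display: none !important; }
  .my-ext-video-button { position: absolute; top: 8px; right: 8px; }
`;
document.documentElement.appendChild(style);

// Load the images the site's own (now disabled) JS would have lazy-loaded.
// Many sites stash the real URL in a data-src attribute -- an assumption here.
for (const img of document.querySelectorAll('img[data-src]')) {
  img.src = img.dataset.src;
}
```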
Of course, the problem is that there are a lot of sites out there, and they change unpredictably, requiring your extension library to change in response. That argues for crowd-sourcing the extension library, but the crowd needs to be proficient in HTML, JavaScript, and CSS, know the ins and outs of browser extensions, and care enough to put in the time.
You can completely change how a site presents. E.g., turn a slide-show stuck in a static slide window (one that barely moves under the background ad-tech load) into a set of `divs` that roll upward as your finger swipes.
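For the slide-show case, the content-script side of that might look something like this (the selectors are invented; every site names these things differently):

```js
// Sketch: flatten a sluggish slide-show into plain stacked divs so native
// scrolling/swiping takes over. Selectors and class names are invented.
for (const slide of document.querySelectorAll('.slideshow .slide')) {
  slide.style.display = 'block';    // show every slide at once
  slide.style.position = 'static';  // undo the carousel's absolute positioning
  slide.style.transform = 'none';   // undo any translate-based animation
}
// The next/previous controls no longer serve a purpose.
document.querySelectorAll('.slideshow .slide-controls').forEach((el) => el.remove());
```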
It's a hobby at best. Disabling ad-tech components by origin is the practical option.
Call me Dr. Strangelove, then. I usually browse with JS off, enabling it on occasion, and keep a few sites whitelisted.
I used to play around with filtering sites to make them less antisocial, but find that slog less entertaining these days. So now when confronted with a site that's useless without JS, eh, there's almost always another site out there that doesn't mind the terms I demand for my attention.