I used 12ft for some time and got tired of loading bloated news websites twice. Archive.today is a good alternative but means making copies of crap articles somewhere else, while I just want to read them once. Ultimately I recommend txtify.it, a service that uses Readability, as a more sensible solution.
Why not use the reader view feature of Firefox and Chrome? With Firefox I even like that you can use text-to-speech, so very long stories can be played in the background.
Honestly I find that the vast majority of these services don't work. And then when they do work they don't work for long and we're just in a cat and mouse game.
Well this is ironic. Well-intentioned developer creates a site to bypass paywalls. The site is a popular success and begins to cost too much money for the developer to maintain. The developer's solution: introduce an optional paywall.
I'm not just poking fun... I actually sympathize because I'm in a similar situation with one of my projects. I've had to reconsider being a 100% ideals-based person and actually set up methods for people to give me money.
I've always appreciated sites offering free content, but now I'm in the unique situation where I'm the one publishing content. It takes a lot of time and energy to put together and my other sources of income can only cover the bills for so long. The harsh reality is that free services require money to operate.
I made a website for piracy when I was a kid and did the exact same thing; it was definitely ironic for people to pay for something in order to avoid paying.
But I think it always has been and will continue to be more of a UX problem than a "wanting stuff for completely free" problem.
Case in point, a couple hours ago I tried to stream the Super Bowl from the official NFL website, but failed multiple times to finish their checkout process (which was only $0.99, well below what I would've been willing to pay), so instead I... went elsewhere.
The problem I have with news sites is that, in general, they are extremely hard to cancel subscriptions to. Even piracy sites that offer "donation benefits" are usually non-recurring or easily cancelled. Even though I can easily afford to pay for the New York Times or Wall Street Journal, in practice I refuse, because I know that if I don't like it, cancellation is going to be an arduous, unpleasant activity. Heck, I've seen pirate or sketchy sites (e.g., private forums, book clubs, file hosts) in entirely different languages (e.g., Chinese) that are easier to cancel than old-world journalism.
I've tried to support "traditional journalism gone digital" too! I paid for a subscription to Ars Technica once, but the website kept logging me out, so I needed an ad-blocker anyways. They also said they would send me a free WIRED subscription, but they sent me Sports Illustrated for a year. I tried to subscribe to The Atlantic, and after I paid, I learned I somehow signed up for a digital version that still included ads, because ad-free was a subscription offer that was limited and had to be specifically clicked on (the general site subscription wasn't the same as the ad-free one).
I want to support journalism! I give out $100ish a month in Pixiv Fanbox and Patreon subscriptions. But I'm not going to give any payment information to sketchy sites that make it impossible to cancel and have terrible experiences. It's not worth paying for that hassle.
News publications also still think they're competing with each other. Like, each newspaper seems to assume you're going to subscribe to them, go there every morning like it's the 1990s, and get 100% of your news from them, and they charge accordingly.
If I'm going to pay for news it's going to have to be an all-you-can-eat deal like the music streaming services. I'll pay if it means that like 70% of the news links I click from Google News, social media or reddit are covered.
Twitter also has Twitter Blue, where they've done a deal with a bunch of sites so that if you click a link while subscribed to Twitter Blue, you get an ad-free experience while browsing the external site.
Yeah, Apple News is a good step forward, but it's region-locked and not available where I live. I didn't know about Twitter Blue, but the fact that you have to go through Twitter seems rather limiting.
At least it seems like some people have identified the market, hopefully it develops from here.
I have one almost legitimate use for 12ft.io: sharing links for a paid subscription news site where I actually pay the annual fee and the site has its own way of letting users share articles with friends and family.
But the built-in process is onerous: one has to sign up, and they nag you to subscribe, which makes for a long journey just to read one article.
So I hope 12ft guy can find a niche like that to sustain a legitimate business.
This person is asking for money in order to help provide a service… and that service is “stop other people from asking for money to provide a service.” Hard not to feel a little irony.
There are approximately three outcomes here.
1. Google shuts this guy down, some way or another
2. Paywalled sites sue this into oblivion
Or worst of all,
3. Google stops being able to usefully surface decent news content.
Paywalls are annoying, but this is not a long-term solution.
How is "Google isn't allowed to scrap site to power their site for free" the worst option. Google can write a few checks to access the data. They have enough money.
If publishers didn't want Google indexing their data without paying for it, they'd have already asked Google not to index it, via robots.txt and whatnot. I find it more likely that Google would come up with some sort of API that publishers can register for, so that Google can get in (and 12ft cannot).
I kind of laughed at that, but it seems like asking for trouble. Bypassing paywalls on a slightly underground basis is something the publications might tolerate or live with like ad blockers, but doing it for profit is painting a target on your back.
Also, I wonder what kind of hosting 12ft is using where bandwidth costs so much, unless they're pushing hundreds of TB per month or more (maybe they are). These days there's tons of super-low-cost bandwidth if you bypass the big providers.
The “Why?” section of the home page is rather disingenuous, since the Economist is not “SEO optimized garbage”, nor does it want you to “sign up for some newsletter”. It’s just an excellent newspaper that needs to pay the staff that writes all those articles. Just like 12ft.io needs to pay its hosting provider.
I'd be fine with that if they didn't let search engines crawl their content. If they want to be on the web, they have to play by the rules of the web. Instead what they want is to reap the benefits of the web while refusing to participate in the open web.
It's a bait and switch. They give the search engines full text to crawl, and then nothing when a user clicks that link from their search. They got the tangible monetary benefits of having a high Google search ranking, without giving access to the actual content. Giving that access is how the web is supposed to work.
It's not my fault that news websites have continually debased their users with horrible advertising and tracking, to the point where ad-supported news is pretty much no longer viable. There are plenty of examples of smaller content offerings like podcasts which do quite well being ad-supported, because they have the trust of their audience, and they only accept quality advertisers so they can charge a premium for their ad space.
Books are different; they are not the web. I find things like Google Books very helpful, for example when I'm reading a physical book and want to look up on which page I read something, even though I can't read the full book through it.
> They got the tangible monetary benefits of having a high Google search ranking, without giving access to the actual content, which is how the web is supposed to work.
Don’t understand the downvotes. There are legal ways to read books (https://www.overdrive.com/apps/libby/) and the ability to straight up not read the books but read summaries instead. The internet has provided this free flow of ideas and repackaging of thoughts.
There are also ways to read books that circumvent copyright, but in the spirit of this discussion, that’s not necessary to dive into.
You seem to have the misinformed belief that Google and search engines generally somehow make up the fundamental fabric of the web. Google is just a company with a pretty good catalogue of what it's been able to piece together of what's available on the open web. Crawlers are no more a fundamental part of "the open web" than anything else.
GP's complaint is that these publishers are serving different content to Google than they serve to people, and Google knows this but pretends that people will get the indexed page. This is the search engine's fault.
If it knows I will get a different result because I neither have nor want an account, it should let me hide that result, so I won't waste my time.
I'd be happy to pay The Economist but I won't buy an expensive subscription because I don't read enough of their articles to make it worthwhile. I'd like a browser based wallet which I could use to pay per article with one click.
This would work with true micropayments, i.e., payouts in mills (tenths of a cent), not cents. You wouldn't even notice it, but, as with streaming, multiplied by the thousands it eventually becomes real money for the content provider.
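Back-of-the-envelope, with made-up numbers just to show the scale:

    // Hypothetical figures, not anyone's real traffic:
    const pricePerView = 0.002;      // 2 mills, i.e. a fifth of a cent
    const monthlyViews = 500_000;    // paywalled reads across all payers
    console.log(pricePerView * monthlyViews); // 1000 -> $1,000/month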
Then search engine crawlers should get paywalled too.
The motivation is correct. You run a search on Google and get mostly paywalled content. I'm fine with news sites requiring subscriptions to view their articles, but they shouldn't also get the benefit of being listed at the top of search results for key terms.
Alternatively, the search list should show if content is paywalled or give you search options to remove paywalled content.
I'm not following your logic here. The Economist wants to charge readers because they produce high quality content that's worth paying for (in their estimation). This seems entirely orthogonal to whether they blacklist a web crawler - a crawler they didn't even ask for, which would be all over their website whether they want it or not.
I think you're confused because the crawler and the browser both use the same channel and the same protocols to access the information (the website over HTTP). But that's just a detail. Google could send them a hand written form for them to fill out with details of each of their articles and some thumbnail images to be manually entered into a Google database for all we care.
> I think you're confused because the crawler and the browser both use the same channel and the same protocols to access the information (the website over HTTP). But that's just a detail. Google could send them a hand written form for them to fill out with details of each of their articles and some thumbnail images to be manually entered into a Google database for all we care.
I disagree; I care quite a bit about whether the Google results are about the actual page I'm going to see or about what the page author claimed the page would be about. (Indeed I'm old enough to remember that what originally set Google apart from competing search engines was that it would ignore the meta keyword tags that authors used to describe their pages, in favour of indexing the visible page content directly)
And surely we'd want high-quality content returned from a search engine. If Google never returned results where the company wanted me to buy something, the results would be pretty sparse.
Seems like you should take that up with Google? Why should The Economist be obligated to serve no content to a crawler just because they want to charge readers a fair price for their content?
You're a couple lines of robots.txt from not having your site appear in search engines most people use. Meanwhile, putting something up on the open internet includes the risk that people and robots will see it.
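The couple of lines in question, for anyone curious (standard robots.txt syntax, with Googlebot singled out just as an example):

    User-agent: Googlebot
    Disallow: /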
IMO the issue is that the paywall is essentially "cloaking" by Google's webmaster standards: different content is displayed to the crawler (the actual text of the article, which gets indexed) vs. the user (a paywall).
The content provider might not ask for the crawler, but they are certainly catering to it, and benefiting from it.
I’m not confused and I don’t really understand why you think I am.
I believe the content that is indexed should be the content you can see. Sites used to be penalised, heavily, for returning different content to Google. Hiding the paywall from Google falls into that bucket.
At a minimum the search results should display if they’re paywalled and provide tools to exclude that content from results.
1. I pay for The Economist and I want to share articles with my family & friends without them having to jump through the login hoop; until quite recently there wasn't an intentional method to do this[1]
2. My partner is an analyst; her work is often quoted and republished, in part or in full, behind paywalls. Going through the correct channels to read what was published can take days to weeks, and she needs to know pronto!
Edit: Clearly, because I'm paying for The Economist, I believe they should be remunerated for their journalism, but a hard policy of "no access unless you've verifiably paid" would be a worse status quo.
Movie trailers are frequently very disingenuous. They include scenes not in the movie. They emphasize minor scenes, cut to present a different view of the movie to different audiences, often to the point of completely misleading at least one audience as to what the movie is about.
You're right. One of them is professionally produced content, advertised with a preview to whet your appetite, where you're expected to pay for the right to consume it fully. While the other is.... uhh.... the same thing.
Guess that's the thing: when it shows up in your search results, we do expect it to be ready to read without paying for it. This is partly Google's fault for not making it easier to exclude paywalled content. Would paywalled content providers like Google implementing that feature? I doubt it. But we can also use something like 12ft.io.
> The idea is pretty simple, news sites want Google to index their content so it shows up in search results. So they don't show a paywall to the Google crawler. We benefit from this because the Google crawler will cache a copy of the site every time it crawls it.
> All we do is show you that cached, unpaywalled version of the page.
This seems like something that could be accomplished entirely client-side, without incurring bandwidth costs to the server.
Or am I missing something about how this service works?
You can't do it client-side. Well-implemented paywalls clip the content server-side: if you're Google, you get the full HTML; if you're not, you don't.
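A minimal sketch of what server-side clipping looks like, assuming a Node/Express server (loadArticle, isSubscriber, renderFull, and renderTeaser are hypothetical helpers, and real paywalls verify the crawler far more carefully than this User-Agent check):

    const express = require("express");
    const app = express();

    app.get("/articles/:id", (req, res) => {
      const article = loadArticle(req.params.id);   // hypothetical
      const claimsGooglebot = /Googlebot/i.test(req.get("User-Agent") || "");
      if (claimsGooglebot || isSubscriber(req)) {
        res.send(renderFull(article));    // crawler & subscribers: full HTML
      } else {
        res.send(renderTeaser(article));  // everyone else: teaser + paywall
      }
    });

    app.listen(3000);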
> Cloaking refers to the practice of presenting different content or URLs to human users and search engines. Cloaking is considered a violation of Google's Webmaster Guidelines because it provides our users with different results than they expected.
It’s clear that Google aren’t enforcing this rule thoroughly though.
Edit: It’s possible that the allowed sites are using JSON-LD to avoid being classified as cloaking:
> This page describes how to use schema.org JSON-LD to indicate paywalled content on your site with CreativeWork properties. This structured data helps Google differentiate paywalled content from the practice of cloaking, which violates our guidelines.
That might work against crappy implementations (e.g., ones that just check ASN = Google), but Google publishes a whitelist of their crawler's IP ranges, so spoofing a well-implemented check is all but impossible.
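Google's documented way to verify a real Googlebot is a double DNS lookup; a sketch in Node:

    const dns = require("dns").promises;

    // 1. Reverse-resolve the client IP to a hostname.
    // 2. Check it ends in googlebot.com or google.com.
    // 3. Forward-resolve that hostname and confirm it maps back to the IP.
    async function isRealGooglebot(ip) {
      try {
        const [host] = await dns.reverse(ip);
        if (!/\.(googlebot|google)\.com$/.test(host)) return false;
        const addrs = await dns.resolve4(host);
        return addrs.includes(ip);
      } catch {
        return false; // no PTR record or lookup failure: treat as spoofed
      }
    }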
How is 12ft Ladder any different from this GitHub repo that does the same thing? It's one of the more popular repos as well, at 22k stars.
Genuinely curious, as I've just been using [1] and have had no issues bypassing paywalls. If 12ft has access to more sites, or something, that would be a better use case.
The owner of the repo does not even want donations, just a star, otherwise, I'd contribute!
12ft (and Outline) make the request as a new user with a presumably fresh-ish IP address, which is sometimes required when the publication only sends the full article to completely new visitors (i.e., the paywall can't be removed by clearing cookies or by deleting some paywall HTML/JS).
It's not any different. It actually can access fewer sites, since some sites have disabled it, presumably by paying the owner? The only advantage of 12ft.io is that you have a shareable link for friends, and that it can be used on mobile.
12ft.io does a great job of bypassing paywalls, which is in demand because people want to avoid paying for the content they want to see. What's more, 12ft is basically an open CGI proxy server. The chances that it will be used to death without any efficient way to monetize it seem high.
FWIW I do think it's a great service and is very helpful when you only need to access one article from a site you'll never need/want a subscription for. But I can definitely see there being people taking advantage of it when they want to avoid subscribing to whatever service.
So basically he makes it so people can steal articles... and then he's realizing that yeah, running a website is expensive and that's why companies have paywalls in the first place.
I honestly wouldn't mind paying publishers to read the articles I want to read. However, I'm not interested in reading 90% of the articles from a given publication/website. I hope there will be an easy and seamless way to pay as you read, per article, or via a day/week/month pass.
It is not that I don't know how to "hack" and find ways to read paywalled articles; almost every paywalled website can be un-paywalled. It is that doing so is very irritating and taxes the brain.
I pay for quite a few of them, hoping to support the writers, and do not worry about finding ways around the paywalls. In the end, I do NOT read more than 2-3 articles a week from them. Most of the time, months go by without my even stumbling on a single article from such publications.
Generally, for recreational web use, I never use Javascript, or even CSS. I prefer to use a text-only browser.
There are websites that apparently have "paywalls", but without Javascript, I do not even know these "paywalls" exist. I read every article on a website without any indication of any limitations.
The NY Times and The Economist are examples. What people refer to as "paywalls" are simply Javascript annoyances. One has to run the Javascript to experience the annoyance.
Perhaps the "privilege" of running the website's Javascript is a "benefit" of subscription. However I would not run the Javascript regardless of whether I was subscribed or not. That choice has nothing to do with the idea of "paywall". I choose not to run Javascript on any website. This improves the web experience for me in too many ways to count.
News publications could approach subscription as (a) access versus (b) no access, e.g., password protection. Yet some publications approach subscription instead as (a) access without Javascript annoyances versus (b) access with Javascript annoyances. Of course, (b) only applies if one chooses to run the Javascript.
Every web user has the option not to run other people's code, i.e., Javascript.
The access model that scientific journals use seems to work well enough. Access is granted to an IP address. If the subscriber is on a different IP address, she can get access through a password-protected proxy.
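A sketch of that model (the ranges and helper names are made up; real deployments typically put a password-protected proxy like EZproxy inside the allowed range):

    // Grant access if the request comes from a subscribed institution's range.
    const allowedPrefixes = ["129.67.", "163.1."];   // fictional university ranges
    function hasInstitutionalAccess(req) {
      return allowedPrefixes.some(p => req.ip.startsWith(p));
    }
    // Off-campus readers log in to a proxy that sits inside the range,
    // so their traffic arrives from an allowed IP.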
> The idea is pretty simple, news sites want Google to index their content so it shows up in search results. So they don't show a paywall to the Google crawler. We benefit from this because the Google crawler will cache a copy of the site every time it crawls it.
> All we do is show you that cached, unpaywalled version of the page.
They're just showing you the Google cache? Like... what you can get by putting `cache:` in front of a URL in Chrome? (Or using an extension in Firefox?)
So, this site gives you access to the same thing by putting `12ft.io/` in front of the URL instead of, say, `cache:`? Is there something more to it? That... seems like an interesting thing to ask people to pay for.
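For reference, the two spellings of the same lookup (Chrome's address-bar shorthand, and the direct URL it resolves to):

    cache:example.com/some-article
    https://webcache.googleusercontent.com/search?q=cache:example.com/some-article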
I read the text of the act [1] and the related sections in the US code [2]. I would argue that, assuming I read Title 18 correctly, this act doesn't apply to 12ft: the distribution is neither for "private financial gain" (Section 506.a) nor the distribution of "a work being prepared for commercial distribution" (because it's already released, Section 506.c). That leaves 506.b, but for that you need to show that the article has "a total retail value of more than $1,000" and is less than 180 days old.
12ft.io is most certainly for financial gain. This thread is about them charging.
Also, the act defines financial gain not in terms of money, but in terms of things 12ft.io engages in: "The term 'financial gain' includes receipt, or expectation of receipt, of anything of value, including the receipt of other copyrighted works."
Goodwill is considered value, and is even included in company valuations. If 12ft.io expects or receives goodwill (which it does), it has financial gain.
And there's still plain old infringement, at up to $150k a pop, which would likely put 12ft.io on the hook for millions to billions of dollars. Each unique article served carries a potential $150k bill.
There are also likely many other statutes in play, such as accessing a computer in furtherance of a crime, which means yet more federal prison time.
Whoever runs 12ft.io had better hope they can hide before someone locates and charges them.
I replied with this in another comment, but here is a look at an open-source version that you can compile. It will give you a good idea of how it works behind the scenes [1].
Bookmarklet: javascript:q=location.hostname+location.pathname;location.href="https://txtify.it/"+q;
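Unminified, with the same behavior (note it keeps only hostname and pathname, so the query string is dropped):

    javascript:(function () {
      var q = location.hostname + location.pathname; // e.g. "example.com/story"
      location.href = "https://txtify.it/" + q;      // reload the page via txtify.it
    })();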