On the Future of 12ft (12ft.io)
139 points by jackdaw12 on Feb 14, 2022 | hide | past | favorite | 103 comments


I used 12ft for some time and got tired of loading bloated news websites twice. Archive.today is a good alternative but means making copies of crap articles somewhere else, while I just want to read them once. Ultimately I recommend txtify.it, a service that uses Readability, as a more sensible solution.

Bookmarklet: javascript:q=location.hostname+location.pathname;location.href="https://txtify.it/"+q;


Why not use the reader view feature of Firefox and Chrome? With Firefox I even like the fact that you can use text-to-speech, so very long stories can be played in the background.


I believe txtify.it bypasses paywalls in some cases. Also, you can share the generated links.


Recently reader view in FF has been only showing the blob of text that shows before the paywall.


Every time I tried to use this thing it said it was disabled for every site I tried.


For example, I was just looking at a WSJ article and nothing happened: https://12ft.io/proxy?q=https://www.wsj.com/articles/why-fin...

Similarly Outline: https://www.wsj.com/articles/why-financing-the-multi-trillio...

But the current top comment's txtify did: https://txtify.it/https://www.wsj.com/articles/why-financing... (though this doesn't have pictures)

Honestly I find that the vast majority of these services don't work. And then when they do work they don't work for long and we're just in a cat and mouse game.



Yeah, "12ft has been disabled for this site"


Well this is ironic. Well intentioned developer creates site to bypass paywalls. Site is a popular success, and begins to cost too much money for the developer to maintain. Developer comes up with a solution which is to introduce an optional paywall.

I'm not just poking fun... I actually sympathize because I'm in a similar situation with one of my projects. I've had to reconsider being a 100% ideals-based person and actually set up methods for people to give me money.

I've always appreciated sites offering free content, but now I'm in the unique situation where I'm the one publishing content. It takes a lot of time and energy to put together and my other sources of income can only cover the bills for so long. The harsh reality is that free services require money to operate.


I made a website for piracy when I was a kid and did the exact same thing; was definitely ironic for people to pay for something to avoid paying.

But I think it always has been and will continue to be more of a UX problem than a "wanting stuff for completely free" problem.

Case in point, a couple hours ago I tried to stream the Super Bowl from the official NFL website, but failed multiple times to finish their checkout process (which was only $0.99, well below what I would've been willing to pay), so instead I... went elsewhere.


The problem I have with news sites is that, in general, they are extremely hard to cancel subscriptions to. Even piracy sites that offer "donation benefits" are usually non-recurring or easily cancelled. Even though I can easily afford to pay for the New York Times or Wall Street Journal, in practice I refuse, because I know that if I don't like it, cancellation is going to be an extremely arduous activity that isn't going to be fun. Heck, I've seen pirate or sketchy sites (e.g., private forums, book clubs, file hosts) in entirely different languages (e.g., Chinese) that are easier to cancel than old-world journalism.

I've tried to support "traditional journalism gone digital" too! I paid for a subscription to Ars Technica once, but the website kept logging me out, so I needed an ad-blocker anyways. They also said they would send me a free WIRED subscription, but they sent me Sports Illustrated for a year. I tried to subscribe to The Atlantic, and after I paid, I learned I had somehow signed up for a digital version that still included ads, because the ad-free version was a separate, limited subscription offer that had to be specifically selected (the general site subscription wasn't the same as the ad-free one).

I want to support journalism! I give out $100ish a-month in pixiv fanbox subscriptions and patreon subscriptions. But I'm not going to give any payment information to sketchy sites that just make it impossible to cancel and have terrible experiences. It's not worth paying for the hassle of that.


News publications also still think they're competing with each other. Each newspaper seems to assume you're going to subscribe to them, go there every morning like it's the 1990s, and get 100% of your news from them, and they price accordingly.

If I'm going to pay for news it's going to have to be an all-you-can-eat deal like the music streaming services. I'll pay if it means that like 70% of the news links I click from Google News, social media or reddit are covered.


Isn't Apple News doing something like that?

And Twitter also has Twitter Blue, where they've done a deal with a bunch of sites so that if you click a link while on Twitter (and are subbed to Twitter Blue) you will have an ad-free experience while browsing the external site.


Yeah, Apple News is a good step forward, but it's region-locked and not available where I live. I didn't know about Twitter Blue, but the fact that you have to go through Twitter seems rather limiting.

At least it seems like some people have identified the market, hopefully it develops from here.


Yeah otherwise you could have an ad blocker blocker unblocker running ads.


I respect far more a site meant to bypass paywalls for shitnews that I probably will find laughable, but want to read.


I have one almost legitimate use for 12ft.io: sharing links for a paid-subscription news site where I actually pay the annual fee and the site has its own way of letting users share articles with friends and family. But the built-in process is onerous: one has to sign up, and they nag you to subscribe, which is a long journey just to read one article. So I hope the 12ft guy can find a niche like that to sustain a legitimate business.


This person is asking for money in order to help provide a service… and that service is “stop other people from asking for money to provide a service.” Hard not to feel a little irony.

There are approximately three outcomes here.

1. Google shuts this guy down, some way or another

2. Paywalled sites sue this into oblivion

Or worst of all,

3. Google stops being able to usefully surface decent news content.

Paywalls are annoying, but this is not a long-term solution.


How is "Google isn't allowed to scrape sites to power their own site for free" the worst option? Google can write a few checks to access the data. They have enough money.


If publishers wanted Google not to index their data without paying for it, they'd have already asked them not to index it, via robots.txt and whatnot. I find it more likely that Google would come up with some sort of API that publishers can register with, so that Google can get in (and 12ft can not).
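
For reference, opting out of Google's index takes only a couple of lines of robots.txt, something like:

```
User-agent: Googlebot
Disallow: /
```

which suggests publishers being crawled are making a deliberate choice.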


A long term solution is micropayments.

I'm happy to pay a few cents to read an article. I'm absolutely NOT happy to be forced to sign up for 10 different subscriptions.


I kind of laughed at that, but it seems like asking for trouble. Bypassing paywalls on a slightly underground basis is something the publications might tolerate or live with like ad blockers, but doing it for profit is painting a target on your back.

Also I wonder what kind of hosting 12ft is using where bandwidth costs so much, unless they are pushing hundreds of TB per month or more (maybe they are). These days there is tons of super-low-cost bandwidth if you bypass the big providers.


Exactly. Even a 10 Gbps line from a smaller outfit would cost only a fraction of what AWS would charge you.


The “Why?” section of the home page is rather disingenuous, since the Economist is not “SEO optimized garbage”, nor does it want you to “sign up for some newsletter”. It’s just an excellent newspaper that needs to pay the staff that writes all those articles. Just like 12ft.io needs to pay its hosting provider.


I'd be fine with that if they didn't let search engines crawl their content. If they want to be on the web, they have to play by the rules of the web. Instead what they want is to reap the benefits of the web while refusing to participate in the open web.


What the heck are the “rules of the web”? Only valueless information is allowed to be indexed?

Are you offended that when you search for a book on Amazon you have to pay to read the whole thing?


It's a bait and switch. They give the search engines full text to crawl, and then nothing when a user clicks that link from their search. They got the tangible monetary benefits of having a high Google search ranking, without giving access to the actual content, which is how the web is supposed to work.

It's not my fault that news websites have continually debased their users with horrible advertising and tracking, to the point where ad-supported news is pretty much no longer viable. There are plenty of examples of smaller content offerings like podcasts which do quite well being ad-supported, because they have the trust of their audience, and they only accept quality advertisers so they can charge a premium for their ad space.

Books are different, they are not the web. I find things like Google Books very helpful, even for things like when I'm reading a physical book and want to look up which page I read something, even though I can't read the full book through it.


> They got the tangible monetary benefits of having a high Google search ranking, without giving access to the actual content, which is how the web is supposed to work.

That’s an opinion. Another opinion is that the web is a hypertext system and that Ted Nelson, who defined that term, included a royalty mechanism (https://en.wikipedia.org/wiki/Project_Xanadu#Original_17_rul...)


> Are you offended that when you search for a book on Amazon you have to pay to read the whole thing?

You don’t. ;)

And, those are the rules of the web.


Don’t understand the downvotes. There are legal ways to read books (https://www.overdrive.com/apps/libby/) and the ability to straight up not read the books but read summaries instead. The internet has provided this free flow of ideas and repackaging of thoughts.

There are also ways to read books that circumvent copyright, but in the spirit of this discussion, that’s not necessary to dive into.


Chances are even if google crawled 12ft.io it wouldn't show in the results because it'd be duplicate content.


You seem to have the misinformed belief that Google and search engines generally somehow make up the fundamental fabric of the web? Google is just a company that has a pretty good catalogue of what they've been able to piece together of what's available on the open web. Crawlers are no more a fundamental part of "the open web" than anything else.


GP's complaint is that these publishers are serving different content to Google than they serve to people, and Google knows this but pretends that people will get the indexed page. This is the search engine's fault. If it knows I will get a different result because I neither have nor want an account, it should let me hide that result, so I won't waste my time.


Meh, to each their own. I’m inclined to agree with 12ft.io on that front


I'd be happy to pay The Economist but I won't buy an expensive subscription because I don't read enough of their articles to make it worthwhile. I'd like a browser based wallet which I could use to pay per article with one click.


I like the model that Yalls.org implements: https://yalls.org/articles/97d67df1-d721-417d-a6c0-11d793739...

"Continue Reading: $0.06 USD"


This would work with true micropayments, i.e., payout in mills, not cents. You wouldn't even notice it, but like streaming, magnified by 1000's, eventually it's real money for the content provider.
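
A rough back-of-the-envelope sketch of that "mills add up" point, with entirely made-up numbers (not from any real publisher):

```python
# Illustrative only: a hypothetical per-read price and readership.
MILL = 0.001                       # one mill = a tenth of a cent
price_per_read = 2 * MILL          # $0.002 per article, assumed
reads_per_month = 500_000          # assumed audience size

revenue = price_per_read * reads_per_month
print(f"${revenue:,.2f} per month")  # prints "$1,000.00 per month"
```

Individually invisible to the reader, but at streaming-style scale it becomes real money for the content provider.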


Then search engine crawlers should get paywalled too.

The motivation is correct. You run a search on google and get mostly paywalled content. I'm fine with news sites requiring subscriptions to view their articles but they shouldn't also get the benefit of being listed at the top of search results for key terms.

Alternatively, the search list should show if content is paywalled or give you search options to remove paywalled content.


I'm not following your logic here. The Economist wants to charge readers because they produce high quality content that's worth paying for (in their estimation). This seems entirely orthogonal to whether they blacklist a web crawler - a crawler they didn't even ask for, which would be all over their website whether they want it or not.

I think you're confused because the crawler and the browser both use the same channel and the same protocols to access the information (the website over HTTP). But that's just a detail. Google could send them a hand written form for them to fill out with details of each of their articles and some thumbnail images to be manually entered into a Google database for all we care.


> I think you're confused because the crawler and the browser both use the same channel and the same protocols to access the information (the website over HTTP). But that's just a detail. Google could send them a hand written form for them to fill out with details of each of their articles and some thumbnail images to be manually entered into a Google database for all we care.

I disagree; I care quite a bit about whether the Google results are about the actual page I'm going to see or about what the page author claimed the page would be about. (Indeed I'm old enough to remember that what originally set Google apart from competing search engines was that it would ignore the meta keyword tags that authors used to describe their pages, in favour of indexing the visible page content directly)


And surely we'd want high quality content returned from a search engine. If Google never returned results where the company wanted me to buy something it'd be pretty sparse


Seems like you should take that up with Google? Why should The Economist be obligated to serve no content to a crawler just because they want to charge readers a fair price for their content?


I think the person you're replying to is agreeing with you.


You're a couple lines of robots.txt from not having your site appear in search engines most people use. Meanwhile, putting something up on the open internet includes the risk that people and robots will see it.


IMO the issue is that the paywall is essentially "cloaking" by Google webmaster standards: different content is displayed to the crawler (the actual text of the article, which gets indexed) vs the user (a paywall).

The content provider might not ask for the crawler, but they are certainly catering to it, and benefitting from it.
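
A minimal sketch of the pattern being described, with hypothetical names; real paywalls are more elaborate, but the branch is the same:

```python
# Hypothetical sketch of crawler-vs-user "cloaking": the server decides
# which version of a page to send based on who appears to be asking.
GOOGLEBOT_TOKENS = ("Googlebot",)  # assumed User-Agent substring to match

def select_page(user_agent: str, has_subscription: bool,
                full_article: str, teaser: str) -> str:
    """Full article for (apparent) crawlers and subscribers;
    teaser plus paywall prompt for everyone else."""
    looks_like_crawler = any(t in user_agent for t in GOOGLEBOT_TOKENS)
    return full_article if (looks_like_crawler or has_subscription) else teaser

# A spoofed User-Agent defeats this naive check, which is why careful
# implementations also verify the crawler's IP address.
```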


I’m not confused and I don’t really understand why you think I am.

I believe the content that is indexed is the content you can see. Sites used to be penalised, heavily, for returning different content to google. Hiding the paywall for google falls into that bucket.

At a minimum the search results should display if they’re paywalled and provide tools to exclude that content from results.


I can see Google offering “private indexing” to big publications for a fee in the future. Keeping 12ft ladder out of the loop effectively.


1. I pay for The Economist, and I want to share articles with my family & friends without them having to jump through the login hoop; until quite recently there wasn't an intended method to do this[1]

2. My partner is an analyst; her work is often quoted and republished, in part or in full, behind paywalls. Going through the correct channels to read what was published can take days to weeks, and they need to know pronto!

Edit: Clearly, because I'm paying for The Economist, I believe they should be remunerated for their journalism, but a hard policy of "no access unless you've verifiably paid" would be a worse status quo

[1] https://web.archive.org/web/*/https://myaccount.economist.co...


It's disingenuous to tease content that is behind a paywall.


Someone better tell every movie studio on Earth that it's disingenuous to make movie trailers.


Frequently movie trailers are very disingenuous. They include scenes not in movies. They emphasize minor scenes cut to present a different view of the movie to different audiences, often to the point of completely misleading at least one population as to what the movie is about.


Surely you don’t need someone to point out to you that those aren’t the same things.


You're right. One of them is professionally produced content, advertised with a preview to whet your appetite, where you're expected to pay for the right to consume it fully. While the other is.... uhh.... the same thing.


When you click a link to a trailer, you know it's not the whole movie.

When you click a link in Google, you don't know if it's paywalled.

How can you not see the difference? It's basic.


Guess that’s the thing. When it shows up in your search result, we do expect it to be ready to read without paying for it. This is partly Google’s fault for not making it easier to exclude paywalled content. Would paywalled content-providers like Google implementing that feature? I doubt it. But also we can use something like 12ft.io.


> How does it work?

> The idea is pretty simple, news sites want Google to index their content so it shows up in search results. So they don't show a paywall to the Google crawler. We benefit from this because the Google crawler will cache a copy of the site every time it crawls it.

> All we do is show you that cached, unpaywalled version of the page.

This seems like something that could be accomplished entirely client-side, without incurring bandwidth costs to the server.

Or am I missing something about how this service works?


You're not missing anything, it can be done fully client-side.

https://gitlab.com/magnolia1234/bypass-paywalls-chrome-clean


I'm glad they put clean in the title, otherwise I wouldn't trust it. :-)


You can't do it client side. Well-implemented paywalls clip the content server-side. If you're Google, you get the full HTML, if you're not, you don't.


But I could change my client's User-Agent to match the Google crawler?


Nope. Because your IP address is not one of Google's. This trick worked years ago but not today.


use a GCP instance


I'm pretty sure that GCP instances don't allocate IP addresses from the same range as Google's crawlers.

See also: https://developers.google.com/search/docs/advanced/crawling/...
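
The linked Google doc describes verifying Googlebot with a reverse-then-forward DNS lookup rather than trusting the User-Agent; a rough sketch of that check (error handling simplified):

```python
import socket

def is_verified_googlebot(ip: str) -> bool:
    """Reverse-DNS check in the spirit of Google's docs: the PTR record
    must be under googlebot.com or google.com, and the forward lookup of
    that hostname must resolve back to the same IP."""
    try:
        host = socket.gethostbyaddr(ip)[0]                # reverse lookup
        if not host.endswith((".googlebot.com", ".google.com")):
            return False
        return ip in socket.gethostbyname_ex(host)[2]     # forward lookup
    except OSError:
        return False
```

A spoofed User-Agent fails this test, which is why the trick stopped working.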


Sure, and then you've got a cat-and-mouse game with all the other ways that sites can fingerprint you.


This is false. I use this extension every day and it works great. There are many ways to bypass paywalls. For example, you can pretend to be Google.

https://gitlab.com/magnolia1234/bypass-paywalls-chrome-clean


Doesn't Google penalize sites that show something different to the crawler than they do to a regular user?


I know that this used to be common knowledge, but evidence points to "no". Or at least to "either not anymore or not strongly enough".

Clear examples of websites that block content from me: Pinterest, Facebook, LinkedIn, newspapers.


According to Google:

> Cloaking refers to the practice of presenting different content or URLs to human users and search engines. Cloaking is considered a violation of Google's Webmaster Guidelines because it provides our users with different results than they expected.

https://developers.google.com/search/docs/advanced/guideline...

It’s clear that Google aren’t enforcing this rule thoroughly though.

Edit: It’s possible that the allowed sites are using JSON-LD to avoid being classified as cloaking:

> This page describes how to use schema.org JSON-LD to indicate paywalled content on your site with CreativeWork properties. This structured data helps Google differentiate paywalled content from the practice of cloaking, which violates our guidelines.

https://developers.google.com/search/docs/advanced/structure...
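
Per that doc, the markup is CreativeWork structured data with `isAccessibleForFree`; roughly like this (the CSS selector name here is illustrative):

```json
{
  "@context": "https://schema.org",
  "@type": "NewsArticle",
  "isAccessibleForFree": "False",
  "hasPart": {
    "@type": "WebPageElement",
    "isAccessibleForFree": "False",
    "cssSelector": ".paywall"
  }
}
```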


> If you're Google, you get the full HTML, if you're not, you don't.

This company isn't Google either, though.

Or are you suggesting that they're using Google Compute Engine and therefore get the unpaywalled version?


That might work with crappy implementations (e.g. they check ASN = Google), but Google publishes a whitelist of its crawler bot IP ranges, so spoofing that is all but impossible.


How is 12ft Ladder any different from this GitHub repo that does the same thing? It's one of the more popular repos as well, at 22k stars.

Genuinely curious, as I've just been using [1] and have had no issues bypassing paywalls. If 12ft has access to more sites, or something, that would be a better use case.

The owner of the repo doesn't even want donations, just a star; otherwise I'd contribute!

[1] https://github.com/iamadamdev/bypass-paywalls-chrome


12ft (and Outline) request the page as a new user with a presumably fresh-ish IP address, which is sometimes required when the publication only sends the full article to completely new visitors (i.e. the paywall can't be removed by clearing cookies or by stripping out some paywall HTML/JS).


It's not any different. It actually can access fewer sites, since some sites have disabled it, presumably by paying the owner? The only advantage of 12ft.io is that you have a shareable link for friends, and that it can be used on mobile.

Btw, you should switch to the better-maintained version of the Bypass Paywalls extension: https://gitlab.com/magnolia1234/bypass-paywalls-chrome-clean


Cease and desist most likely. NYT is arguably not publicly readable content, so the LinkedIn precedent doesn’t apply.


12ft.io does a great job of bypassing paywalls, which is needed because people want to avoid paying for the content they want to see. What's more, 12ft is basically an open CGI proxy server. The chances that it will be used to death without any efficient way to monetize it seem high. FWIW I do think it's a great service, and it's very helpful when you only need to access one article from a site you'll never need/want a subscription for. But I can definitely see people taking advantage of it when they want to avoid subscribing to whatever service.


If it’s successful won’t publishers fight back like they did with outline.com?


You'd think they'd have to take it up with Google.


So basically he makes it so people can steal articles... and then he's realizing that yeah, running a website is expensive and that's why companies have paywalls in the first place.


Check out https://webreader.app , it works with way more sites and is privacy preserving.


This works pretty well, thank you.


I honestly wouldn't mind paying publishers to read the articles I want to read. However, I'm not interested in reading 90% of the articles from a publication/website. I hope there will be an easy and seamless way to pay as you read, per article, or with a day/week/month pass.

It is not that I don't know how to "hack" and find ways to read paywalled articles; almost every paywalled website can be un-paywalled. It is that doing so is very irritating and taxes the brain.

I pay for quite a few of them, hoping to support the writers, and do not worry about finding ways around. In the end, I do NOT read more than 2-3 articles a week from them. Most of the time, months go by without my even stumbling on a single article from such publications.


I built something like this with 500k urls in the db, then sqlite shit the bed and now I can't use it.


Generally, for recreational web use, I never use Javascript, or even CSS. I prefer to use a text-only browser.

There are websites that apparently have "paywalls", but without Javascript, I do not even know these "paywalls" exist. I read every article on a website without any indication of any limitations.

The NY Times and The Economist are examples. What people refer to as "paywalls" are simply Javascript annoyances. One has to run the Javascript to experience the annoyance.

Perhaps the "privilege" of running the website's Javascript is a "benefit" of subscription. However I would not run the Javascript regardless of whether I was subscribed or not. That choice has nothing to do with the idea of "paywall". I choose not to run Javascript on any website. This improves the web experience for me in too many ways to count.

News publications could approach subscription as (a) access versus (b) no access, e.g., password-protection. Yet some publications instead approach subscription as (a) access without Javascript annoyances versus (b) access with Javascript annoyances. Of course (b) only applies if one chooses to run the Javascript.

Every web user has the option not to run other people's code, i.e., Javascript.

The access model that scientific journals use seems to work well enough. Access is granted to an IP address. If the subscriber is on a different IP address, she can get access through a password-protected proxy.


>Bandwidth costs are getting high.

Have you considered using hosts that come with more free / unlimited bandwidth?


It doesn't work with well-known websites. I guess they are paid off by them?


I find the site useful and want to pay, but don’t want a browser extension.


Why would you be required to download the extension?


From the home page explanation....

> How does it work?

> The idea is pretty simple, news sites want Google to index their content so it shows up in search results. So they don't show a paywall to the Google crawler. We benefit from this because the Google crawler will cache a copy of the site every time it crawls it.

> All we do is show you that cached, unpaywalled version of the page.

https://12ft.io/

They're just showing you the Google cache? Like what you can get by putting `cache:` in front of a URL in Chrome? (Or by using an extension in Firefox?)

So, this site gives you access to the same thing by putting `12ft.io/ ` in front of the URL instead of, say, `cache:`? Is there something more to it? That... seems like an interesting thing to ask people to pay for.
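
For what it's worth, the `cache:` operator resolved (at the time) to a plain URL anyone could construct; a sketch, assuming that endpoint:

```python
from urllib.parse import quote

def google_cache_url(url: str) -> str:
    """Build the public Google cache URL for a page: roughly what
    'cache:<url>' in the address bar resolved to at the time."""
    return ("https://webcache.googleusercontent.com/search?q=cache:"
            + quote(url, safe=""))
```

If that's all 12ft does, the lookup itself never needs to touch 12ft's servers.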


I'm also confused. Their description seems incomplete, perhaps intentionally so?

I wonder if they're accessing pages from Google Compute Engine in an attempt to appear as a legitimate Google crawler?

If they're just loading the cached Google pages like anyone else, I don't understand why it's hitting their servers at all.


I doubt that’s how it really works.


I suspect they cache the cache? And resolve any css issues?


I’ve always found paying for pirated content (DVDs, or streaming services) to be gross.


Is there any way to filter out the paywalled links on HN? I'd rather just browse free web content than fight with paywalls and bypassers.


Maybe pay the original paywall?


I don't understand. Isn't this stealing?


No, because when you read an article a copy is made and sent to you. You are not depriving anybody of anything.


It is theft under federal law, punishable by up to 5 years in prison.

https://en.wikipedia.org/wiki/No_Electronic_Theft_Act


I read the text of the act [1] and the related sections in the US code [2]. I would argue that, assuming I read Title 18 correctly, this act doesn't apply to 12ft: the distribution is neither for "private financial gain" (Section 506.a) nor the distribution of "a work being prepared for commercial distribution" (because it's already released, Section 506.c). That leaves 506.b, but for that you need to show that the article has "a total retail value of more than $1,000" and is less than 180 days old.

[1] https://www.govinfo.gov/app/details/PLAW-105publ147

[2] https://www.law.cornell.edu/uscode/text


12ft.io is most certainly for financial gain. This thread is about them charging.

Also the act defines financial gain not about money, but about things 12ft.io engages in :"‘The term ‘financial gain’ includes receipt, or expectation of receipt, of anything of value, including the receipt of other copyrighted works."

Goodwill is considered value, and is even included in company valuations. If 12ft.io expects or receives goodwill (which it does), it has financial gain.

And there's still simple good old infringement, at $150k a pop, which would likely put 12ft.io on the hook for millions to billions of dollars. Each unique article served carries a $150k bill.

There are also likely many other statutes in play, such as accessing a computer to commit a crime, which carries yet more federal prison time.

Whoever runs 12ft.io had better hope they can hide before someone locates and charges them.


Is there a reason this isn’t built into Brave? Lawsuits?


I don’t get the architecture behind the site. Why do the request need to come from this service? How are the paywalls actually being bypassed?

Edit: derp. Just read the "How does it work" section. So if a copy of the cache is just being downloaded, why couldn't this type of thing all be done locally?


I replied this in another comment, but here is a look into an opensource version that you can compile. It will give you a good idea of how it works behind the scenes [1].

[1] https://github.com/iamadamdev/bypass-paywalls-chrome




