Cloudflare and RSS

deno · on Oct 5, 2016

Cloudflare’s misguided reliance on Javascript Paywalls[1] is fundamentally hostile to open web. It’s essentially a form of DRM.

And they don’t even bother to implement it properly—for example if your site tries to follow best practices and uses a separate domain for your static assets, you will just get errors on your static assets, resulting in a page with broken styling and no images. That despite pissing off your users by having them go through the Google hosted captcha (which also breaks all the time btw[2][3]).

One of the websites that was horribly broken by this was Stack Overflow. As anyone trying to stay safe on public WiFi by using VPN can attest.

Coincidentally, Cloudflare has lost Stack Overflow as their customer recently: https://meta.stackoverflow.com/questions/323537/cloudflare-i...

They’re now behind fastly.

_

[1] https://ipfs.pics/QmTZo6oPKHwUgWB7p7LfZwZsVQJV1n7k9qNQNZBCEu...

[2] https://ipfs.pics/QmeuJjgV621NV9aNKyNAUoEHdWZYtzCrwkLHoHneg3...

[3] https://ipfs.pics/QmRWcCkBdaG214GKttkGFcadncUJ6YvfMTSE8jiAxA...

bonus picture: https://ipfs.pics/QmPkncvs2R9EkhZQuzPzWYs4z7UUdKqQzg1k8mc7y5...

kyledrake · on Oct 5, 2016

And if you turn off Cloudflare's protection to fix this, somebody that wants to censor you and has $20 will use one of the hundreds of DDoS booters (most of them are behind Cloudflare) to nuke your site, unless you're Brian Krebs and qualify for Project Shield.

I'm very optimistic about the direction the internet is going in right now.

kefka · on Oct 5, 2016

And notice those weird hashed links? IPFS hashes.

As a quick primer, it's a hash that points towards a directory of content. Everything's deduplicated. Based on Torrents, GIT, and self-referential filesystems.

an IPFS hash is immutable. The hash points at the hash, no matter what. Indestructible. Publish stuff by

ipfs add -r folder

An IPNS hash points towards an IPFS hash. It's a pointer you publish every 12 hours. It IS mutable. Do this by

ipfs name publish <ipfs key of resource>

The browser plugins with IPFS running allow you to pull the DNS text record of the IPNS hash, and you never touch the website!

Example, PageNodes :

http://ipfs.io/ipns/QmVjH4F65fnqy1GkBBYiuAkdazKzYsw3LbMVANGF... POINTS AT ---> http://ipfs.io/ipfs/QmbLPfyehFnViKZpU237P6a6DpjCfWFSoDBMQFGU...

Tl;Dr. DDoS makes no sense regarding IPFS. Everybody makes the network faster.

delluminatus · on Oct 5, 2016

Sorry, but the OP wasn't talking about ipfs at all. He was talking about Cloudflare.

Having said that, thanks for the interesting digression. You've made me want to try out ipfs.

kefka · on Oct 5, 2016

Heh, in my mind it made perfect sense.

CloudFlare is a anti-DDoS and CDN network. IPFS is a CDN protocol that anyone can join or put files into. It doesn't quite hide the endpoints, but anyone can inject data.

It does what CloudFlare does, but better. And as more people/nodes get online, free and ubiquitous.

internals · on Oct 6, 2016

This was very informative, thank you!

t3ra · on Oct 5, 2016

(Sorry for the non-useful contribution but...) I LOVE THIS COMMENT!

tracker1 · on Oct 5, 2016

More likely the Python client is using the stock python user-agent, this should be customized per application. The reason for this, is most stock scrapers and malicious agents are using stock engines with defaults.

problems · on Oct 5, 2016

CloudFlare's CAPTCHA can be trivially disabled by the site owners if they so choose. It's a trade-off though - many leave it enabled to prevent a layer 7 attack... there's not many other options to do that and there's absolutely none available to anyone but the biggest sites.

TechRemarker · on Oct 5, 2016

Sounds like the website owner didn't setup up a page rule properly for it's RSS feeds in CloudFlare. If you flip a switch and don't customize it properly you will likely run into issues like this, but that's not Cloudfare, rather the webmaster for the site who you should be contacting.

smacktoward · on Oct 5, 2016

Why on earth shouldn't it do the right thing out of the box? Isn't "this stuff is hard, let us figure it out for you" CloudFlare's entire value proposition?

koolba · on Oct 5, 2016

> Why on earth shouldn't it do the right thing out of the box?

Which is what exactly? The default being to have a uniform service across all asset types seems sensible enough to me. It's either that or everything is off by default and you manually include paths/expressions that are included. The latter seems more of a hassle for most people.

> Isn't "this stuff is hard, let us figure it out for you" CloudFlare's entire value proposition?

I'm not sure what their angle is. As far as I can tell it's to re-centralize the internet so they can be the single tap point where SSL is added and removed[1]. Once they've got a sizeable chunk of the world funneling through them, they could make some decent coin giving direct access to the feds.

Until then, it's just free SSL and DDOS protection. Can't really complain against free either. Heck just don't use it.

[1]: http://www.newyorker.com/wp-content/uploads/2013/11/nsa-smil...

tracker1 · on Oct 5, 2016

They're a CDN... their purpose is DDoS mitigation, and improved delivery of cached resources... That means running analysis on traffic, and it sucks when you get caught in the net. That's how it works though.

Odds are the nntp client in question should customize the user-agent, and a request made to Cloudflare to improve the ranking for that agent... that's about the best that can be expected here.

daenney · on Oct 5, 2016

> Heck just don't use it.

That really is not a viable option when you want to access content which happens to (now) be behind Cloudflare.

koolba · on Oct 5, 2016

> That really is not a viable option when you want to access content which happens to (now) be behind Cloudflare.

Sure it does. Vote with your bytes and move along.

Navarr · on Oct 5, 2016

The real solution here is that the blog owner needs to turn off security for their RSS feed URLs.

Cloudflare could make this easier if it detected that they had an RSS feed and offered a suggestion, via email or the browser (or both) that was like "Create a security exception for your RSS feed"

LeifCarrotson · on Oct 5, 2016

No, the blog owner still needs protection on the RSS feeds.

If they publish something that offends someone with a botnet, and have an exception for the RSS feed, the offended someone will just attack the feed to take down the site.

The real solution is for the feed reader to contain better error handling.

afandian · on Oct 5, 2016

You can surely do DDOS protection on a particular URL without a bot detector.

LeifCarrotson · on Oct 6, 2016

Sure. For example, throw a 503 on requesting that URL, or present a captcha. But then feed readers that get caught in the DDOS protection, can't do the captcha, and don't handle errors will fail to get the feed.

cm3 · on Oct 5, 2016

For a year now Google's reCAPTCHA has been very hard to convince as a human that I selected the correct 3 (or 4) rivers, houses, street numbers, mountains. Most of the time it makes me take the test twice or or more, even though reCAPTCHA is wrong. I feel like, if we're used to help train their neural network, it should at least have the courtesy to not require so many I-don't-believe-you-please-retry iterations.

Given how many sites use Cloudflare, I'd welcome they switch to a more reliable captcha service until Google repairs their scripts.

Oh and other captcha services work in Tor Browser Bundle.

beardog · on Oct 6, 2016

reCAPTCHA intentionally occasionally calls correct answers wrong in order to make it harder for bots to learn.

zwily · on Oct 6, 2016

If getting the right answer isn't adequate, what's the point?

cm3 · on Oct 6, 2016

Bots can still learn with that reCAPTCHA behavior. We're basically providing free labor to Google and they're making is love Solvemedia and other captcha services for providing something that actually doesn't turn into a minute long puzzle exercise, when all I wanted to do was post a comment on a site.

sudhirj · on Oct 5, 2016

Might be a problem with Cloudflare marketing. I use it for my sites and well, but I don't think of it as a magic button / service that makes all my problems go away.

Cloudflare is a hosted reverse proxy service that handles DDOS protection. It enforcing rules on RSS pages is no different from me putting in a captcha extension on nginx or Apache and having it run on all pages by mistake.

And this isn't even a default configuration issue. This is simply a mis-configured service, no different from misconfigured haproxy / nginx / Apache / Cloudfront.

jgrahamc · on Oct 5, 2016

The author can email me jgc @ cloudflare and I'll be happy to help.

idlewords · on Oct 5, 2016

Damn it, the email tests for javascript!

medecau · on Oct 5, 2016

Internet scale right here.

jgrahamc · on Oct 5, 2016

I was suggesting that the person had a problem with our service and could try contacting us. The 'story' posted has little detail of any kind so it is hard to assist.

mey · on Oct 5, 2016

The story is about a general practice/design of CloudFlare, not a specific site. Fixing it for one site or one user ip address won't fix the fundamental design that CloudFlare expects a human and a web browser behind every request.

jgrahamc · on Oct 5, 2016

I am Cloudflare's CTO. I know how our systems operate. I was asking for this person to contact me so I can understand what is happening in this instance.

medecau · on Oct 5, 2016

RSS is an XML file that is accessed by HTTP clients that are not browsers and do not render Javascript.

http://cyber.harvard.edu/rss/rss.html

webscaleizfun · on Oct 5, 2016

The two issues the author brings up are broad issues I run into with Cloudflare protected sites all the time. The fundamental assumption that everything using the internet has a full JS engine and a human immediately ready to solve a reCAPTCHA is flawed, and thinking instances like this are a one-off is inherently wrong.

Torrent traffic, file transfers, VOIP and all the other non-HTTP type traffic that Cloudflare just breaks by default make up a good chunk of the traffic you see on the web. That Cloudflare pitches itself as a one and done solution with minimal configuration just nakes this worse, since website owners generally won't bother to set up custom rules for RSS feeds and the like. Additionally, if even 1% of the RSS feeds that were broken by cloudflare were emailed to you, your inbox would be flooded.

jgrahamc · on Oct 5, 2016

Torrent traffic, file transfers, VOIP and all the other non-HTTP type traffic that Cloudflare just breaks by default make up a good chunk of the traffic you see on the web.

Huh? We don't handle non-HTTP traffic. How can we break it?

dredmorbius · on Oct 6, 2016

wget or curl might be examples of breakage.

I'm known to use console/text clients (w3m, lynx, links, elinks[2]) from time to time. Cloudflare definitely interferes with these.

Not sure about the other examples given.

And, to hijack: I wanted to say thanks for the work on a Tor-friendly anonymised reputation system. I've commented on that in the past, and need to take a closer look / see others' thoughts, but definitely appreciate the effort.

narrowrail · on Oct 5, 2016

Isn't the web, by definition, HTTP(s) connections only (ports 80/443)?

_wmd · on Oct 5, 2016

Referring to an upset end user's complaint with drama quotes doesn't strike me as particularly good form

bonkabonka · on Oct 5, 2016

I switched to using https://github.com/Anorov/cloudflare-scrape in my RSS bot because several of the blogs I follow moved to CloudFlare. It wasn't precisely a drop-in replacement for Requests, but it wasn't too hard to wire up.

spikej · on Oct 5, 2016

That's neat! I've been using https://phantomjscloud.com/ to render and then use that

blfr · on Oct 5, 2016

Just reading the blog in my browser is now somewhat hampered because Cloudflare thinks I’m some sort of cyberterrorist and requires my browser to run a javascript anti-turing test.

Google does the same thing on Youtube when you're browsing from a "bad neighbourhood" (OVH). Incredibly annoying and, unlike Cloudflare, redirects you to youtube.com instead of the video you wanted to watch. The problem keeps reoccurring despite being logged into a Google account.

Ironically, youtube-dl from the same IP works just fine. So I don't what they're protecting. Are they trying to prevent automated comment-reading?

jakobegger · on Oct 5, 2016

Maybe they want to prevent services that increase play count?

JohnTHaller · on Oct 5, 2016

As OVH is known for hosting spammers (email, comments, like/upvote/view sales, etc) I can't say that I disagree with that behavior.

breakingcups · on Oct 5, 2016

Besides increasing view-count and comment-reading, comment posting (spam) might also be in their crosshairs.

problems · on Oct 5, 2016

CloudFlare has page rules for a reason. It's trivial to configure no anti-bot protection on your RSS feeds.

In fact, it's possible that the person he was hitting was in "I'm under attack" mode or similar, which would try to reduce bot hits to the web server by any means necessary to prevent a layer 7 attack from taking the site offline.

alpb · on Oct 5, 2016

But sounds like it should be the default for this feed/atom feed/xml Content-Types? Humans do not read RSS feeds.

jerf · on Oct 5, 2016

Since we're talking about CloudFlare at all, we are automatically in a security context.

In a security context, automatically poking a hole through for RSS is automatically giving attackers an easy-to-use door straight through to the underlying site to DDoS them.

You might want to say "Oh, well, then, let's just set some bandwidth rules", which will certainly work for specific sites, but it's going to be difficult for CloudFlare to correctly guess them generically. (Not necessarily impossible, but it is impossible if you measure it from the POV of them never being wrong. It would only be a heuristic guess.)

leesalminen · on Oct 5, 2016

Attackers do not read responses from HTTP requests either. I think it's fair that the default behavior is to protect all endpoints and a user can explicitly change the behavior.

problems · on Oct 5, 2016

They'd have to probe every endpoint to figure out the content-type of the return.

And if your web application allows queries that produce RSS feeds that could still result in a really bad L7 attack if you simply were to ignore all feeds. No caching + randomized queries on a small site would knock it offline in no time.

tracker1 · on Oct 5, 2016

I would think that a good caching solution on the server for RSS requests could perform decently even under DDoS scenarios... though, this is why people used other services in front of their RSS feeds, so that they were better cached. Most blogs aren't generation more than a couple new articles a day, so caching 15-120 minutes wouldn't be an issue for most use cases.

huphtur · on Oct 5, 2016

Could you elaborate how to set up such a page rule please?

problems · on Oct 5, 2016

This article is pretty good:

https://support.cloudflare.com/hc/en-us/articles/200168306-I...

It's pretty much dead simple though, basically you put in a path like "/feed.rss", you can also use wildcards like "/feeds/*" and then set up a configuration for it including parameters like cache time and security level. In the case of a feed, high caching and minimum security are probably reasonable for most sites. Free users are limited to only a few rules (5 I think), but paying even for the cheapest plan removes the limit.

parennoob · on Oct 5, 2016

I'm tired of seeing CAPTCHAs, redirects, those awful Google "I'm not a robot!" tests, and being forced to fork over my phone number every time I register an email account (it is no longer possible to get an email account at GMail or Yahoo without giving them a phone number). It seems like no one just wants to serve regular web pages fast any more.

In my opinion, anyone who puts this sort of stuff in front of their blog is overly paranoid (unless they are some sort of high-profile victim, like Brian Krebs). Just avoid reading their blog.

neoCrimeLabs · on Oct 5, 2016

Let me introduce you to the concept of defence in depth:

https://en.wikipedia.org/wiki/Defense_in_depth_(computing)

Really, the longer a website is online, the more it attracts bots. Spam bots, brute-force login attempting bots, information gathering bots, vulnerability testing bots, and many more. This isn't even counting targeted attacks.

I've seen many small websites recieve much more bot traffic than user traffic. Sometimes to a crippling level.

My point is, it's not paranoia if they really are after you - even if it's a mindless mass of bots.

Like all security though, sometimes you affect legitimate users. This is why it's great to have multiple ways they can provide feedback.

tracker1 · on Oct 5, 2016

At [a certain auto website I used to work at] the bot traffic was a significant amount of the requests... Because of the nature of referal traffic we were required to let the bot traffic stand...

This was mitigated by implementing a few layers of a decent caching strategy, as well as some db improvements, and moving search queries to a separate database server (mongodb, then elasticsearch) altogether.

In the end, there are lot of things you can do to help mitigate these things... It really just depends on what you are trying to accomplish with a given site.

ryanlol · on Oct 5, 2016

Funnily enough, Krebs doesn't put this sort of stuff in front of his blog. I've never seen a captcha from Google Shield or Akamai.

The only reason Cloudflare captchas exist is because of utterly incompetent engineers, nothing else. You can stop attacks without breaking the internet for people without US residential IPs.

tracker1 · on Oct 5, 2016

Somewhat agreed... While there are other methods, and probably should be for GET with non-html/image/script responses. Or for that matter, common authentication cookies.

I would much rather than Cloudflare added a header when in "under attack" mode, and delivered cached responses to get requests.

kev009 · on Oct 5, 2016

Abject failure IMHO. A CDN must be as transparent as possible for both end users and sites using them. DDOS can and should be dealt with passive detection.

acdha · on Oct 5, 2016

Cloudflare offers configuration. Users may choose to configure paranoid security options which are unsuitable for non-browser usage but that's true of any hosting option.

_csoo · on Oct 5, 2016

The default configuration is the damned problem.

Kalium · on Oct 5, 2016

How do you think it should be different?

ryanlol · on Oct 5, 2016

It's really scary how many of the responses here are "lol the user needs to change the config". I guess these are the same guys that think that mongo should bind on 0.0.0.0 by default, the users can change it.

Maybe have working default configs instead? Users are not going to change the configs unless you make them.

tracker1 · on Oct 5, 2016

Those wouldn't be the same guys that bind mongo to 0.0.0.0 by default... the default config in this instance is more secure... which means it's more annoying in scenarios it isn't configured for...

You have a firewall, and then complain to mongo because you can't access your database from another computer.

ruchit47 · on Oct 6, 2016

May be its bad implementation of cloudflare. Cloudflare doesn't do javascript check if content type is xml in headers unless you explicitly want it to do. RSS feeds and similar URLs should be excluded from security with page rules.

rcarmo · on Oct 5, 2016

Hmmm. I've had an intermittent problem where my feed suddenly regurgitates past items without warning. Wondering if this might have something to do with it somehow...

wineisfine · on Oct 5, 2016

[flagged]

cbg0 · on Oct 5, 2016

More like fud. Their prices are the same: https://www.cloudflare.com/plans/

wineisfine · on Oct 6, 2016

You're wrong, it used to be $5 per domain: http://m.imgur.com/sXonmsy

Good job in the downvoting.

vog · on Oct 5, 2016

Wow, seing that level of incompetence happening at a company like Cloudflare is quite astonishing (i.e. hard to believe) ... and utterly disappointing.

ShakataGaNai · on Oct 5, 2016

Who said it is incompetent? Why is it incompetence? It's possible someone has their cloudflare security turned up to 11. Maybe they have hacker problems and would rather risk a few less clicks than getting completed p0wned.

Also keep in mind that just because it's "RSS" doesn't mean there is a quick and easy way to exclude it from security. On your average Wordpress blog the URL is /somethingradom/feed/. So either Cloudflare assumes every URL of /feed/ should be exempt, or it should read the contents of every page to check for exclusions?

Also do keep in mind that if you're a Cloudflare customer you can easily exclude specific URLS from this type of security scrutiny. So perhaps the blog owner is incompetent? Perhaps this person posting this is on a network with a computer thats infected with a botnet.

Who knows. This "article" is shit.

zzzcpan · on Oct 5, 2016

Here's the thing: most website owners don't know any better, it's all up to their CDN providers, like Cloudflare, to provide adequate features for security preconfigurations. Don't make it a user's responsibility. At the end of the day I'm annoyed by Cloudflare, forcing me to enable javascript, not by any of their customers.

predakanga · on Oct 6, 2016

When the website owner decides to use Cloudflare in front of their site, it is absolutely their responsibility to ensure that it's configured properly. Not Cloudflare's and certainly not the user's.

And for that matter, it's entirely possible that they consider this correct behaviour. They may have a poorly written dynamic RSS feed that doesn't cache for instance, and want it protected.

Moreover, Cloudflare provides plenty of ways for the website owner to avoid this behaviour - the global security level can be adjusted, settings for individual endpoints can be adjusted, even settings for individual IP ranges can be adjusted.

As a user, you have no idea what settings the website owner has selected; it seems rash to blame Cloudflare in that context.

zzzcpan · on Oct 6, 2016

There is no one else to blame. Website owners are just people, most will never be able to understand the service and all of the implications of the configurations it provides. Cloudflare is the only one here able to do something about it, not their customers or users.

I don't even know why this is a discussion. Not blaming customers and users for anything is a common sense.

ryanlol · on Oct 5, 2016

The only reason the CAPTCHAs exist is incompetence, do you see any of cloudflares competitors doing that? No? It's because they don't need to.

cocktailpeanuts · on Oct 5, 2016

I wonder how many people still use RSS to subscribe to sites, even the HN crowd.

Personally I stopped using "traditional" RSS readers years ago, and have stopped using more "modern" RSS readers like Flipboard or Feedly last couple of years, since they're just a subset of what I can find on Twitter.

Most people who own a blog have a Twitter and they share all their posts on their feed anyway, so I just follow them on Twitter and use Twitter as "RSS reader".

Before you open web standards advocates throw rocks at me, I am not an open standards hater either, I have built a couple of RSS related apps as projects in the past and still believe that's the way to go in the long term, but in 2016 I can't really find a reason to still use a RSS reader.

In fact the scenarios like what OP mentioned in the article is exactly why I would rather use Twitter. Why deprive of yourself of an opportunity to read content from someone you like, when you have a totally free option?

pavel_lishin · on Oct 5, 2016

> I just follow them on Twitter and use Twitter as "RSS reader".

This quickly breaks down. I subscribe to 1096 sites according to Newsblur. I would 100% miss some content from those sites if I only followed them on twitter.

And for some of those sites, that's probably fine; a fair amount are just 'entertainment'. But there are updates I would hate to miss, even if they tweeted links to them at 3am or right before a huge tweet-storm from someone else.

That is exactly the problem RSS was designed to solve.

michilehr · on Oct 6, 2016

Or when you were on vacation taking a brake from work and so also from the tech stuff. RSS is great for not missing something. All that interest based sorting for social media made RSS much more important for me.

z3t4 · on Oct 5, 2016

I use RSS to get ad-free high-quality content that I've opted in to. There will be no hard feelings if I do not "RSS" my friends or colleges. And what I read is not logged in some database.

CharlesW · on Oct 5, 2016

> "… in 2016 I can't really find a reason to still use a RSS reader."

Also remember that every podcast app is an RSS reader.

This year, 21% US adults ages 12 or older have listened to a podcast in the past month.

icebraining · on Oct 5, 2016

they're just a subset of what I can find on Twitter.

That's not possible, because you can get your Twitter timeline as an RSS feed.