Tell HN: Stack Exchange (Stack Overflow) is now blocking Tor
429 points by yanmaani on Feb 8, 2022 | 195 comments
I tried visiting Stack Exchange today using Tor Browser, and apparently my IP is blocked. I tried with several different circuits, and they're all blocked. Several people on IRC confirmed this as well.

I tried to contact them by e-mail and they didn't bother to respond, as is par for the course these days.

The full error message:

---

  Access Denied

  This IP address (185.220.101.37) has been blocked from access to our services. If you believe this to be in error, please contact us at team@stackexchange.com.

  When contacting us, please include the following information in the email:

  Method: block

  XID: 820408900-HHN

  IP: 185.220.101.37

  X-Forwarded-For: 185.220.101.37

  User-Agent: Mozilla/5.0 (Windows NT 10.0; rv:91.0) Gecko/20100101 Firefox/91.0

  Reason: Blocked.

  Time: Tue, 08 Feb 2022 08:11:50 GMT

  URL: tor.stackexchange.com/

  Browser Location: https://tor.stackexchange.com/



This is the official response from SE:

https://meta.stackexchange.com/questions/376060/update-on-th...

For the past month Stack Overflow has been hit by weekly DDoS attacks that progressively grew in size and scope. In each incident, the attacker(s) have been changing their methodology and responding to our countermeasures. Initially we were able to detect and mitigate the attacks before any performance degradation could be noticed but the latest attacks ramped up very quickly and the site was brought down before we could react.

While we cannot go into specifics on each attack in order to maintain opsec and not tip off the attackers, we can say that each individual attack has been using different IP addresses and targeted different aspects of the site. During an outage, our top priority is always getting the site back up and running. After traffic has been stabilized, we perform a post mortem for the incident where we assess and improve upon the actions we have taken.

During the outage last Sunday, we noticed that a large amount of the DDoS traffic originated from Tor exit nodes. The decision to block Tor exit nodes did not come lightly; in fact, Teresa, our CTO, was on the call when we discussed remediation methods. Due to the persistent nature of the attack and our desire to bring the site back up as fast as possible, we made the decision to block all DDoS traffic endpoints, including these Tor exit nodes.

We did not target, nor set out to block all traffic from Tor, that’s not something Stack has ever done. However, due to the shared nature of Tor exit nodes, some of them were also routing DDoS attacks to our sites and were blocked. We have tried removing these blocks between attacks but this action has resulted in further site outages as DDoS efforts continue to originate from these exit nodes. Unfortunately blocking the Tor exit nodes also blocks legitimate users from using them. An immediate solution for users who find themselves blocked is to access our site from other IP addresses, via home internet, work internet, or other VPN services.

We are continuing to evaluate the situation and will keep our community updated. Thank you for your patience and understanding.


For anyone who is using Tor and needs Stack Overflow knowledge while they're doing such mitigation, Kiwix [1], best known for letting you read Wikipedia offline, also provides dumps of Stack Overflow that you can download and browse locally [2]. They're massive, though (134 GB).

[1] https://www.kiwix.org/en/

[2] https://wiki.kiwix.org/wiki/Content_in_all_languages


What is up with the privacy policy and data usage? The app store information says this:

--- App Privacy

The developer, Wikimedia CH, has not provided details about its privacy practices and handling of data to Apple. For more information, see the developer’s privacy policy.

----

... then you click the link for "developer's privacy policy" and it goes to a 404 at http://www.kiwix.org/impressum/

... then you go find the privacy policy on the website at https://www.kiwix.org/en/legal/privacy-policy/ and it says:

"This privacy policy applies to the website www.kiwix.org only. It does not cover our subsites like wiki.kiwix.org or download.kiwix.org."


You use Kiwix offline. You download a Kiwix database of the content you wish for.

If you can get the app and the database on tor, then you're golden.


Hm. But the privacy I was interested in was the privacy during app usage... What data is Wikimedia CH collecting about me, and why won't they say?


They could run their own Stack Overflow .onion service and advertise it in the site headers [0] (including on the block page); a sketch follows below.

Then the onion traffic can be handled differently (maybe DDoSing the .onion will only affect Tor users, or the resulting .onion resource usage can be throttled). Worst case, a read-only version of the site would be better than nothing.

Something similar could be done with the existing tor exit node blocklist, but maybe this is easier to separate from existing infrastructure.

[0] https://community.torproject.org/onion-services/advanced/oni...
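
For the curious, the mechanism [0] points at is the Onion-Location response header, which is all it takes to advertise an onion mirror. A minimal sketch in Python/Flask of what serving it could look like (the onion address and the app are placeholders for illustration, not anything SE actually runs):

  from flask import Flask, request

  app = Flask(__name__)

  # Placeholder: a real deployment would use its actual v3 onion hostname.
  ONION_MIRROR = "http://examplev3onionaddress.onion"

  @app.after_request
  def advertise_onion(response):
      # Tor Browser recognizes Onion-Location (on HTTPS pages) and offers
      # to switch the visitor to the onion mirror of the same page.
      response.headers["Onion-Location"] = ONION_MIRROR + request.path
      return response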


Yes they _could_ do a bunch of engineering and infra work to support a niche use case.


At Stack Overflow scale, a "niche" use case could mean more users than your average website has.


(Your sarcasm may have been too subtle for the average reader.)


What do you mean?


Whatever is deciding if the traffic is tor or not can get ddos'd too.


Eh? That's not how hidden services work.


It's pretty trivial to find the list of running tor exit nodes.


Exit nodes are not used to connect to hidden services.


I wonder what's the motive to target StackOverflow? A disgruntled ex-employee?


What an interesting question. When I first got into computing back in the 80's and 90's, people wrote viruses just for the challenge and fun of it. Some people just want to watch the world burn. This was seen as the obvious motivation; it went unquestioned. It was only after crypto mining and ransomware became big things in the 2010's that people started assuming a financial motive behind attacks.

Up through the mid-2010's, people posted amateur porn largely for personal fun and ego boost. Occasionally, regular people would try to make a buck. But it was mostly people posting selfies to /r/gonewild because they enjoyed the attention. Then OnlyFans came out, and now virtually anyone posting amateur adult material can be assumed to be driving traffic to a monetization channel.

Sorry for the tangents. It just strikes me how normalized "side hustle" culture has become. It was not very long ago at all that it wouldn't even occur to most of us to question a DDoS attacker's "motive".


We have all been seeing random Stack Overflow derivative sites pop up in search. They take the exact question, description, and answers and "bloggerize" them.

There are dozens of sites like this. I even came across YouTube channels with thousands and thousands of videos: slideshows with elevator music that present SO questions and answers one sentence at a time. You can convert SO posts to blogs fairly easily, and they rank.

I am going to guess that some people are trying to scrape the stack overflow data to make their own spinoff blog versions of it.


Stack Exchange content is freely available under a share-alike licence; Ctrl+F "user contributions" in the site footer. The data dumps are linked elsewhere in these discussion threads.

They are not scraping. The secondary economic exploitation is expected. The copy sites (attempt to) capture value that the primary is unable to.

Despite Google becoming worse overall, I've noticed that the copy sites are consistently ranked lower than the original. What has your search engine experience been? I have not encountered a robo YouTube video like you describe.


> I am going to guess that some people are trying to scrape the stack overflow data to make their own spinoff blog versions of it.

An interesting thought, but it doesn't fit with the description provided by SO of an evolving DDoS attack that originates from different networks in response to mitigations. It's not scraping. It's an attempt to deny service.


But once the SO website doesn't work for someone, they would be more likely to visit the scraped version.


Presumably advertising for the DDoS service: "pay us and we'll knock your target offline! Remember, we're the ones who killed Stack Overflow; we can help you."


Well, they just got rid of their job board.


Is a coordinated DDoS attack a reasonable response to someone getting rid of a free feature you like?


Might be. Depends on your personality.


Probably trying to ransom if I had to guess.


Probably the owner of one of those stackoverflow clones would see a bump in traffic => more ad revenue if SO is experiencing slower response times/downtime.


Should we be investigating those Expert Sex Change people?


Someone is behind schedule and _really_ needed to focus on his work.


If I were behind schedule I would try to scrape StackOverflow so I don't get cut off from it.

Edit: maybe someone's scraper was too efficient.


This adds up; metrics for Tor show increased bandwidth in recent days: https://metrics.torproject.org/bandwidth.html?start=2022-01-...


Also, SE was down a lot yesterday.


Thank you for posting this; there is some tinfoil-hat stuff going on deeper in the thread, so I hope this stays up top.


Yes, it's a useful response. What happens if they use traffic shaping to limit Tor traffic to a manageable share of their resources? This might ultimately deter the attacker since the end outcome would simply be a variable amount of inconvenience for legitimate Tor users.


This is... interesting. Tor traffic is now high-bandwidth enough to meaningfully participate in DDoS attacks? Or is it more likely that those Tor exit nodes are on machines controlled by the attackers? A host compromise through the Tor agent? Baddies deploying additional Tor agents to better mask their activities?


> An immediate solution for users who find themselves blocked is to access our site from other IP addresses, via home internet, work internet, or other VPN services.

Anyone for whom this is a solution has already tried it.


The # of Tor exit nodes relative to IP addresses available to people doing the DDoS is minuscule. I'm so tired of companies blocking Tor users and always claiming this as the reason. Any IoT device, routers with admin/pass as the login, smart cameras, grandma's old PC, and a million other types of devices are fully susceptible to being hijacked as proxies, giving the attacker an unimaginable number of IP addresses.


Yet geo-IP and service provider (AWS) blocks are extremely effective at blocking these kinds of attacks. The issue is that there is no list of attacker IPs that can be identified and blocked with any reliability; it is impossible to separate valid connections from invalid ones. When the vast majority of traffic (99+%) is malicious, slamming the door is the most viable solution.

If there were a good way, Cloudflare would have solved it. Cloudflare hasn't been enough for my company; we had to add temporary geo-IP blocks just to get the site up.


Seems to me Tor needs to implement a way to blend in better with the rest of internet traffic, instead of being so easily detectable as an exit node.


Tor incoming traffic is easy to detect and block on purpose. It's not trying to hide or 'blend in' via technical means; it's seeking social legitimacy for privacy-enhanced traffic.


Why not "social legitimacy for privacy-enhanced traffic" that is also nobody else's business to know, including the hosting providers?

ie. I don't want Cloudflare CEO popping up here gloating about how they're fighting for the cause of tor users! F' that if they don't know it's there they can't take credit for it either. That's the tor I want.


There was no intentional blocking of Tor there in the past, as far as I understand. But there are some mechanisms that block users from IP addresses that posted spam or abusive content. Those are automated, based on flags, and not directed at Tor specifically, but of course TOR addresses are more likely to have been misused that way, so your chances of encountering these blocks are dramatically higher when using Tor. But these blocks only apply to low-reputation users; I think once you're above 100 they don't matter anymore. And they also don't block reading the sites.

There have been DDoS attacks recently on Stack Overflow and Stack Exchange; I think it's quite plausible those have something to do with this. They still seem to be under active attack intermittently.


It was never "TOR", it's; Tor: https://matt.traudt.xyz/posts/2021-02-22-tor-spelling.

The new blocking completely blocks access to the site, not just posting. See for yourself: https://www.torproject.org/

Tor is next to useless for DDoS attacks, as it doesn't offer any amplification: for every byte you send in via TCP, you get one byte out. For attackers with large botnets, it doesn't make sense to DDoS over Tor, as the number of exiting IP addresses is limited and it's easy to block them all. It makes more sense to use your thousands of available botnet IPs that aren't on any lists.


  > Tor is next to useless for DDoS attacks, as it doesn't offer any amplification.
It seems that these attacks are not being carried out to take down the Stack Exchange network, but to take down Tor as a legitimate technology. Hear me out.

In order to carry out these attacks, the attacker already controls enough machines to DDoS one of the largest websites on the internet. So: huge botnet, or nation state. Now we have one of the largest websites on the internet telling its tech audience:

  > An immediate solution for users who find themselves blocked is to access our site
  > from other IP addresses, via home internet, work internet, or other VPN services
https://meta.stackexchange.com/questions/376060/update-on-th...

They are normalizing the "workaround" of using an insecure IP address when Tor is inaccessible. This will lead all non-secretive and non-illicit Tor usage to go back to the open, insecure internet. Thus, everyone still using Tor "has something to hide" (as if that wasn't the case already). By forcing all but the most desperate users off Tor, Tor can be discredited as a nefarious tool.


>They are normalizing the "workaround" of using an insecure IP address when Tor is inaccessible. This will lead to all non-secretive and non-illicit Tor usage to go back to the open insecure internet. Thus, everyone still using Tor "has something to hide" (as if that wasn't the case already). By forcing all but the most desperate users off Tor, Tor can be discredited as a nefarious tool.

I think this is a reach - after all, if you wanted to discredit a tool for people in oppressive regimes, people who care about their privacy, and people doing illegal things, why start with the programming help site? (I know SE has other sites, but the biggest ones are for tech help)

...That said, I could see this being a way LE could try and unmask a very high-value darkweb programmer. Still a reach.


> I think this is a reach - after all, if you wanted to discredit a tool for people in oppressive regimes, people who care about their privacy, and people doing illegal things, why start with the programming help site? (I know SE has other sites, but the biggest ones are for tech help)

My guess is they'd target websites that are important to Tor users, and tor.stackexchange.com is one of them. I also don't think they started with SE. Tor IPs are continually filtered by Google, Cloudflare, and other automated firewalls.


For good reason, though. Bad actors use Tor not for privacy but to evade IP bans on websites/games/etc., to shitpost, or to disrupt legitimate communities; this is the issue, IMO. It's easier to simply block all Tor IP addresses, especially if you just need it to stop and don't have the resources of a larger company. And for larger companies that block in an automated fashion, as has been mentioned elsewhere, there is so much shady stuff coming in via those IP addresses that they all tend to get blocked, or at least added to watchlists by companies like Cloudflare.


Their blog post claims most of the DDoS traffic came over Tor.


Maybe SO has blocked all of the attacker's available IPs after their previous attacks, and they're now resorting to Tor to mask their traffic?


That doesn't explain why you can't read the site. OP is mentioning that they can't access the site at all.


That's why I suspect it's something new, and it might be related to the DDoS. The main spam and abuse problems are handled by the system I mentioned, which is not the reason here. So this is likely a defense against something else.


You're looking at simple, static pages. Unless something is horribly wrong with their infrastructure, these should take milliseconds to generate, if even that.


>You're looking at simple, static pages.

Stack Overflow pages have many dynamic components like vote counts, reputation points, sidebar related-question links, new comments, etc. An excerpt from their blog explains why they can't cache the output: https://nickcraver.com/blog/2019/08/06/stack-overflow-how-we...

Even though Stack Overflow's I/O access pattern has far more reads than writes, the resultant generated HTML is still not as static as Wikipedia pages. Even if they're efficiently using CPU to generate the dynamic elements, the cost of egress traffic may also be a factor.

All that said, I don't have any insight into what heuristics they use to block certain ip addresses.


That blogpost is intriguing!

> variants of cache... anonymous, or not? mobile, or not? deflate, gzip, or no compression?

I don't understand why the markup would vary between mobile/desktop, or why they would even consider compression a varying factor to account for in the cache. Maybe that's because their backends produce such specific variants that they have a hard time caching in the first place?

If 80% of pages are only requested every two weeks, then it doesn't make sense to cache those. There's still probably lots of stuff you can cache, such as user profiles/stats and question/answer scores, for several minutes. You can also cache many parts of the markup that are not going to change often. It's always better when you can cache the entire markup, but there can still be lots to gain if you cache only the small bits that are expensive to acquire/template (a sketch follows below).
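
As an illustration of caching only the expensive bits, a minimal TTL-cache sketch in Python (the helper name and the five-minute TTL are made up for the example):

  import time

  _cache = {}  # key -> (expiry_timestamp, value)

  def cached(key, ttl_seconds, compute):
      # Return a cached value for key, recomputing it once the TTL lapses.
      now = time.time()
      hit = _cache.get(key)
      if hit is not None and hit[0] > now:
          return hit[1]
      value = compute()
      _cache[key] = (now + ttl_seconds, value)
      return value

  # e.g. cache a question's vote count for five minutes instead of hitting
  # the database on every render (db_vote_count is a hypothetical helper):
  # votes = cached(("votes", question_id), 300, lambda: db_vote_count(question_id))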

> But the cost of memory to store those strings (most large enough to go directly on the large object heap) is very non-trivial. And the cost of the garbage collector cleaning them up is also non-trivial.

It looks like their cache implementation could (should?) have been built on better foundations. First, I think the cache doesn't have to reside in memory: disk accesses are fast, certainly much faster than running a database query over the network, which will need to access several files and cross-reference data, with extra latency on top. Then, if you're going to store long-lived stuff in memory, you should probably use a garbage collector that's tailored for this use case, not your language's (.NET) default GC... maybe Redis? Don't get me wrong, I find it pretty cool if some engineers want to develop a homebrew cache, but that sounds like a huge project in itself.

> the cost of egress traffic

I'm not aware of SO's tech stack, but I'd be surprised if they have much in egress fees. They're a very big site, so they probably run off unmetered dedicated servers, if not their own hardware on cheap transit. Who knows, they may even have their own AS and peer with other providers in some locations? From a quick request, it looks like stackoverflow.com is served from Fastly's AS, but I personally don't understand why: I don't remember seeing any heavy content (video/images) on SO, so in that kind of situation a CDN would hurt more than help on slow links. Maybe that's because they bundle megabytes of JavaScript crap? Now that they're blocking Tor users, I can't check for myself :-)


They do use Redis. They have a page on their tech here: https://stackexchange.com/performance


That makes sense. From what I remember of using Redis a few years back, it doesn't do file-based caching, only in-memory. Something like nginx is very good at file caching, though.


OK, that's true, they're not static, but we're still talking about a few database queries which quickly hit indexed columns. If you're running a public website like this, it's not like each page takes several seconds to load.

The egress traffic is also trivial: a page seems to clock in at below 50 KB. If they're paying $0.02/GB, that's about $1 per million requests.

More importantly, if it really were a DoS attack, there are far less obtrusive methods, such as a CAPTCHA or similar verification screen.


They're not static pages, the lists of questions, users and everything else are dynamic. Some of them are obviously heavily cached, but you can't cache everything. It's certainly possible to find more expensive pages if you put in some effort.


Grant that some of them are more expensive. But why can't I visit any page?


It could potentially be related to all of the "knock-off" websites that scrape Stack Exchange data. Maybe they are going outbound through various Tor nodes and getting the IPs blacklisted as a result of reading thousands of pages too rapidly.


Given my experience with the network quality of Tor, I'd be surprised if scraping was (a) efficient to do over Tor, and (b) something Stack Overflow would even notice: as I said, the network speed is too slow, so it can't add much traffic compared to the absolutely staggering amount they get from non-Tor.


Also, what would be the point? It's easier to download a data dump from https://archive.org/details/stackexchange


You would be surprised/disappointed at the amount of abuse the bigger sites have to handle.

Things like this: https://news.ycombinator.com/item?id=26072025


Interesting, but it doesn't fit the context of Stack Exchange blocking Tor. Your example there is about a mobile app hotlinking an image of a flower, which seems easy enough to block/fix, while Stack Exchange blocking all Tor users from even reading Stack Overflow doesn't make as much sense.


One image gets abused enough; they dig and find the culprit is an app.

Fixes: update the app, block the image, block all requests with empty user agents.

------

One person abuses SO enough; they dig and find the culprit is someone using the Tor network.

Fixes: identify the user and ask them to stop, or block all traffic from Tor.

-------

Do you propose an alternative fix?


I wasn't aware that existed; obviously that would make any scraping utterly useless.


Maybe Stackoverflow finally started to aggressively block everything that might "crawl" their data. I'm super tired of all the paraphrased/translated copies in Google results.


The Stackoverflow database is available for download, no need to scrape the site.

https://www.brentozar.com/archive/2015/10/how-to-download-th...


Is that dump going to be maintained under the new regime? While they can't redact existing data dumps, due to the license those are covered by, they don't have to keep publishing updates. If that happens, then scraping becomes required for the copy-spam sites and such.

Though that is a bit conspiracy-theory-y, and I expect blocking Tor is more an issue of a small but very active number of people using it to create accounts to post anonymous spam or abuse. For that, which can be a very real problem, I would suggest a more fine-grained block: stop people at those addresses from creating accounts, or from posting from relatively new (or recently inactive) accounts, or perhaps somewhere in the middle: prevent connections from Tor addresses from posting at all, but still let people use the site. That way you still block the abuse, but have less impact on others accessing via Tor.


We intend to keep publishing data dumps and with the same regularity.


I'm a 70K rep contributor to a Stack Exchange site.

StackExchange is composed of contributions from the Internet at large. I've appreciated over the years how StackExchange took this stewardship very seriously without hoarding it behind paywalls - the ethical thing to do.

One of the reasons I was motivated to contribute is that 10-12 years ago, putting tech questions into Google often led to Experts Exchange, a site that takes contributions from the public and makes you pay to see them. (We really need a StackExchange-type site to take on Pinterest now ...)

So ... no, they should not stop publishing updates. People need to bookmark sites and go directly to them instead of relying on an increasingly broken search engine that's past its heyday.


> I'm a 70K rep contributor to a Stack Exchange site.

Well, if we are whopping them out on the table... 75.1K over the three I've used most, and a few thousand over a couple of others. Been around since the start, even got sent free shirts & bits in the early days for being in the top [what-ever-the-cutoff-was] on SF, SO, & DBA.SE.

> So ... no, they should not stop publishing updates.

I wouldn't mind if they stopped providing those dumps; I was just passing on that others suggest they could (and by rights, they could, but I doubt they will, as my post (I thought) said).

As long as the data stays available (just on the main sites is fine by me) under CC BY-SA, I'll keep contributing. They can't revoke the license for existing content; if they change it for future content (for the avoidance of doubt: I have no reason to believe they have any plans to), or the sites otherwise devolve (like others have in the past, with rampant tracking and ads), I'll probably bugger off.


In my opinion this is even worse when searching for GitHub issues on Google. I prefer searching for issues on Google, but since last year there have been so many results from websites that show nothing but crawled GitHub issues. No idea why they are ranked higher than the original GitHub issues. These results don't link to the original issue on GitHub, are often outdated, have bad UX (e.g. the code snippets), and don't allow participating in the discussion.


> No idea why they are higher ranked than the original GitHub issues.

Not to be too cynical about it, but who serves the ads on those sites?

If it's Google Ads, then I think you have your answer as to why they are higher ranked.


My theory is GitHub is getting down-ranked due to performance. When you view an issue on GitHub, the page gets built from MySQL queries and can be slow for large issues. gitmemory and the others just serve the content they scraped, so they load much faster.


Seems like they would ignore that "performance" parameter for major information sites like github/reddit/stack/etc


Genuinely curious to know now..


Or it could be because Google dislikes MS owning so much developer mindspace.


You can use site: or inurl: (or a combination) to constrain the search to GitHub.


Often just putting the site name (e.g. "github.com") at the end of the query is enough to do that for me.


I don't understand how these crawlers even work. Last week I published a PoC for CVE-2022-0316. The next day, you could Google that CVE and a whole bunch of foreign-language sites with an exact copy of my blog would show up; my own original content wasn't on the first two pages. I hadn't linked it anywhere at all, and Google Analytics shows literally 0 viewers. Yet they found it, scraped it, and had winning SEO. Seems fixed today.


> ...Google Analytics shows literally 0 viewers.

This at least makes sense: why run the JavaScript in your scraper? It's easy to identify. What do your webserver access logs say?


Oh, I get that bit. I just meant that I was surprised they were able to find it basically immediately. It's not like my blog is high-traffic.


That's pretty crazy. I think Google is currently losing vs. these SEO exploiters, at least in some domains such as programming. The fact that things switched within a week suggests they are trying to fine-tune their algorithm to make it less exploitable again.


Link farming, probably. If you have a network of scraper sites already, you inject links to the posts into other posts. That's one of the motivations behind comment spam as well: it builds links pointing to a site, whether people look at the spam or not.

I don't know how much Ye Olde PageRank still factors into Google results, but I'm assuming it's still a significant factor.


Your page got "fixed" probably because it got some backlinks.


The problem is with Google, not with StackOverflow letting people read their stuff. A search engine which does not prioritize pages with advertisements and broken Google-powered translations (yes, the French versions of those SO/GH sites are hilarious) would filter that crap out.

I mean, anyone who'd like to crawl/scrape a website can do it from home, with the help of friends, or by renting "residential VPN" services for a few bucks a month. Blocking Tor does not prevent scraping.


You would be shocked how many paraphrasing enthusiasts will not finish their implementations if the first steps require additional expertise/investment.

Google should definitely resolve this problem on their side, but language models have gotten huge and complex. It's hard to tell (especially at their scale) if a website is composed of paraphrased content from 20 other websites.

I'm kinda shocked how successful those "doorway pages" are at reaching the top, and how long they live there.


PSA: for anyone who needs a large amount of Stack Overflow data for whatever (legal) reason, they share dumps at https://archive.org/details/stackexchange – and perhaps elsewhere.


The license is CC-BY-SA 4.0, so the sites that host copies of their data verbatim are (arguably) legal.


Yup! Mostly legal. We maintain a legal guidance page[1] covering what you can and can't do with the data. I assume our lawyers have been involved, but what do I know?

[1]: https://stackoverflow.com/legal/trademark-guidance


It depends. The "attribution" requirements can be a bit onerous at times, in ways that might put the most casual and opportunistic reusers of even CC-BY-SA data at some legal risk. (It's of course easier for a legitimate reuser with an actual stake in the data to dot the i's and cross the t's.)


There is a uBlock Origin list which you can import to filter search results:

https://github.com/quenhus/uBlock-Origin-dev-filter


This needs its own blog/HN entry :)


Thank you.

There are 322 SO copycats alone.


More like 150 but yeah. SO copycat list is taken from https://github.com/arosh/ublacklist-stackoverflow-translatio...


It's a shame Google doesn't import that list.


> their data

The "data" is under creative commons. It does not belong to them.


Ironically, you can probably reach the copies via Tor and get the information you want there instead.


Which don't even link to the original content, so it looks like their content is actually original... Do something about it, Google, please!


I feel this one.


> I tried to contact them by e-mail and they didn't bother to respond, as is par for the course these days.

Given you presumably only contacted them today, I'd allow them at least a few days to respond...


In my experience, when it comes to big players, responses are bimodal: either they respond within a few hours, or not at all.

That said, if they respond, I'll edit the post.


As someone who works close to the SO office: no. I'd expect everyone to be home asleep at that time.


Stack Overflow is far from """huge"""

Also, dude, it was the end of the workday, or close to midnight, in most of the western world when you made this post.


This isn't necessarily deliberate.

Example: years ago at Google, we had proxies for web traffic. Around lunchtime you could hit an issue with HN where the traffic was blocked for too many visits from the same IP. There were thousands of engineers, and they were concentrated behind 1-2 proxy IPs.

A lot of systems tackle potential abuse (e.g. brute-forcing passwords, DoS attacks) with a token bucket algorithm. Example: a given source IP might get 20 tokens; each visit uses a token, and tokens are regenerated at one per 2 seconds (a sketch follows below).

So with a Tor exit node you may just be hitting abuse protection, and that could happen at the CDN, gateway servers, internal servers, and so on.
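
A minimal sketch of that token bucket in Python, using the numbers from the example above (20 tokens per source IP, one token regenerated every 2 seconds; the numbers are illustrative):

  import time

  CAPACITY = 20          # tokens per source IP
  REFILL_SECONDS = 2.0   # one token regenerated every 2 seconds

  buckets = {}  # ip -> (tokens_remaining, last_update_timestamp)

  def allow(ip):
      tokens, last = buckets.get(ip, (CAPACITY, time.time()))
      now = time.time()
      # Regenerate tokens for the elapsed time, capped at capacity.
      tokens = min(CAPACITY, tokens + (now - last) / REFILL_SECONDS)
      if tokens < 1:
          buckets[ip] = (tokens, now)
          return False  # over budget: serve a 429 or a CAPTCHA
      buckets[ip] = (tokens - 1, now)
      return True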


Also, as a user, the solution is to use a high-quality VPN, not Tor or any of the free VPNs that are ripe for abuse.


My tiny side project gets a lot of bot traffic. And just yesterday, I had a large spike from TOR, including the IP range 185.220.101.0/24. I was also considering blocking these IP addresses. As the project relies on third-party APIs which are rate-limited, the bots may cause longer loading times for real users.

I'd assume that Stack Exchange might also get lots of bot traffic from TOR.


It was never "TOR", it's; Tor: https://matt.traudt.xyz/posts/2021-02-22-tor-spelling.

The DNS checking service is rate-limited and adds latency, but there is a bulk exit list available here: https://check.torproject.org/torbulkexitlist (see the sketch at the end of this comment). Please don't use 3rd-party blocklists, as those often contain middles and guards, even though traffic cannot exit from those kinds of relays.

But, before you block Tor, please consider that doing so will most likely block a number of legitimate users who you haven't noticed before - just to stop one jerk. Instead, you could restrict access to certain parts of your site for Tor visitors for example.

Bot traffic from Tor is usually of the easily detectable variety (as the attacker didn't have enough skill to build/acquire a botnet and get "clean" IPs).
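
A minimal sketch in Python of the exit-list approach, using the official bulk list linked above (the read-only fallback at the end is just an idea, not a real API):

  import urllib.request

  EXIT_LIST_URL = "https://check.torproject.org/torbulkexitlist"

  def load_exit_nodes():
      # The list is one IPv4 address per line; refresh it periodically,
      # since exit nodes churn.
      with urllib.request.urlopen(EXIT_LIST_URL) as resp:
          return {line.strip() for line in resp.read().decode().splitlines() if line.strip()}

  exit_nodes = load_exit_nodes()

  def is_tor_exit(ip):
      return ip in exit_nodes

  # Rather than a hard block, a site could degrade gracefully, e.g.:
  # if is_tor_exit(client_ip): serve_read_only_version()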


Thank you for correcting the spelling.


A middle ground might be just putting some kind of CAPTCHA or similar verification on them, if they're causing actual trouble.


Or a regular old rate limiter - who's loading a new page more often than once every couple of seconds anyway?


It's spelled Tor!


It's a general anti-abuse measure.

It was more than ten years ago, but I was seriously spamming one of those reddit-like sites for software developers. I made about 1000 fake accounts with what looked like real names and photographs, but if somebody had looked closely they might have noticed I made no effort to match the gender of the name to the gender of the photograph.

The accounts generated a good amount of "cover traffic", voting for articles that weren't mine, but I would give one of my articles somewhere between 20 and 80 votes, which would usually recruit an even larger number of real votes. Once in a while somebody would say something like, "That's a pretty good blog, but not that good; I can't see how he's possibly getting so many votes for every single post."

My take was that what I was doing was pretty safe because my fake users were actually pretty well behaved and there wasn't going to be any real perception that there was a problem. Even so I was using Tor to make sure that my requests were not coming from the same IP address and it wouldn't be so obvious what I was up to.

I lost the database with the fake accounts in a hard drive crash, and I wasn't about to push the issue because I had also attracted some attention from the F.B.I., which knew I (or at least my alter ego) was making trouble with Tor, so they tried to entrap me into distributing child porn over Tor. I told them no, of course ("the FBI in my country sees that as an important enforcement priority"), but that did make me lose interest in Tor.

Back then Wikipedia would require you to log in to make any edits over Tor because they didn't want to deal with that kind of BS either. It's not unusual for many kinds of sites to have restrictions on Tor users for the same reason.

My last covert web crawling project involved making a machine image that could be deployed in any AWS zone to do about 200 requests and then pop up in some other place. On one level it looks like you are getting hit from Japan and Singapore and Germany, and... But on another level it is all AWS.

At some point they looked at their logs, freaked out, and made changes to make it fundamentally harder to crawl their site. We've got a running gag today that it's not safe for me to go to Delaware now. But practically, a lot of sites block AWS addresses for the same reason they block Tor: it's just way too easy for a third-rate hacker wannabe to appear to be coming from that many IP addresses.


This is regrettable. I'm not going to even browse, much less plan to answer questions on, Stack Exchange sites if this requires broadcasting privacy-sensitive personal information to them, such as an IP address. (I'm of course OK with answering from my household or job IP address as a basic matter of accountability and anti-abuse, but surely not with browsing from there!) It's fine for Stack Exchange to want to manage Tor Browser use of their network, but then they should at least publish an .onion alternative in the HTTP response headers, so that Tor Browser users have a tolerable alternative to hitting their clearnet sites.


Would a VPN be sufficient for your needs? I assume you will say that a VPN is too insecure. Zero shill here: I heard the VPN offered by Mozilla recently received a security audit.


This is another reason why my interest shifted to I2P over Tor.

Tor is designed more as a proxy to the clearnet, but I2P is focused on internal services, so it would take extra steps to use it for DoS attacks against plain old websites. If you want to be available to I2P users, you basically opt in to the network, as opposed to Tor's effectively opt-out approach. I2P's design is supposedly better at dealing with DDoS attacks, but I don't know exactly the specifics of how that works.

Overall, I think Tor will decline in usage over the next decade because it's too easy to use it nefariously and more institutions will find ways to block it.


> more institutions will find ways to block it

You can detect and block Tor traffic by blocking the exit node IPs. They provide an API that lets you get all exit node IPs so that you can block them: https://check.torproject.org/torbulkexitlist

They are putting no effort into making this hard for site admins.


There are onion sites in Tor, and none of the scalability issues of I2P (mentioned in their FAQ).


Onion sites are pretty secondary: very few people using Tor know how to set them up or are even acutely aware of their existence. In I2P there is no concept of an exit node; that would only be unofficially possible by creating an eepsite that forwards connections from the clearnet. This is why I2P is overall a better model: it is a totally opt-in network with no formal exit node distinction. I haven't read about I2P's scaling issues in a while, but I'm pretty sure the I2P network design is better at handling DDoS.


Tor provides anonymity without accountability; having some doors closed to all Tor users is the price paid for that anonymity. If that price isn’t acceptable, either modify Tor to allow more granular accountability in some privacy-protecting way, or don’t use Tor when accessing services that are closed to it.


I can use more or less all other websites with Tor fine, though. It's only Stack Overflow that insists on this nonsense. On most other sites, including this one, I can even make an account and post.

Is their site really so sensitive as to make reading with Tor impossible?


If "fine" means "answering a new captcha every time you blink" then yeah, every other website works fine on Tor.


In my experience, this has gotten better. That was CloudFlare, and they've stopped now. I can't think of a single site that requires CAPTCHAs for Tor users other than archive.is, actually.


Almost all torrent sites. Generally, "data" sites that people love to scrape and think Tor is a good tool for scraping.


archive.is wastes your time on CAPTCHAs even if you're not using Tor.


archive.is requires captchas for iCloud Relay users as well as blocking Cloudflare DNS users, so I wouldn’t consider them to be a Tor-specific example.


Good question. They might be trying to fight spam or the like. They could just block posting, commenting, and up/down voting...


Please define "fine", a majority of websites and services either outright block Tor or severely limit the traffic.


Tor generally doesn't work on Google services.


Is this actually true? I've often used Tor to access Google Docs and Google Maps and to my knowledge have never had a problem. In fact, I'm not even presented with Captchas.


YMMV. Google Search might as well be blocked completely, I guess they don't want to deal with all the SEO-targeting search queries that would otherwise come from Tor.


I guess they could just not generate any data from your searches, but I guess that defeats the purpose, now doesn't it? :)


> Is their site really so sensitive as to make reading with Tor impossible?

Very much so; it's strange if people here don't understand that. Even at the best of times, APTs could be discovered down the track by their SO queries; now compare today, with heightened tensions and certain nuclear-armed superpowers talking about going to war with each other.

How is this even slightly surprising? SO is vital shit; if you don't agree, feel free to null-route the site the next time you have a major incident at work :)


There are public dumps of all the questions and answers; you'd imagine that anyone paranoid enough simply runs a local mirror.


Are you actually suggesting that SO/SE are blocking Tor because they intend to track all of their users by their IP addresses (or browser metadata), using national security as a justification?

I still do not understand how blocking Tor helps here. People who are concerned about their security will either use mirror sites, or use data dumps such as what is available at archive.org, or simply not use the SO/SE content at all. The number of users who will abandon Tor and the protection it provides for the express purpose of visiting SO/SE is negligible.

This move will not increase the number of persons who see SO/SE adverts or who are trackable by SO/SE. It will also not decrease the number of persons who will be able to access SO/SE content. So I continue to be mystified about the rationale behind this policy change.


If by 'accountability' you mean the ability for site ops to unilaterally de-anonymise Tor users, then no; Tor users will never agree to that.

If SE executives are really concerned about spam and vandalism by anonymous actors, then SE could require Tor users to post assets in escrow (e.g. Monero) before posting. Similarly, if SE executives are concerned about denial-of-service attacks, then SE could rate-limit the sites that are causing the attacks; Tor is not efficient for that kind of attack anyway. There is no sound argument that blocking Tor entirely would further the interests of SE users.

This is the act of a monopolist in secular decline.


Tor is used for shady practices, just like proxies of old. SE has a lot of measures in place already to prevent shady practices. If 90% of traffic from Tor exit nodes is shady, why shouldn't they block Tor entirely?

If you access any website through Tor (or proxies) you're already more suspicious than the average user. If enough people cause trouble through Tor exit node IPs, it's only natural they get blocked.


Actually, there is no evidence that Tor is any shadier than the rest of the Internet, especially given that most attacks and vandalism originate from botnets and other compromised systems, not Tor.

Akamai published an analysis that affirms this:

https://web.archive.org/web/20170317110115/https://www.akama...


Great resource, a surprisingly clear and detailed introduction to the various attacks faced by websites!

The relevant part:

> "we concluded that approximately 1 in 380 http requests coming out of Tor is verified to be malicious, while only 1 in 11,500 http requests coming out of a non-Tor ip were verified to be malicious. In essence, an http request from a Tor ip is 30 times more likely to be a malicious attack than one that comes from a non-Tor ip."


Because 'shady' traffic does you more or less no harm, unless your web application executes arbitrary untrusted input?

For a serious site, the cost of allowing passive reading is going to be ~0.


Pure propaganda. I used to work at a top-five website; Tor never caused us any problems. Our problems were 1) hacked university accounts from eastern Europe, 2) China, 3) Russia, 4) super-fans trying to download every video and picture of their favorite porn star at once.

We occasionally had people upload child porn; they did it over the public internet, not Tor. Our lead counsel was a former US district attorney whose hobby was doxing the uploaders and providing all the evidence and information to the authorities in a "ready to prosecute" package. I forget the exact number, but I think he got almost a dozen people prosecuted and jailed.


I think it's more that Tor traffic is 99.9999% less profitable.

"Legit" users likely block ads, are unlikely to enter their credit card numbers because of MITM shenanigans, and use one of the few browsers that takes non-fingerprintability of its users seriously.


If by 'accountability' you mean the ability for site ops to unilaterally de-anonymise Tor users, then no; Tor users will never agree to that.

If SE executives are really concerned about spam and vandalism by anonymous actors, then SE could require Tor users to post assets in escrow (e.g. Monero) before posting. Similarly, if SE executives are concerned about denial-of-service attacks, then SE could rate-limit the sites that are causing the attacks; Tor is not efficient for that kind of attack anyway. There is no sound argument that blocking Tor entirely would further the interests of SE users.

A site looking to grow its influence would be more concerned with attracting new users than repelling them. This is the act of a monopolist in secular decline.


Not deanonymise, no.

Come up with a way for siteops to block someone and all their sockpuppet accounts, without knowing the underlying identity, and you’ll become a billionaire.

Without that, the only option we have today is deanonymization, which is a terrible option. We ought to do better.


It's really just a matter of balancing the difficulty of creating a new "identity", so that legitimate users can occasionally use multiple identities to partition their traffic and make it harder to get doxxed, but are still deterred from creating identities cheaply to engage in sybil attacks or escape blocks. There are various ways of committing real-world resources to an identity to deter such abuse. Actual meatspace identity is of course one way of doing this, but there are probably others.


That’s all plausible sounding, but no one’s connected the dots and done it yet. We’re still at “step 2: ???”.


What you are assuming is possible is a logical contradiction. Being able to recognise two personas as the same person is, in fact, the definition of de-anonymisation. Please check your math.


It is not a contradiction. See https://en.wikipedia.org/wiki/Zero-knowledge_proof as a somewhat similar example.


I'm not restricting my considerations to "the technology that is implemented and available to Tor users today", given that what's available neither meets the needs of sites, nor the needs of users. If you think that the idea is inappropriate, please state so and make your case for why you believe in your viewpoint. If you think that the idea is impossible, please note why you believe that — and then consider the idea as if it were possible.


This is not the spam or vandalism counter-measure; those work differently and only block posting to the sites. And you can avoid them if you establish a reputable account; they won't affect you anymore then.


If it is not to block spam, or vandalism, or denial-of-service attacks, then what is the purpose of this new policy?

To your other point: how can one establish a 'reputable account' if it is not even possible to access the site in the first instance?


Reputation is a feature of identity. Tor users are, by definition and intent, unidentifiable. Opting out of identification naturally opts one out of reputation, as is the case in reality as well — for example, Anonymous using Guy Fawkes masks to prevent reputation from being associated with their citizenship identity.

I don’t understand why Tor users would be interested in reputation at all, given the implicit identification it requires.


I could understand Stack Overflow blocking access to registration/signup for Tor users to avoid abuse, as spammers sometimes use Tor, so that kind of makes sense.

But not allowing Tor users to even read the website? What's the justification for that? You couldn't even perform a DDoS over Tor, as the network speed is too slow, so Tor activity can't even be a blip in terms of usage activity for logged-out users. So what's the deal here?


It might be interesting to have some sort of cryptographic identity that costs money to generate and then gets associated with Tor in some anonymity-safe way. But the identity would need to somehow remain unidentifiable, so maybe that's not an identity after all. Tor has enough trouble maintaining anonymity as it is, given the lack of trust in the various routers, because governments run many nodes in an attempt to identify Tor users.


There's no such thing as "granular accountability". It's binary...there's no in-between.


Of course there is. It depends on who might know you're on a certain website. My wife vs. my best bud, for example: one would hold me a lot more to account for some websites, whereas my buddy couldn't care less.


Is avoiding accountability the whole point of anonymity?


For me anonymity can be a part of my privacy posture.


Off topic: does anyone understand why SO is shutting down their Jobs Board?

I get that they were acquired, but why would you want to shut down a revenue-generating business that has been able to uniquely capture the attention of highly skilled workers who are difficult to hire?


Is there a good alternative site built on the Stack Exchange data dumps? Open-source projects like the Stack Exchange knowledge base and Wikipedia are always at risk of suffering a hostile takeover by their hosting services, whatever the license may say, as recently happened in the spectacular case of Freenode, and happened in a less severe fashion earlier with Sourceforge injecting malware into downloads.

Things like Deletionpedia and Wikiwand provide an essential service by reducing the cost of switching and thus the power of hosting services like Wikimedia, but I don't know of anything comparable for Stack Exchange.


There's probably been an issue with spam originating from that IP.


That makes sense for posting, where Tor users often get a CAPTCHA etc., but I can't even read it. If I go to any site on SE, I'm hard-blocked; I can't even fill in a CAPTCHA to bypass it.


Loading CAPTCHAs is still very expensive, and is not an answer to large-scale DDoS attacks. Those have to be handled higher up the stack, usually at the proxy/load balancer level.


I tried a new Tor circuit a few times; they all got blocked.


I've tried to create new circuits, changing IP addresses, and they are all blocked.


That's a very convenient excuse if your goal is to collect more data to sell to advertisers and anonymous users annoy you. What next? There has been a lot of spam from people wearing clothes, so everybody needs to be strip-searched?


That's pretty much how safety regulations at airports work. Stack Overflow perhaps hired a former airport security consultant.


The solution is to not depend on Stack Exchange. Since the data of Stack Exchange is released under a Creative Commons license, a third party can host it on an onion site. The downside to this approach is that real-time access and posting questions will not be possible, but at least one can browse the older posts.


@dang? Can someone please fix the title? Something like "I have been unable to connect to Stack Overflow using Tor today" would be appropriate. The title at the moment is rash speculation. One doesn't expect HN to be perfect, but one expects more conscientiousness than this.


I disagree. I just opened Tor myself to check, and indeed every circuit (I've refreshed ~10 times using cmd+shift+L) results in an error page that literally states: "Reason: Blocked."


It's beside the point.

What if you had connected? What if an endpoint with a different IP can connect?

The standard here ought to be that the title accurately reflect the content. This post does not.

I don't take HN as the kind of place where people want to read speculation masquerading as fact.


The core of your argument is that you don't think many reloads, by different users, from different Tor IPs, are convincing evidence, and I think that makes you part of a very tiny minority.


I had the same result: 8 connects using the "new identity" feature, all IPs blocked. It may be temporary, though, but today, for sure, they are blocking Tor.


I don't know StackOverflow's Tor practices. Given the post's title, I was disappointed to finish reading the post and still not know. If I had read the same post with a title like "Ask HN: is Stack Exchange (Stack Overflow) now blocking Tor?" I would have no complaint.


It probably does, sadly.


I ran into this yesterday when testing a Tor middlebox. I didn't think much of it at the time, because you slam into a lot of these "no" pages when browsing via Tor. I did think it was a shame that they didn't even give you the "10 CAPTCHAs in a row" option that Google will hit you with if you really, really need to use the service.


I'd say titles with "Tell HN" in general should trigger an alert. A good deal of it is speculative at best, slander at worst.


[flagged]


I like how anything can be blamed on China. There are many plausible alternative explanations down below, but nope, HNers gotta upvote to the top the one comment that appeals to anti-China (or anti-Chinese?) rhetoric.


Even more ironic, as Prosus is Dutch. Tencent isn't involved, and it's still being blamed by association. XD

(FD: I actually work at a games company owned by Tencent in Europe)


I don't think it was meant as a nationalistic anti-China comment per se. It's just pointing out that the Chinese government is often criticized for restricting the freedom of its Internet users, and blocking a major privacy-enhancing browser, as Stack Exchange is now doing, is arguably just as bad. Users should never be exposed to harassment by network operators or dubious AI "intrusion-spotting" bots for something with such negligible abuse potential as browsing a questions-and-answers site.


It is kind of a fun game to go to the comments on any arbitrary post here and Ctrl-F 'china'; more often than not there is some digression about it, sometimes in the most far-fetched ways.


I am not familiar with Prosus, but Wikipedia says it's Dutch, not Chinese.


Prosus is owned by Naspers (RSA) which owns quite a lot of Tencent, so there is a Chinese connexion, but I’m not sure about Chinese control. I haven’t read much of News24 (owned by Naspers) but they didn’t seem to be that different coverage-wise from the rest of the SA media on the spy scandal—after they broke the story—although, good lord, Xiaomei Havard must have been the world’s worst spy if the story was right, as Rebecca Davis pointed out: https://www.dailymaverick.co.za/article/2021-09-01-how-serio....


Yes, but that Chinese connection is in the other direction; Naspers owns them, not the other way around.


Indeed. I suppose there is a blackmail risk: their stake could be made worthless unless Naspers has Prosus make the Stack Exchange people ban Tor, but I rather doubt this happened.


How could Tencent selectively make the shares that Naspers owns worthless?


One could consider what's now happening with Alibaba ADRs and the fear of delisting: the price on a certain exchange crumbles. However, I do wonder if, at the scale of a big company owning big chunks, it'd just be a minor cost of doing business to re-register on the HK exchange?


The proper word would be extortion - the Chinese government could in theory make demands of Naspers by threatening to hobble Tencent. It's pretty far-fetched, especially since Naspers owns a minority (28.9%) stake in Tencent.


I mean, Tencent could just be forced by new Chinese law to buy back Naspers stock for $0.001/share. What's Naspers going to do, sue in a Chinese court? Go to war?


Something tells me a shareholder suit by a foreign company in Chinese court, in connection with not blocking internet traffic, would not go down well for Naspers :D


Indeed, it's Dutch, and it is also a subsidiary of the South African company Naspers.


I did not realize that was who purchased SO. Wow.


Are you saying that the DoS attack coming from Tor exit nodes didn't happen; that it was a politically-motivated, fabricated pretext for blocking?


That comment was posted prior to the official response from SE.


It’s true. The Chinese block everything. The other day we had a network outage and AT&T said they were working on getting things back up. Then I noticed that AT&T has a joint venture with China Telecom. Many of AT&T’s products are actually made in China. It blew my mind. The Chinese are everywhere. I wouldn’t be surprised if they are modifying my contact lenses to replace every “Made in California” with a “Made in China” as some kind of propaganda.

You have to be careful. Your friend could be Chinese. You never know.



