> I know because I've done more than my share of web crawling and I have crawlers that: capture plain http data, can run Javascript in a limited way, and can run React apps. The last one would blast right past Anubis without any trouble except for the rate limiting which is not a lot of problem because when I crawl I hit fast, I hit hard, and I crawl once.
I have no problem with bots scraping all my data; my problem is with poorly coded bots overloading my server and making it unusable for anybody else. I'm using Anubis on the web interface to an SVN server, so if the bots actually wanted the data, they could just run "svn co" instead of scraping the history pages for 300k files.
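For example, a single checkout pulls the whole tree down in one pass instead of issuing one request per history page (the repository URL below is just a placeholder):

    # placeholder URL; a checkout fetches everything the scrapers are after
    svn co https://svn.example.org/repo/trunk working-copy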
> It seems like a whole lot of crap to me. Hostile webcrawlers, not to mention Google, frequently run Javascript these days.
I'm also rather unhappy that I had to deploy Anubis, but it was unfortunately the only thing that seemed to work, and the server load was getting so bad that the alternative was disabling the SVN web interface altogether.