Just a callout that Fastly provides free bot detection, CDN, and other security services for FOSS projects, and has been doing so for 10+ years: https://www.fastly.com/fast-forward (disclaimer: I work for Fastly and help with this program)
Without going into too much detail, this tracks with the trends in inquiries we're getting from new programs and existing members. A few years ago, the requests were almost exclusively related to performance, uptime, implementing OWASP rules in a WAF, or more generic volumetric impact. Now, AI scraping is increasingly something that FOSS orgs come to us for help with.
I've been running into bot detection on at least five different websites in the past two months (not even including captcha walls)
Not sure what to tell you but I surely feel quite human
Three of the pages told me to contact customer support and the other two were a hard and useless block wall. Only from Codeberg did I get a useful response; the other two support teams gave the typical "have you tried clearing your cookies" and "restart the router" advice, which is counterproductive because cookie tracking is often what lets one pass. Support is not prepared to deal with this, which means I can't shop at the stores whose blocking algorithms are erroneously going off. I also don't think any normal person would ever contact support. I only do it to help them realise there's a problem and that they're blocking legitimate people from using the internet normally.
The pages don't say which vendor is behind them, but it's at least three different implementations, and I don't think any were Cloudflare, because I've been running into Cloudflare's pages for years and they've got captchas (functional or not). One of them was Akamai, I think.
Yeah, I definitely don't want to pivot this thread into a product pitch, as the important thing is helping the open-source projects, but we can work with the maintainers to tune the systems to be as strict/lax as preferred. I'm sure the other services can too, to be fair.
The underlying issue is that many sites aren't going to get feedback from the real people they've blocked, so their operators won't actually know that tuning is required (also, the stricter the system, the higher the percentage of requests that will be marked as bots, which might lead an operator to want things to be even stricter...)
I will say -- a higher-end bot detection service should provide paper trails on the block actions they take (this may not be available for freemium tiers, depending on the vendor).
But to your point, the real kicker is the "many sites aren't going to get feedback from the real people they've blocked" since those tools inherently decided that the traffic was not human. You start getting into Westworld "doesn't look like anything to me" territory.
I'm not into Westworld so I can't speak to the latter paragraph, but as for "high-end" vendors' paper trail: how do log files help uncover false blocks? Any vendor will be able to look up the request IDs printed on the blocking page, but how does that help?
You don't know if each entry in the log is a real customer until they buy products proportional to some fraction of their page load rate, or a real person until they submit useful content or whatever your site is about. Many people just read information without contributing to the site itself, and that's okay, too. A list of blocked systems won't help. I run a server myself, and I see the legit-looking user agent strings doing hundreds of thousands of requests, crawling past every page in sequence. But if there wasn't this inhuman request pattern, and I just saw this user agent and IP address and other metadata among a list of blocked access attempts, I'd have no clue whether the ban was legit or not.
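To make "inhuman request pattern" a bit more concrete, here is a rough sketch of what spotting it in a log might look like. Everything in it is an assumption for illustration: the simplified `(client, timestamp)` log format, the hypothetical `flag_heavy_clients` helper, and the thresholds. It is not how any particular vendor does it, and it only works because the whole request history is visible, which is exactly what a bare list of blocked entries lacks.

```python
from collections import defaultdict

def flag_heavy_clients(entries, window_seconds=60, max_requests=100):
    """Return clients that made more than max_requests within any
    window_seconds-long sliding window.

    entries: iterable of (client_id, unix_timestamp) pairs (assumed format).
    The default thresholds are arbitrary placeholders, not recommendations.
    """
    by_client = defaultdict(list)
    for client, ts in entries:
        by_client[client].append(ts)

    flagged = set()
    for client, stamps in by_client.items():
        stamps.sort()
        start = 0  # index of the oldest request still inside the window
        for end, ts in enumerate(stamps):
            # Drop requests that fell out of the window ending at ts.
            while ts - stamps[start] > window_seconds:
                start += 1
            if end - start + 1 > max_requests:
                flagged.add(client)
                break
    return flagged

# Made-up data: one client hammering the site, one person reading slowly.
entries = [("crawler", i * 0.1) for i in range(500)]
entries += [("reader", i * 30.0) for i in range(20)]
print(flag_heavy_clients(entries))
```

A single row from that log, seen in isolation, looks the same for both clients, which is the point: the pattern is only visible in aggregate.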
With these protection services, you can't know how much frustration is hiding in that paper trail, so I'm not blocking anyone from my sites; I'm making the system stand up to crawling. You have to do that regardless for search engines and traffic spikes like from HN
Oh my, a Dutch film that actually sounds good?! I get to watch a movie that's originally in my native language for perhaps the second time in my life, thanks for linking this :D
Edit: and it's on YouTube in full! Was wondering which streaming service I'd have to buy for this niche genre of Dutch sci-fi but that makes life easy: https://www.youtube.com/watch?v=4VrLQXR7mKU
Final update: well, that was certainly special. Favorite moment was 10:26–10:36 ^^. Don't think that comes across fully in the baked-in English subtitles though. Overall it could have been an episode of Black Mirror, just shorter. Thanks again for the tip :)
I have to assume the Dutch movie industry just isn't too big.
I guess it's a side effect of America's media, but when I went to Europe including the Netherlands almost everyone spoke English at an almost native level.
It almost felt like playing a video game where there is an immersive mode you can just turn off if it gets too difficult (subtitles in English at all public facilities).