
Wait, but there is an asymmetry. A legitimate user spends at least a dozen seconds on a page, so they don't care about a 10ms overhead. For a scraper, however, a 10ms overhead can easily be 10x the time it spends on a page overall - the scraper is now ten times slower.

However, the exact PoW implementation (hash function) chosen by Anubis might significantly reduce this asymmetry, because calculation speed is highly hardware-dependent.
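To make the hardware point concrete, here is a minimal sketch of a hash-prefix proof of work in Python. SHA-256 and a difficulty of 20 bits are assumptions for illustration, not Anubis's actual parameters; the same puzzle that takes a phone browser seconds is near-instant for a server with fast cores or a GPU hashing orders of magnitude faster:

    import hashlib, os, time

    def solve(challenge: bytes, difficulty_bits: int) -> int:
        # Find a nonce so that SHA-256(challenge || nonce) has
        # `difficulty_bits` leading zero bits; expects ~2**difficulty_bits hashes.
        target = 1 << (256 - difficulty_bits)
        nonce = 0
        while True:
            digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
            if int.from_bytes(digest, "big") < target:
                return nonce
            nonce += 1

    challenge = os.urandom(16)
    start = time.perf_counter()
    solve(challenge, difficulty_bits=20)   # ~1M hashes expected
    print(f"solved in {time.perf_counter() - start:.2f}s")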



The scraper, unlike a legitimate human, can load and analyse many websites in parallel, so in practice the delay makes little difference to it.

Say a user browses 10 sites, each protected by Anubis and each adding 5 seconds to the load time: that's 50 additional seconds the user spends waiting. A scraper with enterprise-grade server hardware fetching them in parallel? That's 5 seconds for all 10 sites, as the sketch below illustrates.
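A rough sketch of that amortisation (the 5-second PoW delay is simulated with a sleep, and the site names and numbers are made up; a real CPU-bound PoW would need separate processes or spare cores rather than threads):

    import time
    from concurrent.futures import ThreadPoolExecutor

    def fetch_with_pow(site: str) -> str:
        time.sleep(5)                      # stand-in for a 5-second PoW challenge
        return f"{site}: fetched"

    sites = [f"site{i}.example" for i in range(10)]
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=10) as pool:
        results = list(pool.map(fetch_with_pow, sites))
    print(f"10 sites in {time.perf_counter() - start:.1f}s")   # ~5s wall clock, not 50s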


No, I don't think this is accurate. You have to look at both the cost and the benefit. If you're an AI scraper, it's literally just "what does the marginal next token of training data cost me" --- the answer is: the same as the marginal next token of content costs a reader.

Tavis Ormandy went into more detail on the math here, but it's not great!


I don’t understand what you mean. Training an LLM requires orders of magnitude more tokens than any one human will ever read. Perhaps an AI company can amortize across all their users, but it would still represent a substantial cost. And I’m pretty sure the big AI companies don’t rely on abusive scraping (i.e. ignoring robots.txt), so the companies doing the scraping may not have a lot of users anyway.


Tavis Ormandy's post goes into more detail about why this isn't a substantial cost for AI vendors. For my part: we've seen PoWs deployed successfully in cases where:

(1) there's a sharp asymmetry between adversaries and legitimate users (as with password hashes and KDFs, or anti-abuse systems where the marginal adversarial request has value ~reciprocal to what a legit user gets, as with brute-forcing IDs)

(2) the PoW serves as a kind of synchronization clock in a distributed system (as with blockchains)

What's case (3) here?


The next word is worth less to AI scrapers than to human readers - AIs need to read thousands of articles to get as much value as a human gets from one good article. If you make it cost, say, 5c-equivalent to read an article (but without the overhead of micropayments and authorisations), human readers will happily pay that whereas AI scrapers can't afford even 1c-equivalent.
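Back-of-envelope with assumed numbers (the 5c price and the figure of a thousand articles a scraper needs to match the value a human gets from one are illustrative, not measured):

    cost_per_article = 0.05      # assumed 5c-equivalent PoW cost per page
    human_articles = 1           # a human gets their value from one good article
    scraper_articles = 1000      # assumed pages a scraper needs for comparable value

    print(f"human pays   ${human_articles * cost_per_article:.2f}")    # $0.05
    print(f"scraper pays ${scraper_articles * cost_per_article:.2f}")  # $50.00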


They care about whether the rewards exceed the costs; they don't give a shit what the actual cost is.

If it costs them $1000 to grab a web page but they earn $1001 then they will do that again and again to earn that buck.


> Legitimate user spends at least a dozen seconds on a page, they don't care about 10ms overhead.

Unfortunately for the user on a low-end phone, the overhead can be several seconds. For the scraper it's only ever ~10ms because it's running on a (relatively) powerful server CPU.


I don't know of any network latency <= 1ms over the public internet, so a scraper already pays at least ~10ms of round-trip time per page; a 10ms overhead makes it 2x slower at best, not 10x.



