
Not if they hop to a different IP address every few requests. And they generally aren't bothered by slow responses. It's not like they have to wait for one request to finish before they make another one (especially if they are making requests from thousands of machines).
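To make the IP-hopping point concrete, here is a rough sketch, assuming the defense being discussed is a simple per-IP request cap. The function name `blocked_requests`, the limit of 5, and the example addresses are all made up for illustration; this is not any particular server's rate limiter.

```python
from collections import Counter

def blocked_requests(request_ips, limit_per_ip=5):
    """Count how many requests a naive per-IP cap would reject.

    Hypothetical model: each IP may make `limit_per_ip` requests,
    and every request beyond that is blocked.
    """
    seen = Counter()
    blocked = 0
    for ip in request_ips:
        seen[ip] += 1
        if seen[ip] > limit_per_ip:
            blocked += 1
    return blocked

# 1000 requests from a single address: almost all of them get blocked.
single_ip = ["203.0.113.1"] * 1000

# 1000 requests spread over 500 addresses, two requests each: none are blocked.
rotating = [f"10.0.{i // 250}.{i % 250}" for i in range(500) for _ in range(2)]

print(blocked_requests(single_ip))  # 995
print(blocked_requests(rotating))   # 0
```

Same total load on the server in both cases, but the per-IP counter only ever notices the first one.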


You're saying that large companies are hitting individual websites with thousands of unrelated IP addresses?


Yep, we've been seeing that on our random small-scale site that used to be open (and mostly relevant to a very limited number of people).

It was nice for interested guests to get an impression of what we are doing.

First the AI crawlers came in from foreign countries that could be blocked.

Then they beat down the small server by being very distributed, calling from thousands of IPs with one or two requests each.

We finally put a stop to it by requiring a login with a message informing people to physically show up to gain access.

Worked fine for over 15 years but AI finally killed it.


How do you think they do crawling if not like that? They'd be IP banned instantly if they used any kind of predictable IP regime for more than a few minutes.


I don't know what is actually happening, that's why I'm asking.

Also, you're implying that the only way to crawl is to essentially DDoS a website by blasting it from thousands of IP addresses. There is no reason crawlers can't do more sites in parallel and avoid hitting individual sites so hard. There have been plenty of crawlers over the last few decades that don't cause problems; these are just stories about the ones that do.
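A minimal sketch of what "more sites in parallel, each one hit gently" could look like, assuming a fixed per-host delay. The name `polite_schedule` and the 10 second delay are invented for this example, and it only plans fetch times rather than making any requests; real crawlers would also honor robots.txt and back off on errors.

```python
import heapq
from collections import deque
from urllib.parse import urlparse

def polite_schedule(urls, per_host_delay=10.0):
    """Plan (time, url) fetches so each host is hit at most once per delay window."""
    # Group URLs by host, keeping their original order within each host.
    queues = {}
    for url in urls:
        host = urlparse(url).netloc
        queues.setdefault(host, deque()).append(url)

    # Min-heap of (earliest allowed fetch time, host).
    ready = [(0.0, host) for host in queues]
    heapq.heapify(ready)

    plan = []
    while ready:
        t, host = heapq.heappop(ready)
        plan.append((t, queues[host].popleft()))
        if queues[host]:
            # This host may not be fetched again until the delay has passed.
            heapq.heappush(ready, (t + per_host_delay, host))
    return plan

plan = polite_schedule([
    "http://a.example/1",
    "http://a.example/2",
    "http://b.example/1",
])
for when, url in plan:
    print(when, url)
```

Throughput comes from the number of distinct hosts in the queue, not from hammering any one of them, so no single site sees more than one request per delay window.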



