The CFAA, which states "exceeding authorized access" to a computer system is bot...

pbhjpbhj · on Aug 31, 2016

I know Google has been treated as if it's a sui generis case, but surely Archive.org and many others are scraping and not being shut down.

It's weird that the comparison between posting a robots.txt and a "no trespassers" sign hasn't been upheld. Or has it?
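
For what it's worth, honoring that "sign" is cheap for a scraper. A minimal sketch using Python's standard `urllib.robotparser`; the robots.txt content, the `example-bot` user agent, and the `allowed` helper are all made up for illustration, and a real crawler would fetch the file from the site instead of hard-coding it:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content (an assumption for this sketch);
# a real crawler would download it from the target site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def allowed(url: str, user_agent: str = "example-bot") -> bool:
    """Return True if the robots.txt rules above permit fetching `url`."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/public/page",
                "https://example.com/private/data"):
        print(url, "allowed" if allowed(url) else "disallowed")
```

This only shows the technical side of reading the sign; whether ignoring it has legal consequences is the open question.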

IMO the tort of copyright hasn't kept up well with tech changes, but transient cache copies are handled in EU laws IIRC.

Similarly, the CFAA you mention (the UK CMA has some similar terms) is very loosely drafted. Not accounting for the need to communicate the limits of allowed access is silly though; there's a presumption of allowed access online IMO (that isn't mirrored offline), and going against that presumption should require explicit withdrawal of consent.

If I were drafting the law ...

adibalcan · on Aug 31, 2016

What about import.io?

cookiecaper · on Aug 31, 2016

First, IANAL. But there are people building businesses on this kind of thing even though it's almost universally not allowed. Google is the most prominent -- the judiciary has decided that a different set of rules applies to Google vs. smaller companies, so Google gets away with it. That's because they are massive and were probably able to pay bribes to the judges to get them to agree with them.

Usually, if you cease and desist as soon as you receive a C&D, you won't get sued unless you actually damaged the site. I assume companies like import.io and Scrapinghub adhere to those. I know that Scrapinghub in particular won't go behind a login to get something, to try to avoid heat.

Fundamentally, however, anything that makes scraping a key component of its business model is a high-risk business under current US law. People can sue, and have sued, scrapers out of existence, often leaving a trail of screwed customers, laid-off employees, and dejected founders owing huge liability judgments.