
The CFAA, which makes "exceeding authorized access" to a computer system both a crime and a tort, and the Copyright Act, which has been interpreted to mean that copies of HTML pages, even ones that exist only for microseconds in RAM, are subject to copyright, so infringement claims can be brought against anyone who downloaded the page.

It's also breach of contract (which I'm labeling separately from "illegal" to avoid nitpickers, even though it could be included) due to the individual ToS on each site, which almost always include boilerplate forbidding the access of the site by "automated means" in addition to forbidding "commercial" or other non-personal use.

Before you raise the common counterarguments, please know that others have done so before you, and the courts have generally sharply disagreed with them. There is no respect for non-Google data scraping in the judiciary.



I know Google has been treated as if it's a sui generis case, but surely Archive.org and many others are scraping and not being shut down.

It's weird that the comparison between posting a robots.txt and a "no trespassers" sign hasn't been upheld. Or has it?
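For what it's worth, the "sign" is at least trivial to read mechanically. Here is a minimal sketch using Python's standard `urllib.robotparser`; the robots.txt body, the `examplebot` user agent, the `may_fetch` helper, and the example.com URLs are all made up for illustration, and passing this check says nothing about whether access is legally authorized.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt body, inlined so the sketch needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() accepts the file's lines directly, so no HTTP request is made.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Illustrative user agent and URLs (assumptions, not from any real site).
allowed = may_fetch(ROBOTS_TXT, "examplebot", "https://example.com/index.html")
blocked = may_fetch(ROBOTS_TXT, "examplebot", "https://example.com/private/data.html")
```

With this made-up file, the first URL is permitted and the second falls under the `Disallow` rule, so a scraper has no technical excuse for missing the sign.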

IMO the tort of copyright hasn't kept up well with tech changes, but transient cache copies are handled in EU laws IIRC.

Similarly, the CFAA you mention (the UK CMA has some similar terms) is very loosely drafted. Not accounting for the need to communicate the limits of allowed access is silly, though; there's a presumption of allowed access online IMO (one that isn't mirrored offline), and going against that presumption should require explicit withdrawal of consent.

If I were drafting the law ...


What about import.io?


First, IANAL. But there are people building businesses on this kind of thing even though it's almost universally not allowed. Google is the most prominent: the judiciary has decided that a different set of rules applies to Google than to smaller companies, so Google gets away with it. That's because they are massive and were probably able to pay bribes to the judges to get them to agree with them.

Usually, if you cease and desist as soon as you receive a C&D, you won't get sued unless you actually damaged the site. I assume companies like import.io and Scrapinghub adhere to those. I know that Scrapinghub in particular won't go behind a login to get something, in order to avoid heat.

Fundamentally, however, any company that makes scraping a key component of its business model is a high-risk business under current US law. People can sue, and have sued, scrapers out of existence, often leaving a trail of screwed customers, laid-off employees, and dejected founders owing huge liability judgments.



