
That's not the same as piracy though. He wasn't downloading millions of scientific papers from libgen or sci-hub; he was downloading them directly from JSTOR. Indeed, none of the charges against him were for copyright infringement. They were for stuff like "breaking and entering" and "unauthorized access to a computer network".


The exact same charges could apply to the AI scrapers illegitimately accessing random websites.


No, they couldn't, since the then-novel, untested, and strained interpretation of the CFAA that the prosecutor was relying on has since been tested in the courts and soundly rejected.


I haven’t seen any accusations that they’ve done that, though. Usually people get pirated material from sources that intentionally share pirated material.


They're not just training on pirated content, they've also scraped literally the entire internet and used that too.


Scraping the public internet is also not a CFAA violation.


The CFAA bans accessing a protected computer without authorization. Hitting URLs denied by robots.txt has been argued to be just that.
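To make "URLs denied by robots.txt" concrete: robots.txt is just a text file of advisory rules, and honoring it is a check the crawler chooses to run on its own side. A minimal sketch using Python's standard-library `urllib.robotparser`, assuming a made-up robots.txt and a hypothetical `ExampleBot` user agent and `example.com` URLs:

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents for an example site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    The check runs entirely on the client: nothing stops a crawler
    from skipping it and requesting the URL anyway.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in ("https://example.com/public/page.html",
                "https://example.com/private/data.html"):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

The server does not enforce any of this; a scraper that never calls `can_fetch()` simply requests the disallowed path. That is why the legal question is whether ignoring the file turns the request into access "without authorization".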


> Hitting URLs denied by robots.txt has been argued to be just that.

"Has been argued" -- sure, but never successfully. In fact, in hiQ v. LinkedIn, the 9th Circuit ruled twice (once before, and again on remand after applying the Supreme Court's ruling in Van Buren v. US) that continuing to access data on a public website despite a cease and desist on top of robots.txt did not constitute access "without authorization" under the CFAA.


Now do every other jurisdiction


The CFAA was mentioned specifically, which means only US jurisdiction is relevant here.


Part of the accusation comes from the fact that Swartz accessed the downloads through an MIT network closet, which AI companies weren't doing. The equivalent would be if OpenAI broke into a wiring closet at Disneyland to download Disney movies.


The CFAA is vague enough that it can be used to punish unauthorized access to a computer system. I don't have an example case in mind, but people have gotten in trouble before for scraping websites while ignoring, e.g., robots.txt.


The CFAA might be vague, but the case law on scraping has pretty much been resolved to "it's legal except in very limited circumstances". It's regrettable that less-resourced defendants were harassed before large corporations were able to secure such rulings, but the rulings that allowed scraping came before the AI companies did their scraping, so it's unclear why AI companies in particular should be getting flak here.



