That's not the same as piracy though. He wasn't downloading millions of scientific papers from LibGen or Sci-Hub, he was downloading them directly from JSTOR. Indeed, none of his charges were for copyright infringement. They were for stuff like "breaking and entering" and "unauthorized access to a computer network".
No, they couldn't, since the then-novel and untested strained interpretation of the CFAA that the prosecutor was relying on has since been tested in the courts and soundly rejected.
I haven’t seen any accusations that they’ve done that, though. Usually people get pirated material from sources that intentionally share pirated material.
> Hitting URLs denied by robots.txt has been argued to be just that.
"Has been argued", sure, but never successfully. In fact, in hiQ v. LinkedIn, the 9th Circuit ruled twice (once before the Supreme Court's ruling in Van Buren v. US, and again on remand after and applying it) that a cease and desist on top of robots.txt does not make continued access to data on a public website "without authorization" under the CFAA.
Part of the accusation comes from the fact that Swartz accessed the downloads through an MIT network closet, which AI companies weren't doing. The equivalent to that would be if OpenAI broke into a wiring closet at Disneyland to download Disney movies.
The CFAA is vague enough to punish unauthorized access to a computer system. I don't have an example case in mind, but people have gotten in trouble before for scraping websites while ignoring e.g. robots.txt.
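For anyone unfamiliar with what "ignoring robots.txt" means mechanically: the file is just a set of advisory rules that a well-behaved crawler checks before fetching a URL. Below is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the `example.com` URLs, the `ExampleBot` user agent, and the `is_allowed` helper are all made up for illustration, and the rules are inlined so nothing touches the network.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, inlined so the example needs no network access.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines rather than a single string.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for url in (
        "https://example.com/public/page",
        "https://example.com/private/data",
    ):
        print(url, is_allowed(ROBOTS_TXT, "ExampleBot", url))
```

Nothing in this check is enforced by the server. A crawler that skips it can still request the disallowed URL, which is why the question in this thread is a legal one rather than a technical one.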
The CFAA might be vague, but the case law on scraping has largely been resolved to "it's pretty much legal except in very limited circumstances". It's regrettable that less well-resourced defendants were harassed before large corporations were able to secure such rulings, but the rulings that allowed scraping came before AI companies did their scraping, so it's unclear why AI companies in particular should be getting flak here.