Hacker News

Is our dataset still available? I thought it was taken offline.

Where do you go under that link to get it?

E.g. https://the-eye.eu/public/AI/pile/readme.txt says it’s gone (and "old news"? I disagree).



There are still plenty of reliable sources for magnet links to The Pile, e.g. [1]. The DMCA takedowns are just a minor inconvenience.

1: https://web.archive.org/web/20230820001113/https://academict...


Thank you. How’d you dig this one up?


[1] is the first result if I google "the pile torrent". It doesn't link to the torrent because of a DMCA notice, so I just used the Wayback Machine to retrieve a version from before the date of that notice. Don't tell the publisher.

1: https://academictorrents.com/details/0d366035664fdf51cfbe9f7...


Frustratingly, they scan my comments, so hopefully they won’t bother filing a DMCA for that.

(Seeing "sillysaurusx" appear in print on official court documents was pretty amusing out of context, though.)


Shawn, there is a mildly redacted version available at https://huggingface.co/datasets/monology/pile-uncopyrighted


Thank you.



