Hacker News

The entire Library of Congress is like 10TB. You don’t need anything near petabytes until you get out of text into rich media.


Common Crawl is petabytes. Anna's Archive is about a petabyte, but it includes PDFs with images.



