Hacker News
kortilla
5 months ago
on: Improving recommendation systems and search in the...
The entire Library of Congress is like 10TB. You don’t need anything near petabytes until you get out of text and into rich media.
osmarks
5 months ago
Common Crawl is petabytes. Anna's Archive is about a petabyte, but it includes PDFs with images.