
> They’ve never claimed to index every word on every page.

Not in those words, but they do claim to aspire to “Organize the world’s information and make it universally accessible and useful”[1], which ought to include old web pages. They've gone to the effort of finding out-of-print books and digitizing them to make them searchable, so a ten-year-old web page shouldn't be such a stretch.

[1] https://www.google.com/intl/en/about/our-company/



You'd think it would at least come up in the Internet Archive, if nowhere else.



That's unfortunate. But understandable in a way.

    # robots.txt web.archive.org 2013-10-02

    User-agent: *
    Disallow: /

    User-agent: ia_archiver
    Allow: /
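
For anyone who wants to check how those rules play out, here is a minimal sketch using Python's standard `urllib.robotparser`. The rules are copied from the file above; the `can_crawl` helper, the "Googlebot" agent string, and the `/web/` path are just illustrative assumptions, and real crawlers may interpret robots.txt somewhat differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Rules copied from the robots.txt quoted above.
ROBOTS_TXT = """\
# robots.txt web.archive.org 2013-10-02

User-agent: *
Disallow: /

User-agent: ia_archiver
Allow: /
"""


def can_crawl(agent: str, url: str = "https://web.archive.org/web/") -> bool:
    """Hypothetical helper: True if `agent` may fetch `url` under ROBOTS_TXT."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network request is made.
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # The example agent names are assumptions chosen for illustration.
    for agent in ("ia_archiver", "Googlebot"):
        print(agent, can_crawl(agent))
```

Under this parser, only the agent named in the second group is let through; everything else falls back to the `*` group and is blocked, which would explain why archived pages don't show up in general search results.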


Touché. I don't suppose the old non-commercial websites mentioned in the article suffer from the same problem, though, right? Maybe a robots.txt file was mistakenly left around?



