
Is robots.txt respected retroactively?


Yes, but only temporarily. As long as your robots.txt file excludes some URLs, those URLs will 1) not be crawled by the Internet Archive crawler, and 2) not be shown in the Wayback Machine. Any already-crawled pages, however, remain invisibly in the archive and will reappear once the robots.txt no longer excludes them.
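To make the "hidden while excluded, visible again once the rule is gone" behaviour concrete, here is a rough sketch using Python's standard `urllib.robotparser`. It only models the rule described above and is not the Internet Archive's actual code. The function name `is_hidden`, the example URL, and the `ia_archiver` user-agent string are assumptions made for illustration.

```python
from urllib.robotparser import RobotFileParser


def is_hidden(robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Return True if the given robots.txt text excludes `url` for `agent`.

    Hypothetical helper: under the rule described above, an excluded URL
    would be neither crawled nor shown, but its old snapshots stay stored.
    The default agent name is an assumption, not a confirmed value.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(agent, url)


# While the exclusion is in place, the page counts as hidden.
blocking = "User-agent: *\nDisallow: /private/\n"
# Once the rule is removed, the same URL is no longer excluded.
unblocked = "User-agent: *\nDisallow:\n"

page = "https://example.com/private/page.html"
hidden_now = is_hidden(blocking, page)
hidden_later = is_hidden(unblocked, page)
```

The check is made against whatever the robots.txt says right now, which is why the effect is both retroactive (old snapshots get hidden) and temporary (they come back when the rule is removed).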



