Hi, author here. You're right, of course. Of the 77 "210 OK But Gone" URLs, 35 had copies in the archive (more than that had copies, but only 35 were from the right year). However, 5 (three domains total) had been blocked by a later robots.txt, and 1 had been "excluded" from the Wayback Machine, which I assume is the removal option you mention.
That's 6 redacted URLs of 77, versus 35 good ones. I'm not going to not donate to the Internet Archive and help preserve access to those 35 because of those 6.
EDIT: In the interest of rigor, of the 76 "404 Not Found" URLs, 4 (three domains) had been blocked by a later robots.txt, while 45 had relevant content preserved and accessible. That brings the total to 10 redactions across both groups.
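For anyone who wants to check the arithmetic, here is a quick tally of the figures above. The dictionary keys and the `redacted` helper are just names I made up for illustration, and the zero "excluded" count for the 404 group is implied by the total of 10 rather than stated separately.

```python
# Counts copied from the figures quoted in this comment.
ok_but_gone = {"sampled": 77, "archived": 35, "robots_blocked": 5, "excluded": 1}
# "excluded": 0 is an inference: 6 + 4 = 10 leaves no room for extra exclusions.
not_found = {"sampled": 76, "archived": 45, "robots_blocked": 4, "excluded": 0}


def redacted(group):
    """URLs that were captured but are no longer shown."""
    return group["robots_blocked"] + group["excluded"]


total_redacted = redacted(ok_but_gone) + redacted(not_found)
total_archived = ok_but_gone["archived"] + not_found["archived"]

print("redacted:", total_redacted)
print("archived and accessible:", total_archived)
```

So across both groups the accessible copies outnumber the redacted ones by 80 to 10.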
Also, when the Internet Archive imports third-party captures, like those from Archive Team, they are included irrespective of the robots.txt from the time of capture, which is then only used to manage display of the content.
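To make the "display only" point concrete, here is a minimal sketch of what a display-time robots.txt check could look like. This is not the Internet Archive's actual code: the `is_displayable` function is hypothetical, the `ia_archiver` user agent is my assumption about which agent is checked, and only Python's standard `urllib.robotparser` is real.

```python
from urllib.robotparser import RobotFileParser


def is_displayable(current_robots_txt: str, url: str, agent: str = "ia_archiver") -> bool:
    """Hypothetical display gate: the capture stays stored either way,
    and the site's *current* robots.txt only decides whether it is shown."""
    parser = RobotFileParser()
    parser.parse(current_robots_txt.splitlines())
    return parser.can_fetch(agent, url)


url = "http://example.com/private/page.html"

# No rules at the time of the check: the capture can be shown.
shown_before = is_displayable("", url)

# A later robots.txt blocks the path: the same capture is now hidden.
later_rules = "User-agent: *\nDisallow: /private/"
shown_after = is_displayable(later_rules, url)

print(shown_before, shown_after)
```

The point of the sketch is that nothing gets deleted in the second case; the stored capture is just withheld for as long as the later rule stands.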
It's LOCKSS: Lots of Copies Keep Stuff Safe. The Internet Archive isn't the only solution; we need more archives. But it's a start, and getting all the bookmarking services to contribute to it, and to improve and make public their own caches, would be a great improvement.