Preserving The Internet... and Everything Else (codinghorror.com)
81 points by cruise02 on April 2, 2012 | 14 comments



A couple of years ago I proposed a decentralized dead-tree archival network using time capsules and a shared location map. I've made a few capsules already.

http://carlos.bueno.org/2010/09/paper-internet.html

http://www.wired.com/beyond_the_beyond/2010/10/dead-media-be...


I'm curious to see how we are going to navigate this data.

I'd love to see an Etherpad-style slider where you can watch a page evolve.

Or could we have a search of the entire history of the internet? Or search and browse the internet "as it was in 2007"? It's fascinating to think that Google "only" allows you to search a snapshot of the internet as it is now.
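
The Wayback Machine does expose an availability endpoint that returns the snapshot closest to a given timestamp, which is a crude version of "the web as of 2007". A minimal sketch in Python, assuming the endpoint's documented JSON shape (error handling elided):

    # Minimal sketch: ask the Wayback Machine for the snapshot of a URL
    # closest to a given timestamp. Assumes the documented JSON shape of
    # the availability endpoint; error handling is elided.
    import json
    import urllib.request

    def closest_snapshot(url, timestamp):
        query = ("https://archive.org/wayback/available"
                 "?url=%s&timestamp=%s" % (url, timestamp))
        with urllib.request.urlopen(query) as resp:
            data = json.load(resp)
        snap = data.get("archived_snapshots", {}).get("closest")
        return snap["url"] if snap else None

    print(closest_snapshot("example.com", "20070101"))  # the web of 2007

It only gets you one page at a time, of course; searching the full text of the web as it was in 2007 would need a time-indexed index, which is a much bigger beast.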

On the other hand, the death of the ephemeral scares me a bit. It's like watching Captain Flam again as an adult. If you loved it as a kid, heed this warning: don't do it.


Is it really that important to keep most of the web around? The obvious knee-jerk answer is that it’s a great resource about our history, but honestly, I would not care if most of the web simply perished the natural way. It’s only seldom that I use the Wayback Machine, and if the content I’m looking for weren’t there, it would be no big deal for me. What is the point of storing so much information?


> It’s only seldom that I use the Wayback Machine, and if the content I’m looking for weren’t there, it would be no big deal for me.

Depends on what sorts of things you and I look for on the web, then, I suppose.

I use the Wayback Machine regularly, and what I find there is usually quite valuable (to me), since I could not have found it anywhere else.

Only yesterday I was wandering through information about symmetry groups and tilings of the plane, and you come across the Geometry Junkyard[1] ... dead links all over the place as soon as you take more than a few steps! A lot of those are old university home pages, the ones with ~s in them. If the person doesn't work there anymore, they often rot, and the information doesn't always get transferred to the next site or blog. I know I'm guilty of the same: quite a few ancient sites of mine floating about that I never bothered collecting under one domain. I know the IA has got them, though :)

[1] http://www.ics.uci.edu/~eppstein/junkyard/topic.html


You're only looking back maybe 20 years. Imagine this same information 100 years from now. 200 years. 1000 years.


Can I just not find it on the Internet Archive site, or is it really the case that they only have one location? In an earthquake zone?


http://en.wikipedia.org/wiki/Internet_Archive

"To ensure the stability and endurance of the Internet Archive, its collection is mirrored at the Bibliotheca Alexandrina in Egypt."


Although http://www.bibalex.org/isis/frontend/archive/archive_web.asp... says:

"The Internet Archive at the BA includes the web collection of 1996 through 2007. It represents about 1.5 petabytes of data stored on 880 computers."

So it seems to be a partial snapshot, not a dynamic mirror.
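
(For scale: 1.5 petabytes across 880 machines works out to roughly 1.5e6 GB / 880 ≈ 1.7 TB per machine.)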


Welcome to The Library of Alexandria: It has been 2060 years since our last catastrophic loss of data.


A decentralized, distributed backup solution for something as important as this would be interesting.

Also, it would be way cheaper, and if you have clients do the crawling, speed would increase by several orders of magnitude. The real challenge becomes keeping track of where the data is and how many redundant copies there are. Actually, I'm sure it's a really interesting statistics problem.
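
To make the bookkeeping concrete, a toy sketch of the location map: content hashes mapped to the peers holding each copy, with a report of anything that has dropped below a replication target. All the names and the target of 3 copies are made up for illustration:

    # Toy location map for a decentralized archive: which peers hold
    # which chunk (keyed by content hash), and which chunks have fallen
    # below a replication target. Names and numbers are illustrative.
    from collections import defaultdict

    REPLICATION_TARGET = 3  # assumed minimum number of copies

    class LocationMap:
        def __init__(self):
            self.holders = defaultdict(set)  # chunk hash -> peer ids

        def record_copy(self, chunk_hash, peer_id):
            self.holders[chunk_hash].add(peer_id)

        def peer_lost(self, peer_id):
            # Peers vanishing is the common failure mode: drop their copies.
            for peers in self.holders.values():
                peers.discard(peer_id)

        def under_replicated(self):
            return [h for h, peers in self.holders.items()
                    if len(peers) < REPLICATION_TARGET]

    m = LocationMap()
    m.record_copy("chunk-a", "peer-1")
    m.record_copy("chunk-a", "peer-2")
    m.peer_lost("peer-2")
    print(m.under_replicated())  # ['chunk-a']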


The people behind Wuala seem to have figured out a way to deal with the redundant copies on client hardware.
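
As I understand it (hedging here, I've only read their public material), Wuala spreads erasure-coded fragments across client machines rather than whole replicas: any k of n fragments reconstruct the data. That turns the statistics problem above into a simple binomial sum; a sketch with illustrative numbers:

    # Sketch: probability that erasure-coded data is recoverable when
    # each of n fragments sits on an independently available peer and
    # any k of n fragments reconstruct it. Numbers are illustrative.
    from math import comb

    def p_recoverable(n, k, p_up):
        return sum(comb(n, i) * p_up**i * (1 - p_up)**(n - i)
                   for i in range(k, n + 1))

    # 20 fragments, any 10 suffice, peers online 90% of the time:
    print(p_recoverable(20, 10, 0.9))   # ~0.9999993
    # the same 2x storage overhead spent on two full replicas:
    print(p_recoverable(2, 1, 0.9))     # 0.99

At the same storage overhead, many small fragments beat a couple of whole copies once peers are reasonably reliable.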


"With offices located in San Francisco, California, USA, and data centers in San Francisco, Redwood City, and Mountain View, California, USA, the Archive's largest collection is its web archive, "snapshots of the World Wide Web". To ensure the stability and endurance of the Internet Archive, its collection is mirrored at the Bibliotheca Alexandrina in Egypt."


Keeping all the world's knowledge in the library at Alexandria worked so well the last time : /


All I can think about when I read this is "Who archives the archives?" It's hard for me to imagine the amount of infrastructure required to create something like the Internet Archive, let alone to make sure the archive itself is sanely backed up (archived, if you will).



