A couple of years ago I proposed a decentralized dead-tree archival network using time capsules and a shared location map. I've made a few capsules already.
I'm curious to see how we are going to navigate this data.
I'd love to see an Etherpad-style slider where you can watch a page evolve.
Or could we have a search of the entire history of the internet? Or search and browse the internet "as it was in 2007"? It's fascinating to think that Google "only" allows you to search a snapshot of the internet as it is now.
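Just to make the "slider" idea concrete, here's a minimal sketch built on the Wayback Machine's public CDX API (the endpoint, query parameters, and JSON field names below are my assumptions about its documented interface, so treat this as illustrative only): list a page's captures for a given year and diff two consecutive ones. A real slider UI would simply step through these diffs capture by capture.

    # Sketch: list Wayback Machine captures of a URL and diff two consecutive ones.
    # Endpoint and field names assumed from the CDX API's documented behaviour.
    import difflib
    import json
    import urllib.request

    CDX = "https://web.archive.org/cdx/search/cdx"

    def list_snapshots(url, year="2007", limit=20):
        """Return (timestamp, original_url) pairs for captures of `url` in `year`."""
        query = f"{CDX}?url={url}&output=json&from={year}&to={year}&limit={limit}"
        with urllib.request.urlopen(query) as resp:
            rows = json.load(resp)
        header, captures = rows[0], rows[1:]
        ts, orig = header.index("timestamp"), header.index("original")
        return [(row[ts], row[orig]) for row in captures]

    def fetch_snapshot(timestamp, original):
        """Fetch the archived HTML for one capture."""
        snap_url = f"https://web.archive.org/web/{timestamp}/{original}"
        with urllib.request.urlopen(snap_url) as resp:
            return resp.read().decode("utf-8", errors="replace")

    if __name__ == "__main__":
        snaps = list_snapshots("example.com")
        if len(snaps) >= 2:
            old, new = fetch_snapshot(*snaps[0]), fetch_snapshot(*snaps[1])
            # A "slider" would step through diffs like this, one capture at a time.
            for line in difflib.unified_diff(old.splitlines(), new.splitlines(),
                                             lineterm="", n=0):
                print(line)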
On the other hand the death of the ephemeral scares me a bit. It's like watching Captain Flam again as an adult. If you loved it as a kid, heed this warning - don't do it.
Is it really that paramount to keep most of the web around? The obvious knee-jerk answer is that it’s a great resource about our history, but honestly, I would not care if most of the web simply perished the natural way. I only seldom use the Wayback Machine, and if the content I’m looking for weren’t there, it would be no big deal for me. What is the point of storing so much information?
> I only seldom use the Wayback Machine, and if the content I’m looking for weren’t there, it would be no big deal for me.
That depends on what sorts of things you and I look for on the web, I suppose.
I use the Wayback Machine regularly, and what I find there is usually quite valuable (to me), precisely because I could not have found it anywhere else.
Only yesterday I was wandering through information about symmetry groups and tilings of the plane, and came across the Geometry Junkyard[1] ... dead links all over the place as soon as you take more than a few steps! A lot of those are old university home pages, the ones with ~s in them. If the person doesn't work there anymore, they often rot, and the information doesn't always get transferred to their next site or blog. I know I'm guilty of the same: quite a few ancient sites of mine floating about that I never bothered collecting under one domain. I know the IA has got them though :)
A decentralized, distributed backup solution for things as important as this would be interesting.
It would also be way cheaper, and if you have clients do the crawling, the speed would increase by several orders of magnitude. The real challenge becomes keeping track of where the data is and how many redundant copies there are. Actually, I'm sure it's a really interesting statistics problem.
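As a rough illustration of that bookkeeping problem (all the names and the replication target of 3 copies here are made up, not a real protocol): a coordinator could record which client claims to hold which crawled page, keyed by content digest, and flag anything that falls below the target number of copies.

    # Toy sketch: track which clients hold which content and flag under-replication.
    import hashlib
    from collections import defaultdict

    TARGET_COPIES = 3  # illustrative target, not a recommendation

    class ReplicaTracker:
        def __init__(self):
            # digest -> set of client ids claiming to hold that content
            self.holders = defaultdict(set)
            # digest -> original URL, for reporting
            self.urls = {}

        def report_stored(self, client_id, url, content):
            """A crawling client reports that it stored a copy of `content`."""
            digest = hashlib.sha256(content).hexdigest()
            self.holders[digest].add(client_id)
            self.urls[digest] = url
            return digest

        def report_lost(self, client_id, digest):
            """A client dropped out or lost its copy."""
            self.holders[digest].discard(client_id)

        def under_replicated(self):
            """Items with fewer than TARGET_COPIES known copies."""
            return [(self.urls[d], len(h))
                    for d, h in self.holders.items()
                    if len(h) < TARGET_COPIES]

    if __name__ == "__main__":
        tracker = ReplicaTracker()
        tracker.report_stored("client-a", "http://example.com/", b"<html>...</html>")
        tracker.report_stored("client-b", "http://example.com/", b"<html>...</html>")
        print(tracker.under_replicated())  # [('http://example.com/', 2)]

The statistics problem is then estimating how many copies actually survive when clients disappear without reporting, which this toy version ignores entirely.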
"With offices located in San Francisco, California, USA, and data centers in San Francisco, Redwood City, and Mountain View, California, USA, the Archive's largest collection is its web archive, "snapshots of the World Wide Web". To ensure the stability and endurance of the Internet Archive, its collection is mirrored at the Bibliotheca Alexandrina in Egypt."
All I can think about when I read this is "Who archives the archives?" It's hard for me to imagine the amount of infrastructure required to create something like the internet archive, let alone make sure that the archive is sanely backed up (archived if you will).
http://carlos.bueno.org/2010/09/paper-internet.html
http://www.wired.com/beyond_the_beyond/2010/10/dead-media-be...