Preserving The Internet... and Everything Else (codinghorror.com)
81 points by cruise02 on April 2, 2012 | 14 comments



A couple of years ago I proposed a decentralized dead-tree archival network using time capsules and a shared location map. I've made a few capsules already.

http://carlos.bueno.org/2010/09/paper-internet.html

http://www.wired.com/beyond_the_beyond/2010/10/dead-media-be...


I'm curious to see how we are going to navigate this data.

I'd love to see an Etherpad-style slider where you can watch a page evolve.

Or could we have a search of the entire history of the internet? Or search and browse the internet "as it was in 2007"? It's fascinating to think that Google "only" allows you to search a snapshot of the internet as it is now.
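
The Wayback Machine does expose an availability endpoint that returns the snapshot closest to a given timestamp, which is a crude version of "the web as of 2007". A minimal sketch in Python, assuming the endpoint's documented JSON shape (error handling elided):

    # Minimal sketch: ask the Wayback Machine for the snapshot of a URL
    # closest to a given timestamp. Assumes the documented JSON shape of
    # the availability endpoint; error handling is elided.
    import json
    import urllib.request

    def closest_snapshot(url, timestamp):
        query = ("https://archive.org/wayback/available"
                 "?url=%s&timestamp=%s" % (url, timestamp))
        with urllib.request.urlopen(query) as resp:
            data = json.load(resp)
        snap = data.get("archived_snapshots", {}).get("closest")
        return snap["url"] if snap else None

    print(closest_snapshot("example.com", "20070101"))  # the web of 2007

It only gets you one page at a time, of course; searching the full text of the web as it was in 2007 would need a time-indexed index, which is a much bigger beast.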

On the other hand, the death of the ephemeral scares me a bit. It's like watching Captain Flam again as an adult. If you loved it as a kid, heed this warning: don't do it.


Is it really that important to keep most of the web around? The obvious knee-jerk answer is that it’s a great resource about our history, but honestly, I would not care if most of the web simply perished the natural way. It’s only seldom that I use the Wayback Machine, and if the content I’m looking for weren’t there, it would be no big deal for me. What is the point of storing so much information?


> It’s only seldom that I use the Wayback Machine, and if the content I’m looking for weren’t there, it would be no big deal for me.

Depends on what sorts of things you and I look for on the web, then, I suppose.

I use the Wayback Machine regularly, and what I find there is usually quite valuable (to me), since I could not have found it anywhere else.

Only yesterday I was wandering through information about symmetry groups and tilings of the plane, and you come across the Geometry Junkyard[1] ... dead links all over the place as soon as you take more than a few steps! A lot of those are old university home pages, the ones with ~s in them. If the person doesn't work there anymore, they often rot, and the information doesn't always get transferred to the next site or blog. I know I'm guilty of the same: quite a few ancient sites of mine floating about that I never bothered collecting under one domain. I know the IA has got them, though :)

[1] http://www.ics.uci.edu/~eppstein/junkyard/topic.html


You're only looking back maybe 20 years. Imagine this same information 100 years from now. 200 years. 1000 years.


Can I just not find it on the Internet Archive site, or is it really the case that they only have one location? In an earthquake zone?


http://en.wikipedia.org/wiki/Internet_Archive

"To ensure the stability and endurance of the Internet Archive, its collection is mirrored at the Bibliotheca Alexandrina in Egypt."


Although http://www.bibalex.org/isis/frontend/archive/archive_web.asp... says:

"The Internet Archive at the BA includes the web collection of 1996 through 2007. It represents about 1.5 petabytes of data stored on 880 computers."

So it seems to be a partial snapshot, not a dynamic mirror.
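
(For scale: 1.5 petabytes across 880 machines works out to roughly 1.5e6 GB / 880 ≈ 1.7 TB per machine.)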


Welcome to The Library of Alexandria: It has been 2060 years since our last catastrophic loss of data.


A decentralized, distributed backup solution for something as important as this would be interesting.

Also, it would be way cheaper, and if you have clients do the crawling, speed would increase by several orders of magnitude. The real challenge becomes keeping track of where the data is and how many redundant copies there are. Actually, I'm sure it's a really interesting statistics problem.
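
To make the bookkeeping concrete, a toy sketch of the location map: content hashes mapped to the peers holding each copy, with a report of anything that has dropped below a replication target. All the names and the target of 3 copies are made up for illustration:

    # Toy location map for a decentralized archive: which peers hold
    # which chunk (keyed by content hash), and which chunks have fallen
    # below a replication target. Names and numbers are illustrative.
    from collections import defaultdict

    REPLICATION_TARGET = 3  # assumed minimum number of copies

    class LocationMap:
        def __init__(self):
            self.holders = defaultdict(set)  # chunk hash -> peer ids

        def record_copy(self, chunk_hash, peer_id):
            self.holders[chunk_hash].add(peer_id)

        def peer_lost(self, peer_id):
            # Peers vanishing is the common failure mode: drop their copies.
            for peers in self.holders.values():
                peers.discard(peer_id)

        def under_replicated(self):
            return [h for h, peers in self.holders.items()
                    if len(peers) < REPLICATION_TARGET]

    m = LocationMap()
    m.record_copy("chunk-a", "peer-1")
    m.record_copy("chunk-a", "peer-2")
    m.peer_lost("peer-2")
    print(m.under_replicated())  # ['chunk-a']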


The people behind Wuala seem to have figured out a way to deal with the redundant copies on client hardware.
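
As I understand it (hedging here, I've only read their public material), Wuala spreads erasure-coded fragments across client machines rather than whole replicas: any k of n fragments reconstruct the data. That turns the statistics problem above into a simple binomial sum; a sketch with illustrative numbers:

    # Sketch: probability that erasure-coded data is recoverable when
    # each of n fragments sits on an independently available peer and
    # any k of n fragments reconstruct it. Numbers are illustrative.
    from math import comb

    def p_recoverable(n, k, p_up):
        return sum(comb(n, i) * p_up**i * (1 - p_up)**(n - i)
                   for i in range(k, n + 1))

    # 20 fragments, any 10 suffice, peers online 90% of the time:
    print(p_recoverable(20, 10, 0.9))   # ~0.9999993
    # the same 2x storage overhead spent on two full replicas:
    print(p_recoverable(2, 1, 0.9))     # 0.99

At the same storage overhead, many small fragments beat a couple of whole copies once peers are reasonably reliable.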


"With offices located in San Francisco, California, USA, and data centers in San Francisco, Redwood City, and Mountain View, California, USA, the Archive's largest collection is its web archive, "snapshots of the World Wide Web". To ensure the stability and endurance of the Internet Archive, its collection is mirrored at the Bibliotheca Alexandrina in Egypt."


Keeping all the world's knowledge in the library at Alexandria worked so well the last time : /


All I can think about when I read this is "Who archives the archives?" It's hard for me to imagine the amount of infrastructure required to create something like the Internet Archive, let alone to make sure the archive itself is sanely backed up (archived, if you will).



