
Google's handling of these critical archives they were given is pretty abhorrent. The usenet archives should really be made public since there is no business value to them and they don't care about usenet.


When Google started, there was maybe an overall altruistic, visionary, principled culture among many pre-Web Internet-y people, and it looked like Google was of that same school of thought.

(This was at the same time that there was a gold rush of IPO plays, hiring anyone who could spell "HTML", and plopping them down in slick office space, Aerons for everyone, and lavish launch parties, with tons of oblivious posturing and self-congratulating. But Google stood out as looking technically smart. At least I believed the "Don't Be Evil" motto, since that was the OG culture, and it seemed a savvy reference to behaviors in industry and awareness of the power that it was clear they would probably have.)

That might be why it wasn't surprising to hear of things like someone entrusting a bunch of old university backup tapes to Google's stewardship.

This has played out with mixed results, and I think Google could be doing much better for humanity and for techie culture.


Google didn’t kill Usenet; it was already pretty much dead. Web forums had all but taken its place (and where are their archives now? So much is lost).

If you look at the history, Google basically rescued the data from a collapsing Deja News, and made it available again. A nice gesture, which didn’t serve to benefit Google much in the long term.

If we want to preserve history then we can’t rely on for-profit companies. We need to instead fund non-profits whose specific charter is archival and preservation, like the Internet Archive.


> The usenet archives should really be made public

Given the nature of Usenet, they were, if anyone wanted them.


Various people sent their old tape reels and other backups to Deja News, which compiled everything. But Deja News never made freely available the individual archives or the collection, nor did Google. The oldest stuff is locked away by Google because the only hard copy was destroyed when sent to Deja News. As time wore on most of the remaining fragments that at one point could have been recompiled independently also disappeared.

What Google is doing by refusing to publish the archive or even share it with parties like the Internet Archive is completely unjustifiable and anathema to everything they once stood for.


> What Google is doing by refusing to publish the archive or even share it with parties like the Internet Archive is completely unjustifiable and anathema to everything they once stood for.

Couldn't a copyright claim (or something under the GDPR or UK's DPA) be used to regain access to those though?

Just because something is published to a public forum doesn't mean you relinquish your rights.


Copyright is a legal mechanism for restricting others from making copies, not for demanding they make copies for you. Off hand I'm unaware of any general legal mechanism to accomplish that outside of a contract or promise.


That’s why I suggested the DPA which does allow for rightsholders to request copies of data pertinent to themselves - I’d argue that usenet postings would fall under that scope.


Doesn't that just create an incentive to destroy the archive before GDPR authorities can shake them down over it?


Perhaps - but it also creates an incentive for companies to destroy inappropriately-held and collected personal data they have no business possessing.

The DPA isn’t new - it was created in 1988 - and UK ISPs had Usenet/NNTP servers long after that.


Google acquired probably the biggest searchable archive, Deja News. What we needed was some kind of self-sustaining org with a strict charter to preserve the archive no matter what.


Archive.org ?


Maybe, though making themselves a target of book publishers may have risked their other responsibilities.


They were until they were not.


> they don't care about usenet.

They cared enough about it to kill it.


Controversial question: Why should we preserve code that no one uses anymore? Why should we not allow some information to be simply lost?


Because it's a cultural artifact, of its time. It's history. And some people would like to be able to read it, or do other things with it.

Personally I'd like to be able to link to my own posts from that time, for when people asked me what I used to do. But I can't find them any more.

These groups are mostly not code. They are conversations, design discussions, ideological discussions, jokes, that sort of thing.

Like what we have now in social media, except back then there was pretty much only Usenet, and it had a very different feel than the current social networks.

They are where ideas like the smiley, and free and open source software, and utopian ideas of internet culture were developed. All the early internet memes. And of course all the knowledge people shared.

Conducted in public at the time and thought to be archived for the long term.


Wonder what people will think in a hundred years when they read that everyone believed the universe was made up almost entirely of invisible and intangible matter? It'll be some future generation's flat earth joke.


This past Sunday's New York Times noted that until the 1860's, almost all reputable scientists insisted that pandas were a myth.


As someone else pointed out, losing information is bad because we can't know what value it might have in the future, only what value it has to us today. A lot of things from the past that we are certain had no value to people at the time (such as literal garbage heaps) are of immense value to historians today in understanding the past and the context within which those "worthless" things existed.

You're right though that a decision will probably have to be made at some point about what to keep and what to toss (how big is YouTube, exactly? Are we really going to keep every video, in its original resolution, forever?), but this is just plaintext, it takes up almost no space. The decision doesn't even have to be made, since it's easy to find the means to store this, so why bother making it? Kicking the can down the road is actually the best decision in this case, since the people of the future will (hopefully) have a clearer understanding about what was important in our own past than we do currently.


Why should we preserve old websites that no one uses? Why bother with historical documentation at all?

It's because, at the time, you don't know what information is going to be important and what is just garbage. Documents that are apparently useless today could become fascinating tomorrow.


No, it's a reasonable question. We're not going to preserve, certainly not in a findable way, every piece of digital flotsam that has ever been summoned into existence. In general, we probably should save what we can of Usenet for historical value as balanced against the fact that the archives are tiny in the scheme of things. They're probably also messy but that's probably OK.

Interestingly, when some people saved a great deal of the Usenet archives pre-Deja News, one of them said something to the effect of they wished they had prioritized saving social discussions and so forth because, by and large, saving discussions about a bug in a long ago version of SunOS probably wasn't very interesting.


> saving discussions about a bug in a long ago version of SunOS probably wasn't very interesting.

Honestly even that sounds pretty fascinating:

It could help someone gather stats on the nature, frequency, and severity of bugs over time and across companies from another angle.

It could provide a fresh perspective on modern OSes by showing how historic OSes did things.

And it might be good material for a course on the history of software engineering practices, showing classes of bugs that have been eliminated, and styles of development and customer support that worked or didn't work.


I suspect the information would be too fragmentary to extract anything statistically useful in it. But, yes, there are possibly historically interesting nuggets in those sorts of topics.

Here's the article I was thinking of by the way. https://www.salon.com/2002/01/08/saving_usenet/


Why not? Our capacity for storage has been increasing exponentially such that yesterday’s data is basically of negligible size compared to what we are producing today. There’s no reason to delete history.


So no one is keeping you from doing so. No reason to hope someone will do it.


Indeed! That's why I regularly donate to the Internet Archive. :-)


Which is a very laudable response! With the caveat that pack-ratting everything is going to be an endless treadmill. I certainly favor preservation but at some point you do have to consider what you're saving and why.


Your assumption "no one uses anymore" is glaringly wrong in this case.

Those archives are full of useful and informative information.

Not everything changes fast. Common Lisp has been around for 30 years basically unchanged. The discussions back there can be truly informative for today.

It does take time to wade through it, but people have been collecting (via the Google archive, when it existed, sigh) curated lists.

https://www.xach.com/naggum/articles/ https://www.xach.com/rpw3/articles/


For the same reason we don't just tear down the pyramids and build condos there.

There are still interesting things to be learned from ancient artifacts.


But we do tear down old condos to build new ones. Should we also endeavor to retain every geocities and myspace page?

And if not, what makes comp.lang more like the pyramids than geocities?


> Should we also endeavor to retain every geocities and myspace page?

Yes: https://www.archiveteam.org/index.php?title=GeoCities

Digital data is not exclusionary in physical space like condos. And even random myspace pages with hacked stylesheets show the common culture of an era.


Do you know about cuneiform? Lots of what is known are just ledgers and exercise books...

Never forget that we do not know the future.


Future digital tourism.

That or risk future archaeologists thinking COBOL was some God of the time and the natives built large metal obelisks in dedicated worship temples.


why do mennonites and other such groups use low/deprecated technologies? partially due to religious creed, but also because when the electricity is gone, oil lamps still function, and horses don't need a petrol pump to keep running.

likewise many people are clinging to the local operating system rather than moving to the SAAS model.

so what happens if we lose the oldschool languages and platforms entirely, for whatever reason ?

if TBTF corporations are somehow hobbled or neutralized, we need old hand tools to build a tech newtopia from the rubble. if those tools are destroyed then we are beholden to a system that stands on very thin ice.


I would add to this that not all forward progress is necessarily good or well thought out. If there is value in an old thing that hasn't been unlocked yet, and it is lost to history, we become collectively worse for wear. Things like Lisp are old and pretty darn cool to have as an option.

I second that the need to rebuild from the rubble is often overlooked, especially by corporations driven by profit-centered goals.


The thought process and conversations that produced the code give insight into how to more generally produce code of that kind. Typically code currently in use is in continuity with code that was previously used, either as a system dependency or conceptual dependency. So it's still useful to have history around, like it would be to have comments in current code.


Well I think it’s ok in general for some information to be lost, but I think a lot of HN users value this specific information.


I’m sad to see that this was downvoted; it contains the key questions. I think they have good answers.

1) Eventually, everything will be lost anyway. The original print of King Kong is gone. A fire at Universal Studios wiped out the masters for a lot of music at once https://en.wikipedia.org/wiki/2008_Universal_fire . Floods destroy family photos all the time. But those are examples of the forces of decay, of natural entropy, of error. The Library of Alexandria probably contained a lot of useless crap but also nuggets we’d want to know today. Information is memories, useful information is useful memories, and there’s no compelling REASON to lose it. Other sections of usenet history were wiped out when Google acquired it (a lot of comp.database.olap content I had a hand in) and groups of people just lost a knowledge base.

2) It’s not simply code that no one uses anymore. It’s a knowledge base on how and why, debates over constructs and usage that are useful beyond code-sharing snippets a la Stack Overflow.

3) There is an argument for letting some information get lost or at least super-obscure, but it’s hard to see this being a good example. Tide Pod Challenge videos come to mind. GDPR and right to be forgotten mandate something akin to information loss.

4) I posted this elsewhere but I’ll share here too: there was a comment made on the original article about preserving prior art for IP (patent) purposes. That alone is in the public interest. Irrelevant to your questions in general, but pertinent to each of them in this case.


It belongs in a museum!



