Gmane has been a fantastic asset. For many reasons: archival, easy access to lists without subscribing, great reading interface via NNTP. I always wondered how Lars finds the energy and resources to maintain Gmane for all of us.
It's sad that the dicks of the world are about to ruin such a great resource. It's also sad that such tremendous goodwill and a large amount of work gets ruined by a bunch of jerks.
Thank you, Lars, for all the hard work over the years.
There are many open source projects whose only reliable, publicly accessible archive is and has been Gmane (due to changes in mailing list providers over the years or loss of their own backups/server resources/et al). If Gmane were to shutdown that would have huge implications for the "institutional knowledge" of some rather long-lived projects.
Personally, I know I've relied on Gmane many times to help solve problems I've had with one open source project or another.
I hope that this resolves well in favor of the protection of this archive as a resource, and I am sad at the recent troubles it has seen.
Or upload it to Google BigQuery or a requester-pays Amazon S3 bucket... public datasets get free storage IIRC. It would be a nice complement to the Reddit / HN datasets.
edit: answering my own question below. Not sure what the specifics for hosting an AWS Public Data Set are, though.
BigQuery:
>A public dataset is any dataset that is stored in BigQuery and made available to the general public. This page lists a special group of public datasets that Google BigQuery hosts for you to access and integrate into your applications. Google pays for the storage of these data sets and provides public access to the data via BigQuery. You pay only for the queries that you perform on the data (the first 1 TB per month is free, subject to query pricing details).
> Public Data Sets on AWS provides a centralized repository of public data sets that can be seamlessly integrated into AWS cloud-based applications. AWS is hosting the public data sets at no charge for the community, and like all AWS services, users pay only for the compute and storage they use for their own applications. Learn more about Public Data Sets on AWS and visit the Public Data Sets forum.
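To make the BigQuery option concrete, here's roughly what querying such a dataset could look like from Python. This is a sketch only: the google-cloud-bigquery client calls are real, but the bigquery-public-data.gmane.messages table is invented for illustration.

    # Hypothetical: query a Gmane public dataset in BigQuery.
    # Only the client library calls are real; the table name is made up.
    from google.cloud import bigquery

    client = bigquery.Client()
    query = """
        SELECT list_name, COUNT(*) AS messages
        FROM `bigquery-public-data.gmane.messages`  -- hypothetical table
        GROUP BY list_name
        ORDER BY messages DESC
        LIMIT 10
    """
    for row in client.query(query).result():
        print(row.list_name, row.messages)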
In https://web.archive.org/web/20160708085949/http://gmane.org/... (in the section titled "A Statement") Lars wrote he wouldn't "use any of the contents of the Gmane news spool for spamming, for advertisement, for sending mass mail, for gathering profiling information, or anything of that kind, or willingly allow anybody else to do the same". So I assume he wouldn't be fine with just making it all publicly downloadable but would want his successor to provide a similar guarantee.
I've worked on research projects that turned into a startup. We migrated all our data from free University boxes to cloud storage the startup paid for.
The professor I was working with told us about how he lost all his data once with a similar startup attempt. Shortly after I was let go from the startup I pulled all the data via the web service I wrote, luckily before they shut it down.
TL;DR I'd trust an entity like the Internet Archive way over commercial entities, even those that offer free storage for open projects. Sure the Internet Archive could up and go away, but I still trust their motives and their commitment to not let that happen.
Plus, it's not like the Internet Archive couldn't store your archive as a public dataset in AWS to reduce their own costs, if they were to decide AWS fit their criteria as a reliable public archive host.
The difference would be that if AWS itself wanted to stop providing public archive hosting, they'd just up and do that. Whereas if the Internet Archive was hosting a public archive on AWS, and AWS cut off support for public archives, the Internet Archive would then work to migrate the dataset off AWS.
Non-profits have incentives to do (whatever their charter says they do.) Corporations just have an incentive to make money. If you want a certain "preference function" to carry on into the future without self-modifying into something unrecognizable, you probably want to instantiate that preference-function in the form of a non-profit.
Or torrent it! This is the best use case for a torrent... spread the load around, no extra cost or having to find someone to flip a 'public dataset = free' bit.
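For what it's worth, building the .torrent metadata is simple enough that a sketch fits here: a minimal single-file torrent builder in stdlib Python, with a placeholder tracker URL and file name. A real distribution effort would presumably use an existing tool instead.

    # Minimal single-file .torrent builder (sketch; tracker URL and
    # file path are placeholders, not a real Gmane distribution).
    import hashlib, os

    def bencode(x):
        if isinstance(x, int):
            return b"i%de" % x
        if isinstance(x, bytes):
            return b"%d:%s" % (len(x), x)
        if isinstance(x, str):
            return bencode(x.encode())
        if isinstance(x, list):
            return b"l" + b"".join(bencode(i) for i in x) + b"e"
        if isinstance(x, dict):  # bencoding requires sorted keys
            return b"d" + b"".join(bencode(k) + bencode(v)
                                   for k, v in sorted(x.items())) + b"e"
        raise TypeError(type(x))

    def make_torrent(path, announce, piece_len=2**20):
        pieces = b""
        with open(path, "rb") as f:
            while chunk := f.read(piece_len):
                pieces += hashlib.sha1(chunk).digest()
        return bencode({"announce": announce,
                        "info": {"name": os.path.basename(path),
                                 "length": os.path.getsize(path),
                                 "piece length": piece_len,
                                 "pieces": pieces}})

    # Usage (hypothetical file and tracker):
    # open("gmane-spool.torrent", "wb").write(
    #     make_torrent("gmane-spool.tar", "http://tracker.example/announce"))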
I wonder how much traffic (DDoS aside) the web bit needs. Also wonder if it would be feasible to just plug the NNTP spool into the d-lang forums[1], and have it "just work" ... ?
But no matter what, the best possible solution would probably be to move the archive to the Internet Archive, and have that same archive continue to ingest email->nntp and provide access to it.
I get that people fear spammers etc, but at this point, I don't see a way to avoid it if you want a usable web archive.
Archival is only part of the problem. What about the future? I don't mind helping out and donating money to build out the infrastructure. But I need a team...
If so, it'd be the end of an era. Gmane has been a tremendous asset to the community for well over a decade; indeed, it feels as though it's always been there.
I suppose that it's one more nail in the coffin of NNTP. I really miss net news, but it probably wasn't possible for it to scale to support the entire world. It's sad, though: for a short time, all the interesting people in the world were able to communicate with one another.
Actually I think NNTP solved a bunch of problems that we didn't even know we had at the time, including scaling and funding.
NNTP servers were generally provided by your ISP, so ISPs built the capacity out as they added subscribers, and added groups based on the perceived interest. It wasn't especially difficult (and it's pretty trivial today) to offer all the "official" text-based newsgroups with a few days or weeks worth of retention; it only becomes a significant undertaking when you want to offer months of retention for binaries newsgroups.
Similarly, because the NNTP servers were basically a value-added ISP service, they were funded by subscriber fees. You never had a problem with a newsgroup suddenly getting popular and being unable to pay their own hosting fees, or suddenly tossing ads in order to pay the hosting bill, which is unfortunately common with webforums. And you don't have problems with a single server crashing and wiping out the history of a whole group. It's all pretty robust.
Architecturally, netnews was superior to web forums. Where it failed was in not having a good response to the spam problem. I think that had it not been for spam, people wouldn't have left NNTP for HTTP-hosted webforums nearly as quickly. But the decentralized nature of netnews meant there wasn't anyone who could quickly implement any of the myriad solutions that were proposed (cryptographic proof-of-work plus reputation systems probably would have cut down on the worst offenders).
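As a rough illustration of the proof-of-work half of that idea, here is a hashcash-style stamp in Python. Purely a sketch of the concept; nothing like this was ever part of NNTP.

    # Hashcash-style proof-of-work: cheap to verify, costly to mint.
    # Illustrative only; NNTP never standardized anything like this.
    import hashlib
    from itertools import count

    def mint(message_id, bits=20):
        # Find a counter so sha1(message_id:counter) has `bits` leading zero bits.
        for counter in count():
            stamp = "%s:%d" % (message_id, counter)
            value = int.from_bytes(hashlib.sha1(stamp.encode()).digest(), "big")
            if value >> (160 - bits) == 0:
                return stamp

    def verify(stamp, bits=20):
        value = int.from_bytes(hashlib.sha1(stamp.encode()).digest(), "big")
        return value >> (160 - bits) == 0

    stamp = mint("<example-post@alt.test>")  # ~2^20 hashes of work to post
    assert verify(stamp)                     # one hash to check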
My guess is that eventually, we'll reinvent something that's similar to NNTP but by another name (with the decentralization, server-server replication, gateways to other protocols, etc.), but it's been a long and unfortunate diversion in the meantime.
With web forums you fire a search term into a random crawler and it will likely barf up a URL. With NNTP you had to configure the email client to point at the ISP's server, and then locate a relevant group for what you were interested in. And even then searching the whole archive was troublesome if not impossible.
I watched something similar play out with IRC. Once AJAX allowed random websites to offer chat services, non-tech channels basically wasted away, as your random joe and jane just entered some URL they were told about, or read in some glossy mag, and presto.
As for reinventing NNTP, I don't think there is any incentive any longer. Especially now that something as big as Reddit can just pay Amazon to scale the back end as large as they need.
Store and forward came to be before the net, and was used during the early net, because of narrow bandwidth.
I don't know about that: the core conceit—of ISPs doing any sort of edge server hosting—is nearly dead itself. Email is nearly-entirely owned by megacorps, unencrypted (and therefore edge-cacheable) HTTP is dying, DNS is basically just Google and OpenDNS at this point... soon ISPs won't run any OSI-level-7 services at all. They won't need data-centers, just switches. I don't see that trend reversing.
And given that, you can't really have a store-and-forward edge-caching system that truly resembles NNTP. At most, you can get a hierarchical network like IRC.
Of all of these, DNS is the easiest to address -- anyone can set up their own forwarding nameserver pointing at the roots. Performance is pretty good.
Much as I distrust Google for anything these days, 8.8.8.8 / 8.8.4.4 offer widespread advance notice of domains ticking up in hit rate, and of possible attacks.
Email, as SMTP, after decades of noble service, really does deserve to die. It's far less about the mechanics of email user interfaces, on which a great deal of ink has been wasted, and much more about privacy -- encryption, metadata, and spam, probably as the three top concerns. Very nearly all the suggested replacements (SMS, various messaging systems, Facebook) have either those problems or those plus the fatal condition of being proprietary silos.
The more I reflect on it, the more the Web is close to what Usenet was, at least in some regards. Origins post messages, which may or may not die in short order (what's the TTL for an Internet page now, a few months on average?). We're increasingly relying on sites such as the Internet Archive to actually provide a view-through-time of content.
What the web (and email, especially in mailing lists) lacks and has always lacked is a requirement to meet standards conformance. DDG "The Web is an error condition" and you'll find a far-too-true-to-be-good rant on that topic.
The loose-standards, nearly-anything-will-fly approach has allowed a great deal of flexibility in page design, but much of that is ultimately painful. I'm finding a much greater appreciation for straight-up HTML 1.0 formatted pages.
A huge part of the problem is exceptionally crappy default entity styling. If browsers presented pages sanely (at various resolutions), much of the B-Ark profession of Web design would simply cease to exist. And this would be a wonderful thing.
A simplified markup, acceptance criteria (and rejection of failed formats), cache-and-forward, client-determined formatting, security, and either a universal content-syndication or UBI-type compensation model might address other issues.
I'm getting the sense of some interesting possibilities out there.
> One day, I walked into the break room and heard a coworker say, “The Web is an error condition,” referring to the deplorable state of code out on the Web. I think that was the end of the end for me, because it just depressed me. It depressed me not because it was untrue, but because it was so perfectly true.
> Honestly, I miss the days when Netscape Navigator would just halt rendering in the middle of your page, saying, “No, I will not parse any more of your shit until you fix it.”
> Then IE came out for free. Suddenly, the game of web browsers changed from paid apps to supported by advertising and search revenues. The only way to get users to use your browser (and thus get more money to develop with) was to parse all the shit you used to reject.
> The web became a co-evolution of crap and trying to render crap. (It’s gotten more complicated since then, but because there’s been a habit of rendering crap, no one suddenly wants to stop.)
Pretty much. Around where I live, power companies have started offering fiber services. But they just pull the actual fiber along their existing power lines. Just about all of them have teamed up with a third party that handles not just ISP service, but also TV and VoIP.
> Actually I think NNTP solved a bunch of problems that we didn't even know we had at the time, including scaling and funding.
Yeah, but think of the social scaling issues. You simply can't have every single person interested in e.g. American football reading & commenting on alt.games.football.
News was indeed a far superior architecture & experience to web forums — so long as the userbase was relatively small.
I've shot around estimates of the size of Usenet with some old Usenetters (of whom I was one, though only marginally active).
Gene "Spaff" Spafford thinks 50k - 500k users ~1988 - 1994 is probably ballpark accurate. Peter da Silva notes that in 1990 he was the most active user on Usenet ... and under another handle, the 3rd most active. Microsoft did some early 2000s research turning up low-single-digit million distinct user IDs IIRC (I've posted on this previously).
So, yeah, it was small.
Organising people into groups, even ad hoc, from which exceptionally good messages might be shared outward, might be one way of handling scaling. Cluster sizes ranging from a total size within Dunbar ranges, up to a 1/9/90 split that gives the 10% of active users a Dunbar cap (about 300 - 3,000 users per cluster), might work.
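The arithmetic behind that 300 - 3,000 figure, assuming a Dunbar-ish range of roughly 30 - 300 for the active core:

    # Cap the active minority (the 1% + 9% of a 1/9/90 split) at
    # Dunbar-ish numbers; the total cluster is then 10x that.
    dunbar_low, dunbar_high = 30, 300  # assumed range for a cohesive group
    active_fraction = 0.10             # active share under a 1/9/90 split
    print(dunbar_low / active_fraction, dunbar_high / active_fraction)
    # -> 300.0 3000.0 users per cluster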
I agree that 1990 Usenet in 2016 would be a fatal mistake.
How could alt.games.football not scale, yet /r/football scale? I wouldn't expect a single server, or even a single network, to host the whole conversation about football in the world. So you'd have reddit.com's alt.games.football, facebook's alt.games.football and so on. Or, more probably, brand new hierarchies like com.facebook.games.football and com.reddit.games.football. The nice thing is that they'd all have the same interface and the same protocol, so they'd be both easily accessible and easily archivable.
An NNTP service could definitely still work, but it would have to be a lot more locked down. I really hate that the alternative is web forums, which are always stuck with a small fraction of the functionality of any good newsreader, get bogged down easily when they become popular, and never have a good search feature.
NNTP is just the right abstraction to wrap any discussion-based thing. It would be nice to have just nntp://news.ycombinator.com, nntp://reddit.com and so on instead of a bunch of incompatible APIs. It's like Perl's beautiful TIE [1], where you wrap an object and it is exposed as a regular scalar, array, hash, or filehandle instead of 'an object' with its own methods that users need to learn.
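And reading such a group really is only a few lines. A minimal sketch using Python's stdlib nntplib, pointed at Gmane's server and one of its group names:

    # Read the last few articles of a Gmane-carried list over NNTP.
    import nntplib

    with nntplib.NNTP("news.gmane.org") as server:
        resp, count, first, last, name = server.group("gmane.emacs.devel")
        print("%s: %d articles (%d-%d)" % (name, count, first, last))
        resp, overviews = server.over((last - 9, last))  # last ten articles
        for art_num, over in overviews:
            print(art_num, over["subject"], "--", over["from"])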
I think Gmane 2.0, if it is ever born, would have a mechanism of plugins allowing any web forum to be easily wrapped, by writing a little template describing the layout of its pages: 'this is the topic, this is the message body' and so on. And a crawler to use the plugins, of course.
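A sketch of what such a plugin template might look like; all of the selector names and the forum URL are invented:

    # Hypothetical per-forum plugin: CSS selectors a generic crawler
    # could use to map forum pages onto topics and messages.
    from dataclasses import dataclass

    @dataclass
    class ForumPlugin:
        base_url: str     # forum root
        topic_link: str   # selector for topic links on an index page
        message: str      # selector for one message in a topic page
        author: str       # selector for the author, within a message
        body: str         # selector for the body text, within a message

    some_forum = ForumPlugin(
        base_url="https://forum.example.org",
        topic_link="a.topic-title",
        message="div.post",
        author="span.username",
        body="div.message-body",
    )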
A version of NNTP where groups were automatically moderated and strong controls were in place to weed out spammers and trolls could work. The moderation would have to be baked into the core of the protocol and not some weird half measure bolted onto the side however. Once a community is large enough you either moderate it or you get another /b/.
NNTP is a very nice, simple protocol to implement, both server and client-side.
Well, except for the mess that is RFC 822. At least NNTP decided to go 8-bit safe eons ago, so you just have to shove everything in UTF-8 and you can screw RFC 2047 in the face.
* Handling attachments can be difficult (particularly the inability to safely add binary data)
* The display name/address syntax is plain painful.
* CFWS in general (you can tell who's written an email client by their reaction to the letters CFWS)
* Header fields in general have way too lenient handling, so there's no way to, e.g., guarantee non-ASCII handling (RFC 2047 needs to be handled differently in the Subject: header than in the From: header, and the Content-Type: header does something else entirely; see the decoding example after this list).
* Mandatory line-length limits, which means you need to be able to insert line breaks in however you encode custom header values
* No way to get attachments without the body or vice versa
* All metadata needs to be shoved into a header (see above about i18n, line length, and CFWS)
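To make the RFC 2047 point concrete, a tiny decoding example using Python's stdlib email package; the raw header is ASCII gibberish until an encoded-word-aware decoder touches it:

    # Decode an RFC 2047 encoded-word (example value from the RFC itself).
    from email.header import decode_header, make_header

    raw_subject = "=?iso-8859-1?q?p=F6stal?="
    print(decode_header(raw_subject))               # [(b'p\xf6stal', 'iso-8859-1')]
    print(make_header(decode_header(raw_subject)))  # pöstal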
How many institutionally critical projects are run by solitary figures out of pure goodwill? There seems to be an obvious need for a foundation to find-and-fund critical projects like this so they don't go into oblivion.
There was the timezone thing a few years back, but I think that was resolved, according to some definition of resolved. But that wasn't about lack of support, more a legal attack on the maintainer. https://en.wikipedia.org/wiki/Tz_database#History
It's yet another episode in how spam and abuse drive the Internet towards feudalism. Either you need a business willing to pay people to handle it, or real volunteer organization with a pool of volunteers, or some combination of them.
I'm starting to see what I for now call "hygiene factors" as a critical element and technological mechanism for almost anything. Maybe it's a generalisation of Le Chatelier's principle (@btilly's got a good post on this).
To make this concrete: put enough people or activity or stuff in some area and you end up with noxious effects. Bulk, bioactive, chemoactive, systemic, and even cultural impacts. For most towns and cities, water pollution and sanitation rapidly become pain points. For transport systems, congestion, but also highway robbery and piracy. For communications systems, it's the annoyance of low-quality or unwanted signals. A key point is that cost reduction makes all hygiene factors worse. You end up with more of the bad stuff. There's also scale effects. Ultimately, you need to apply different cost factors to the wanted and unwanted effects (generally through some form of policing, patrolling, filtering, reputation feedback, etc., etc.).
Or you can limit the total size of unrestricted networks and provide gateways (with filters) between them. And yes, that means you're subject to policies (egress and ingress) on both sides of those filters.
Even without the DoS issue, it's hard to keep carrying on a project alone year after year. A more resilient organizational pattern would be a group of people sharing the burden of managing all the issues and nuisances (obviously people can fight and break up.)
Wouldn't the Internet Archive be interested in archiving all those mailing lists?
If the author reads this: how large are the dumps, and is it something that can be pushed over SFTP or another file-transfer protocol? I might be tempted to have a play, but I don't want you to be out of pocket FedExing the content for something that I cannot guarantee will go live.
If you follow the update link at the end of the post, there are more details. It's 2 TB of data, so he probably prefers the FedEx cost over having his internet uplink saturated for a week.
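The back-of-envelope checks out, assuming a typical 20 Mbit/s residential uplink (my assumption, not a figure from the post):

    # How long does 2 TB take to upload at 20 Mbit/s?
    size_bits = 2e12 * 8       # 2 TB in bits
    uplink_bps = 20e6          # assumed 20 Mbit/s upload
    days = size_bits / uplink_bps / 86400
    print("%.1f days" % days)  # ~9.3 days, i.e. about a week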
"If somebody else wants to take over the concept, I can FedEx you a disk containing the archive (as an NNTP spool)."
It sounds like a great opportunity for someone (or a group of people) passionate about Gmane to take over the reins here. I'd hate to see a project with so much useful archived content disappear.
I have always liked the download.gmane.org option. I rarely use the web interface.
I hope download.gmane.org does not disappear.
Gmane is a truly great service. IMO, it does not need to be part of "the web". It's better than that. It's part of "the internet". One of the best parts.
The lists Gmane carried were primarily those of various open source projects and the like.
Gmane doesn't host the mailing lists itself. Instead it provides access to lists hosted elsewhere. It does that by subscribing to the list and then saving the incoming messages. It then provides access to those messages using NNTP, or various web interfaces (different layouts etc). IIRC it may also provide RSS. It also allows posting to the lists by forwarding the message.
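The ingestion side of that model is simple in outline. A toy sketch (not Gmane's actual code) of a subscribe-and-archive gateway, where the MTA pipes each incoming list message into a script that files it by List-Id:

    # Toy mail-to-archive gateway: pipe a message in on stdin
    # (e.g. from a .forward or MTA alias) and file it per list.
    import sys
    import mailbox
    from email import message_from_binary_file
    from email.policy import default

    msg = message_from_binary_file(sys.stdin.buffer, policy=default)
    list_id = (msg.get("List-Id") or "unknown").strip("<>")
    box = mailbox.mbox("/var/spool/archive/%s.mbox" % list_id)
    box.lock()
    try:
        box.add(msg)   # an NNTP server could then serve this spool
    finally:
        box.unlock()
        box.close()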
The benefits are great as a reader, since this is one place where you can read many lists without having to subscribe to each one. (Some may require subscribing for posting, though.)
The other benefit: you inevitably get people on mailing lists asking why the project uses something as archaic as email when the hotness is various web-based boards, declaring the project immediately and unconditionally doomed because boards are what people are used to. You can redirect those folks to Gmane, where one of the layouts is web-board style.
I haven't used Gmane or Gnus for quite a while but one vivid memory comes to mind of me having a problem with Gnus one Friday morning at about 11:00. So I posted a question asking if anyone else had the problem; I no longer remember what the problem was. What I do remember however is that within two hours I had a reply with an explanation and a fix from Lars Ingebrigtsen himself.
That kind of support is literally priceless, you can't buy it from Microsoft or Google or Oracle, etc. And even when you can get some kind of support from them it takes hours or more likely days and is filtered through multiple layers of semi-competent drones regurgitating the official docs.
I think it's horrible that Lars should be so stressed. All I can think of to say is just hand it over to someone and walk away; you've done your bit and more than most to make the Internet a good place, don't let it, or them, grind you down.
Apart from easy web access, is there more to Gmane's HTTP interface than its NNTP one? Like search over every message, maybe? If not, people should find their favourite newsreader and be all set, cf. https://en.wikipedia.org/wiki/List_of_Usenet_newsreaders
I recently discovered Gmane, and found there were a lot of interesting topics, but also found that some (mainly videogames) were somewhat active before but now aren't active at all. I wish I had found it earlier. Still has been a tremendous resource as others have said.
It would be awesome to just import the Gmane archive into Matrix.org and/or IPFS and continue it as a decentralised archive. Which would also (in time) be harder to DDoS, and would be run by the participants rather than being a centralised service!
I've used Gmane back when I was trying to debug linux-bluetooth issues to search through my old e-mail threads. It's a pretty helpful index. I didn't realize it was a single maintainer effort.
If you were on Usenet for long in the 90s, you developed your own way of answering this question.
My answer to cope with this is that there may be many different rationalizations in the mind(s) of the attacker(s), or none at all. It is impossible to understand the psychology of trolls. Well, maybe not impossible, but certainly not worth the effort. They are basically noxious pests, like cockroaches in a restaurant or wasps at a barbecue.
It's amazing how many of these integral tools that we've relied on for a long time are maintained by selfless individuals in their basements. Fantastic.
Most mailing lists I've used have a fairly poor archive interface compared to Gmane. Some of them are also missing parts of their own history due to careless maintenance.
My use case for gmane over nntp was to casually/loosely/irregularly follow along with a bunch of mailing lists in one place without cluttering up my email storage or interrupting me. A newsreader was a nice way to do that in a way the web just can't do.
> All technical discussion took place on mailing lists those days, and archiving those were, at best, spotty and with horrible web interfaces.
Not to take away from the contribution Gmane has made to many of my searches over the years, but its web interface was one I always held up as the epitome of awful design.
Considering its awesome usability, the design can't be that bad. Sure, it might look dated by today's tastes, but its users don't care when they want to read a specific archived list thread with minimal friction.
I feel the same way. I don't want to diminish the work put into Gmane, but every time I hit a link to Gmane I jump to marc.info and see if I can read the content there.
Still, I continue to be impressed by the amount of work people put into something they care about, while the rest of us just expect it to be there. As someone else pointed out, it really makes you wonder what other one-person projects we all use all the time.
And yet: if Lars mails that disk out to somebody, they may well just pitch to a bunch of VCs, build a fresh web interface, and bury it all in ads.
Couldn't Lars do that himself?
The plastering it all in ads, I mean. As somebody in another comment said correctly, the age of online altruism is dead. We have our ideas who killed it, we mourn the passing of an age, but we can't change it.
(OK, just ignore this if the ads are already all over Gmane. I wouldn't know, I adblock...)
> Couldn't Lars do that himself? The plastering it all in ads, I mean
Perhaps he doesn't think the money is worth the admin time involved. He'd still have to deal with takedown requests and DoS attacks, and with advertising companies on top.
He may also prefer not to see his creation go that way under his watch.
I would suggest a better option would be to throw the contents of the disk at a public hosting service, if any of those offering free hosting for public data sets would take it, or alternately put it out as a torrent until the content is sufficiently well distributed. I'd be happy to throw a chunk of bandwidth at the initial seeding effort, as I'm sure would others.
> the age of online altruism is dead (re: plastering it all in ads)
Just because it is difficult to maintain the old ways, doesn't mean all us old timers are willing to take part in propagating the new ones that we don't like!
> I wouldn't know, I adblock
An example of your "online altruism is dead" fact. "You can pay for it with ads (that I personally won't see so don't expect me to contribute)". As more and more people block ads you have to get more and more crafty to make money out of them which makes the whole business just that little bit more shady with each passing day. As per my first two points I can certainly understand him not wanting to go in that direction, as I'd never volunteer for it myself.
Whatever happened to the pay service model? I am actually happy to pay for services like this. Even if every user only paid $1/month it would probably cover most of the costs.
There are a lot of sites (dozens, maybe) that I would be willing to pay $1-$10 a month for, except that I'm not willing to sign up for all of them individually, and not all of them are willing to sell me their information/services at a price that matches the value that they provide. It's not worth my time to manage each of those subscriptions on its own, and there isn't really a good way to manage them centrally, from the same interface.
The money is almost certainly worth it, but the time involved almost certainly isn't.
The vast majority of people don't want to pay, and many of those who might are still put off by the little bit of extra effort/risk (enter credit card details? sign up for an account? ...) and will go look elsewhere instead.
And even if the money would be enough to make the enterprise worthwhile, that is not his main point. He was doing it mainly as a fun side project to help others, and it simply isn't fun any more. Maybe it could make enough money to pass a purely time/income cost-benefit analysis, but he would still rather stop, since it isn't enjoyable to him at this stage; and even if he wanted to spend that much time and effort to make that money, perhaps he has other projects he'd rather spend it on.
If the service truly could be made a going concern in that way he'll no doubt be inundated with offers to take over and try running it in that fashion, so you'll see the new pay service appear in short order.
I'm a huge fan of gmane, but I mostly use it by it showing up in Google results. I wouldn't feel like it was worth a subscription, and while a 50c charge per thread would probably be a fair reflection of the value for me, the friction would be way too high.
Free triggers a completely different response than 'stupidly cheap'.
Cheap still requires paying money, which is a barrier just for the act. Moreover, there is the fallacy that something free has an infinite value/cost ratio. (fallacious because you should look at value - cost)