
...at some point, some people started appreciating mailing lists and the distributed nature of Git again.



And Usenet, and IRC with a registered user prereq to join.

Also, set up AI tarpits as fake links with recursive calls. Make them mad with non-curated bullshit made from Markov chain generators until their cache begins to rot forever.
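Something like this minimal sketch (entirely my own toy, nothing standard; the corpus, the link scheme, and the port are made up): every hit returns Markov-chain babble plus a handful of random links that lead straight back into the tarpit, so a crawler's frontier never shrinks.

    import random
    from http.server import BaseHTTPRequestHandler, HTTPServer

    # Tiny first-order Markov chain built from a throwaway corpus.
    CORPUS = ("the web was quiet until the crawler came and ate "
              "the archive and asked for more of the archive").split()
    CHAIN = {}
    for a, b in zip(CORPUS, CORPUS[1:]):
        CHAIN.setdefault(a, []).append(b)

    def babble(n=80):
        word = random.choice(CORPUS)
        words = [word]
        for _ in range(n):
            word = random.choice(CHAIN.get(word, CORPUS))
            words.append(word)
        return " ".join(words)

    class Tarpit(BaseHTTPRequestHandler):
        def do_GET(self):
            # Five fresh fake links per page, all pointing back into the pit.
            links = " ".join('<a href="/%016x">more</a>' % random.getrandbits(64)
                             for _ in range(5))
            body = "<html><body><p>%s</p>%s</body></html>" % (babble(), links)
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(body.encode())

    if __name__ == "__main__":
        HTTPServer(("", 8080), Tarpit).serve_forever()

Mount it under a path you've disallowed in robots.txt, and only the crawlers that ignore the rules will ever find it.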


This problem will likely only get worse, so I'd be interested to see how people adapt. I was thinking about sending data through the mail like the old days. Maybe we go back to the original Tim Berners-Lee Xanadu setup charging users small amounts for access but working out ISP or VPN deals to give subscribers enough credit to browse without issues.


Xanadu was Ted Nelson, not Tim Berners-Lee.

Also I would argue that not having capitalist incentives baked directly into the network is what made the web work, for good or bad. Xanadu would never have gotten off the ground if people had to pay their ISP then pay for every website, or every packet, or every clicked link or whatever.

Reading the Xanadu page on Wikipedia tells me "Every document can contain a royalty mechanism at any desired degree of granularity to ensure payment on any portion accessed, including virtual copies ("transclusions") of all or part of the document."

That would be absolute chaos at scale.


Oops, you're right! The Xanadu folks claimed that Tim Berners-Lee stole their idea.

I agree that the lack of monetization was important to the development and that it would have been chaos as proposed, but will the current setup be sustainable forever in the world of AI?

We have projects like Ethereum that are specifically intended to merge payments and computing, and I wouldn't be surprised if, at some point in the future, some kind of small access fee negotiated in the background without direct user involvement becomes a component of access. I wouldn't expect people to pay ISPs, but rather some kind of token exchange to occur that would benefit both the network operators and the web hosts by verifying classes of users. Non-fungible token exchanges could be used as a kind of CAPTCHA replacement, cryptographically verifying users anonymously with a third-party token holder as the intermediary.

For example, let's say Mullvad or some other VPN company purchased a small batch of verification tokens for its subscribers, who pay them anonymously for an account. On the other side, let's say a government requires people to register through their ISP, and the ISP purchases the same tokens and exchanges them on behalf of the user. In either case, the person can stand behind a third party who both sends them the data they requested and exchanges the verification tokens, which the site operator could then redeem with its hosting provider as reimbursement for serving them.
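As a toy illustration of that flow (entirely hypothetical: the issuer, the HMAC construction, and the double-spend bookkeeping are my own stand-ins; a real scheme would use something like blind signatures so the issuer can't link a token back to a subscriber): the intermediary buys single-use tokens in bulk, hands one to the subscriber per request, and the site redeems it without learning who the subscriber is.

    import hmac, hashlib, secrets

    ISSUER_KEY = secrets.token_bytes(32)   # known only to the token issuer
    SPENT = set()                          # issuer-side double-spend check

    def issue_token():
        """Tokens sold in bulk to an intermediary (VPN, ISP)."""
        nonce = secrets.token_bytes(16)
        tag = hmac.new(ISSUER_KEY, nonce, hashlib.sha256).digest()
        return nonce, tag

    def redeem(nonce, tag):
        """Run by the issuer when a site presents a token it received."""
        expected = hmac.new(ISSUER_KEY, nonce, hashlib.sha256).digest()
        if nonce in SPENT or not hmac.compare_digest(expected, tag):
            return False
        SPENT.add(nonce)
        return True

    # VPN buys tokens -> anonymous subscriber attaches one to a request ->
    # site forwards it to the issuer and later settles with its host.
    inventory = [issue_token() for _ in range(3)]
    nonce, tag = inventory.pop()
    print(redeem(nonce, tag))   # True: request accepted
    print(redeem(nonce, tag))   # False: token already spent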

This is just a high-level idea of how we might get around the challenges of a web dominated by bots and AI, but I'm sure the reality of it will be more interesting.


I hate AI as much as any reasonable person should, but I don't think money is a viable filter when governments and corporations will just throw as much money, legislation, and infrastructure at it as needed to render it irrelevant. They can just budget it in, or pass laws requiring privileged access.

Meanwhile, as profit motives begin to dominate (as they inevitably would), access to information and resources becomes less a right and more a privilege, and everything becomes more commercialized, faster.

I won't claim to have a better idea, though. The best solutions in my mind are simply not publishing anything to the web and letting AI choke on its own vomit, or poisoning anything you do publish, somehow.


Usenet, as far as I remember, used to be fucking hell to maintain properly. With each server having to mirror basically everything, it was a hog on bandwidth and storage, and most server software in its heyday was a hog on the filesystems of the day (you had to make sure you had plenty of inodes to spare).

The other day, I logged into Usenet via Eternal September and found that it was 95% zombies sending spam you could recognize from the turn of the millennium. On one hand, it made me feel pretty nostalgic. Yay, 9/11 conspiracy theories! Yay, more all-caps deranged Illuminati conspiracies! Yay, Nigerian princes! Yay, dick pills! And an occasional on-topic message which felt strangely out of place.

On the other hand, I felt like I was in a half-dark mall bereft of most of its tenants, where the only places left are an 85-year-old watch repair shop and a photocopy service at the other end of the floor. On yet another hand, it turns out I haven't missed much by not being on Usenet, as all-caps deranged conspiracy shit abounds on Facebook.

I would welcome a modern replacement for Usenet, but I feel like it would need a thorough redesign based on modern connectivity patterns and computing realities.


Culturally, the modern replacement for Usenet is probably Reddit. Architecturally, probably something built on top of a federated protocol like ActivityPub (Mastodon, Lemmy) or Nostr.

But I guess realistically you can't fight entropy forever. Even Hacker News, aggressively moderated as it is, is slowly but irrevocably degrading over time.


I have a curated list of tech and science related newsgroups which work really well. No SPAM since Google Groups went to /dev/null.

Also, I often access FIDO over NNTP.
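If anyone wants to try the same, reading a group over NNTP only takes a few lines. A minimal sketch with Python's nntplib (in the stdlib up to 3.12, removed in 3.13); the server name, group, and credentials are placeholders to swap for your own:

    import nntplib

    # Most text-only servers (e.g. Eternal September) require a free account.
    with nntplib.NNTP("news.example.org") as srv:
        srv.login("myuser", "mypassword")
        resp, count, first, last, name = srv.group("comp.lang.c")
        # Fetch overview headers for the last ten articles.
        resp, overviews = srv.over((last - 9, last))
        for artnum, over in overviews:
            print(artnum, over["subject"])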


Usenet wasn't that bad if you didn't take the binary groups.

> and found that it was 95% zombies sending spam you could recognize from the turn of the millennium

I like to imagine a forgotten server, running since the mid-90s, its owners long since imprisoned for tax fraud, still pumping out its daily quota of penis enlargement spam.


Yes and no.

The distributed nature of git is fine until you want to serve it to the world - then you're back to bad actors. They're looking for commits because they're nicely chunked, I'm guessing.


> They're looking for commits because they're nicely chunked, I'm guessing

They're not looking for anything specifically, from what I can tell. If that were the case, they would just clone the git repository, as it would be the easiest way to ingest such information. Instead, they just want to guzzle every single URL they can get hold of. And a web frontend for git generates thousands of those. Every file in a repository results in dozens, if not hundreds, of unique links for file revisions, blame, etc., and many of those are expensive to serve. Which is why those paths are often put in robots.txt - everything was fine until the LLM crawlers came along and ignored robots.txt.
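For reference, the traditional mitigation looks roughly like this (the path patterns are cgit-style guesses using the common wildcard extension; every frontend lays out its URLs differently), and of course it only helps against crawlers that actually honor it:

    User-agent: *
    Disallow: /*/blame/
    Disallow: /*/log/
    Disallow: /*/diff/
    Disallow: /*/snapshot/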


The distributed nature of git lets me be independent of some central instance (you may decide that the master copy resides on GitHub, but with the advent of mesh VPNs like the ones ZeroTier and Tailscale offer, you could also sidestep it and push/pull from your colleagues directly). It also lets me dictate who gets to access it.
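For instance (the user, address, and path are made up; any host reachable over the mesh with an SSH daemon and a copy of the repo will do):

    git remote add alice ssh://alice@100.64.0.12/home/alice/project.git
    git fetch alice
    git merge alice/main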

What the article describes, though, is possibly the worst way a machine can access a git repository: using a web UI and scraping it, instead of cloning the repo and adding all the commits to its training set. I feel like they simply don't give a shit. They got such a huge capital injection that they feel they can afford not to give a shit even about their own cost efficiency, and that they can go with scorched-earth tactics. After all, even their own LLMs can produce a naive scraper that wreaks havoc on the internet infrastructure, and they just let it loose. Got mine, fuck you all the way!

But then they will release some DeepSeek R(xyz), and yay, all the HN users who were roasting them for such methods will be applauding them for a new version of an "open source" stochastic parrot. Yay indeed.





