Hacker News | senko's comments

"Grifters" a bit harsh, considering they used VC money to become successful in the first place (see other comments).

Presumably they knew the deal.


In the long run, all VC investments lead to enshittification.

> Why is it that Substack can’t just be run as a 10-20 person small-medium sized business forever?

They have taken VC money.


Would you rather read an article praising LLMs written by someone with a stake in the chilli pepper business?

Asking for a friend.


Perl 6 definitely sucked up any forward momentum in the community.

It's become a poster child for how not to do a major transition.

The KDE 3/4, GNOME 2/3, and Python 2/3 transitions all benefited from this hindsight (while still experiencing a lot of pain themselves).

Raku might be an interesting language (I haven't dug deep), but it's not Perl. Larry et al. should've given it a separate name from the start and allowed others to carry the Perl torch. They did this too late, when the momentum was already dead.

Perl 5 was a product of its time, but so were Linux, C, Python 2, and PHP 3, and they're still very much relevant.


> As a user I want the agent to be my full proxy. As a website operator I don’t want a mob of bots draining my resource

The entire distinction here is that as a website operator you wish to serve me ads. Otherwise, an agent under my control, or my personal use of your website, should make no difference to you.

I do hope this eventually leads to per-visit micropayments as an alternative to ads.

Cloudflare, Google, and friends are in a unique position to do this.
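
A minimal sketch of how such a flow might look, built around the standard HTTP 402 ("Payment Required") status. The header names and the verification step are invented for illustration, not any existing provider's API:

    # Hypothetical per-visit micropayment flow built around HTTP 402.
    # The header names ("X-Price-Microcents", "X-Payment-Token") are
    # invented for illustration; no real payment network is involved.
    from http.server import BaseHTTPRequestHandler, HTTPServer

    PRICE_MICROCENTS = 50  # asking price per page view

    class PaywalledHandler(BaseHTTPRequestHandler):
        def do_GET(self):
            token = self.headers.get("X-Payment-Token")
            if not token:
                # No payment attached: quote a price instead of serving ads.
                self.send_response(402)  # Payment Required
                self.send_header("X-Price-Microcents", str(PRICE_MICROCENTS))
                self.end_headers()
                return
            # A real system would verify the token with a payment
            # intermediary (the Cloudflare/Google role) before serving.
            self.send_response(200)
            self.send_header("Content-Type", "text/html")
            self.end_headers()
            self.wfile.write(b"<h1>Paid content, no ads</h1>")

    if __name__ == "__main__":
        HTTPServer(("localhost", 8000), PaywalledHandler).serve_forever()

An agent (or a browser) would retry the request with a payment token after seeing the 402, and the intermediary would settle the microcents in aggregate.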


> The entire distinction here is that as a website operator you wish to serve me ads

While this is sometimes the case, it’s not always so.

For example, Fediverse nodes and self-hosted sites frequently block crawlers. This isn’t about ads; rather, it’s that it costs real money to serve the site and crawlers are often considered parasitic.

Another example would be where a commerce site doesn’t want competitors bulk-scraping their catalog.

In all these cases you can for sure make reasonable “information wants to be free” arguments as to why these hopes can’t be realized, but do be clear that it’s a separate argument from ad revenue.

I think it’s interesting to split revenue into marginal distribution/serving costs, and up-front content creation costs. The former can easily be federated in an API-centric model, but figuring out how to compensate content creators is much harder; it’s an unsolved problem currently, and this will only get harder as training on content becomes more valuable (yet still fair use).


> it costs real money to serve the site and crawlers are often considered parasitic.

> Another example would be where a commerce site doesn’t want competitors bulk-scraping their catalog

I think of crawlers that bulk download/scrape (e.g. for training) as distinct from an agent that interacts with a website on behalf of one user.

For example, if I ask an AI to book a hotel reservation, that's - in my mind - different from a bot that scrapes all available accommodation.

For the latter, ideally a common corpus would be created and maintained, AI providers (or upstart search engines) would pay to access this data, and the funds would be distributed to the sites crawled.

(never gonna happen but one can dream...)


But which hotel reservation? I want my agent to look at all available options and help me pick the best one - location vs price vs quality. How does it do that other than by scanning all available options? (Realistically Expedia has that market on lock, but the hypothetical still remains.)

I think that a free (as in beer) Internet is important. Putting the Internet behind a paywall will harm poor people across the world. The harms caused by ad tracking are far less than the benefits of free access to all of humanity.

I agree with you. At the same time, I never want to see an ad. Anywhere. I simply don't. I won't judge services for serving ads, but I absolutely will do anything I can on the client-side to never be exposed to any.

I find ads so aesthetically irksome that I've lost out on a lot of money over the past few decades by never placing any on any site or web app I've released. I'd find it hypocritical to expose others to something I try so hard to avoid seeing, and I want to provide the best and most visually appealing experience possible to users.


So far, the ad-driven Internet has been a disaster. It was better when producing content wasn’t a business model; people would just share things because they wanted to share them. The downside was it was smaller.

It’s kind of funny to remember that complaining about the “signal to noise ratio” in a comment section used to be a sort of nerd catchphrase thing.


> The downside was it was smaller.

Was this a bad thing, though? Just because today's Internet is bigger doesn't make it better. There are so many things out there doing the same thing, just run by different people. The amount of unique stuff hasn't grown to match the size. Would love to see something like $(unique($internet) | wc -l)
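
For what it's worth, here's the joke made semi-literal: a rough Python sketch that counts distinct pages by hashing crudely normalized content. The corpus path and the normalization are placeholders, and real-world dedup would need much smarter near-duplicate detection:

    # Back-of-envelope "unique($internet) | wc -l": count distinct pages
    # by hashing their (crudely normalized) bytes. Assumes a local dump
    # of pages under ./corpus/ -- the path is illustrative.
    import hashlib
    import pathlib

    seen = set()
    for page in pathlib.Path("corpus").glob("**/*.html"):
        raw = page.read_bytes()
        # Collapse whitespace and case so trivially re-hosted copies
        # hash to the same digest.
        normalized = b" ".join(raw.lower().split())
        seen.add(hashlib.sha256(normalized).hexdigest())

    print(f"unique pages: {len(seen)}")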


Serving ads to third-worlders is way less profitable, though.

This is actually public validation of your friend's startup.

A proper learning tool will keep a history of conversations with the student, understand their knowledge level, have handcrafted curricula (to match whatever the student is supposed to learn), and be less susceptible to hallucination.

OpenAI have a bunch of other things to worry about and won't just pivot to this space.


Are you going to examine a few petabytes of data for each model you want to run, to check if a random paragraph from Mein Kampf is in there? How?

We need better tools to examine the weights (what gets activated, to what extent, for which topics, for example). Getting the full training corpus, while nice, cannot be our only option.
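
To make the "what gets activated for which topics" idea concrete, here is a minimal sketch using forward hooks on a small Hugging Face model; the model name, the choice to hook only MLP blocks, and the norm-as-activity proxy are placeholder assumptions:

    # Sketch: record per-layer activation norms for one prompt, as a
    # crude proxy for which parts of the model "light up" for a topic.
    # Requires torch + transformers; "gpt2" is a stand-in for any model.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # placeholder model
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)
    model.eval()

    activations = {}

    def make_hook(name):
        def hook(module, inputs, output):
            hidden = output[0] if isinstance(output, tuple) else output
            activations[name] = hidden.norm().item()
        return hook

    for name, module in model.named_modules():
        if name.endswith("mlp"):  # hook only the MLP blocks, for brevity
            module.register_forward_hook(make_hook(name))

    with torch.no_grad():
        inputs = tokenizer("a topic you want to probe", return_tensors="pt")
        model(**inputs)

    for name, norm in sorted(activations.items(), key=lambda kv: -kv[1])[:5]:
        print(f"{name}: {norm:.1f}")

Real interpretability tooling goes much further (probing, attribution, sparse autoencoders), but even this level of visibility doesn't require the training corpus.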


> Are you going to examine a few petabytes of data for each model (...) How?

I can think of a few ways. Perhaps I'd use an LLM to find objectionable content. But anyway, it's the same argument you could make against e.g. the Linux kernel: are you going to read every line of code to see if it's secure? Maybe, maybe not, but that's not the point.

The point is that, right now, a model is a black box. It might as well be a Trojan horse.


Let's pretend for a moment that the entire training corpus for Deepseek-R1 were released.

How would you download it?

Where would you store it?


I mean, many people I know have 100 TB+ of storage at home now. A large enough team of dedicated community members cooperating and sharing compute resources online should be able to reproduce any model.

You would use an LLM to process a few petabytes of data to find a needle in the haystack?

Cheaper to train your own.
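
A rough back-of-envelope supports that; the corpus size, bytes-per-token ratio, and bulk token price below are assumptions, and the comparison figure is DeepSeek's own reported GPU cost for the V3 training run:

    # Back-of-envelope: cost of running an LLM over "a few petabytes"
    # of text just to filter it. All inputs are rough assumptions.
    corpus_bytes = 4e15            # ~4 PB of raw text
    tokens = corpus_bytes / 4      # assume ~4 bytes/token -> ~1e15 tokens
    price_per_million = 0.10       # USD per million input tokens, cheap bulk rate

    scan_cost = tokens / 1e6 * price_per_million
    print(f"tokens to scan:    {tokens:.0e}")
    print(f"cost just to read: ${scan_cost:,.0f}")  # ~$100,000,000

    # For comparison, DeepSeek reported roughly $5.6M of GPU time for
    # the V3 training run (their own figure, excluding research costs).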


I use Claude daily, exclusively via the API (in Zed, with my own token added), and spend a few bucks a day, tops.

Unlimited plans encourage wasting resources[0]. By actually paying for what you use, you can be a bit more economical and still get a lot of mileage out of it.

$100/$200 is still a great deal (as you said), but it does make sense for actually-$2000 users to get charged differently.

[0] In my hometown, (some) people have unlimited central heating (in winter) for a fixed fee. On warmer days, people are known to open windows instead of turning off the heating. It's free, who cares...


Because social media is a winner-take-all market with strong network effects.

AI isn't.


I managed a WireGuard-based VPN before Tailscale existed. It's pretty straightforward[0]; see the minimal config sketch below.

Tailscale makes it even more convenient and adds some goodies on top. I'm a happy (free tier) user.

[0] I also managed an OpenVPN setup with a few hundred nodes a few decades back. Boy do we have it easy now...
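
For a sense of how little a hand-managed setup needs, here's a sketch that prints one peer's wg0.conf. The keys, addresses, and endpoint are placeholders (real keys come from `wg genkey` / `wg pubkey`), and the server side mirrors this with a ListenPort and one [Peer] section per client:

    # Minimal single-peer WireGuard client config, wrapped in Python so
    # the placeholders are obvious. Write the result to
    # /etc/wireguard/wg0.conf and bring it up with `wg-quick up wg0`.
    import textwrap

    client_conf = textwrap.dedent("""\
        [Interface]
        Address = 10.0.0.2/24
        PrivateKey = <client-private-key>

        [Peer]
        PublicKey = <server-public-key>
        Endpoint = vpn.example.com:51820
        AllowedIPs = 10.0.0.0/24
        PersistentKeepalive = 25
        """)

    print(client_conf)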

