A Theoretical Justification (2021) (marginalia.nu)
85 points by notpushkin on Aug 24, 2022 | 22 comments



Just a warning, this text is both fairly dated with its description of how the search engine works, and fairly incomplete. Was just today thinking about expanding on it.

There's a lot to be said about how search engines not only mediate the Internet as it exists, but actually shape what gets created. If you want free and independent websites, you need a free and independent internet mediator that lets people find and partake in such a community online. That is not Google, nor Reddit, nor Facebook, nor Twitter.

Right now, while a sort of stubborn counter-culture exists in the fringes of the internet, it's just a shadow of what the internet could be. Small websites just can't compete against the SEO-industry on Google. So they're almost impossible to find, and struggle to find visitors. It's a real tragedy how many brilliant and fascinating websites are languishing in obscurity. Instead we get listicles, content farms, click funnels.

I think ultimately, for the sake of both culture and democracy, what we need is a service akin to a public library, to complement the shopping mall that is Google Search. (Although even the commercial side of search is pretty lacking, but that is a separate problem)


I hate SEO-optimized garbage as much as the next person, but I don't see how this avoids that problem other than being small enough that nobody is optimizing for your search engine.

If this were to become popular, I suspect it would be overrun with SEO garbage just like Google. I guess that means we can use it while it lasts.


I'm not convinced I'll ever get big enough for that to be a problem. Even if I were 5 times bigger than every current competitor to Google combined, I'd still only have a few percent of the market share.

But sure, even if that were to happen, I have an ace up my sleeve. A devastating and incredibly simple way of limiting spam, which is to go for the wallet and de-rank sites for having ads.

Google could and would never do that, of course, because they are selling the same ads. It would undermine their entire business model. This conflict of interest is the core of Google's search engine spam problems.


Theoretically this could work.

It seems that you're talking about de-ranking the fat middle of the curve which serves the banal content consumed by those in the fat middle themselves, leaving only the tail end, theoretically serving the corresponding tail end of content consumers, which might approximate the early days of the internet before the normies invaded, before the barrier to entry was lowered such that it approached zero.

It could work.


Yeah. It's not hard to find banal content on Google. No need to build a search engine for that.


Google would get sued if they did that. Google already gets sued a lot over its search ranking behavior; everything they do, they have to be able to defend in court.


Well, they already sort of did, but that was 10 years ago, when the Internet looked very different and they were in a different position.

Today they don't just have legal concerns, they arguably have to worry about regulation as well.


What about the sites without ads that still do Amazon referral links or similar covert profit generation strategies?


In general, it's a much smaller problem. With the current SEO spam, the goal is to get a visit, not necessarily actual engagement. To get something out of Amazon affiliate links or similar, you actually need to convince someone to click the link, go to Amazon, and buy something. That's quite a different beast.

But also... ¯\_(ツ)_/¯

https://git.marginalia.nu/marginalia/marginalia.nu/src/commi...


Please never stop <3


SEO only exists because search engines are user-hostile garbage. Imagine being able to blacklist domains from your results. That would basically solve the problem, because the first time you encounter spammy garbage you'd simply nuke it forever.

You could even have pre-made open source lists. Basically ublock origin for SEO crap.

In a very real sense, it only happens because it's being allowed to happen.
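
A rough sketch of what that kind of per-user domain blocklist could look like on the client side. The function names and the `spam.example` entries are made up for illustration; this is not how any particular search engine or extension implements it.

```python
from urllib.parse import urlparse


def domain_of(url: str) -> str:
    """Return the lowercased host of a URL, without a leading 'www.'."""
    host = (urlparse(url).hostname or "").lower()
    return host[4:] if host.startswith("www.") else host


def is_blocked(url: str, blocklist: set[str]) -> bool:
    """True if the URL's host is a blocked domain or a subdomain of one."""
    host = domain_of(url)
    return any(host == d or host.endswith("." + d) for d in blocklist)


def filter_results(urls: list[str], blocklist: set[str]) -> list[str]:
    """Drop every result whose domain is on the user's blocklist."""
    return [u for u in urls if not is_blocked(u, blocklist)]


# Hypothetical blocklist and result set.
blocklist = {"spam.example"}
results = [
    "https://www.spam.example/top-10-things",
    "https://blog.spam.example/review",
    "https://smallsite.example/essay",
]
kept = filter_results(results, blocklist)
```

A shared, pre-made list would just be a bigger `blocklist` set loaded from a file.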


> Imagine being able to blacklist domains from your results. That would basically solve the problem, because the first time you encounter spammy garbage you'd simply nuke it forever.

Can this extension solve your problem?

https://chrome.google.com/webstore/detail/ublacklist/pncfbmi...


SEO would still exist if Google were simple PageRank, and PageRank does not seem in any way user-hostile.


> but I don't see how this avoids that problem other than being small enough that nobody is optimizing for your search engine.

Google doesn't have any countermeasures against a lot of what we consider spammy and generally user-hostile.

Google could at least try to detect spammy recipes by for example detecting certain keywords (likelihood of it being a recipe) and then downranking it based on length, with the idea that if we're confident it's a recipe it should be short and to the point, and anyone telling their life story on it can go to hell.

Google could detect listicles similarly.
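
To make the recipe idea concrete, here is a minimal sketch. The keyword list, the 0.5 threshold and the 400-word target are all arbitrary assumptions picked for illustration, not anything a real search engine is known to use.

```python
import re

# Hypothetical keyword list; a real classifier would need far more than this.
RECIPE_KEYWORDS = {
    "ingredients", "tablespoon", "teaspoon", "preheat",
    "oven", "cup", "simmer", "servings",
}


def tokens(text: str) -> list[str]:
    """Lowercase alphabetic tokens; digits and punctuation are dropped."""
    return re.findall(r"[a-z]+", text.lower())


def recipe_likelihood(text: str) -> float:
    """Fraction of the keyword list that appears at least once in the text."""
    return len(set(tokens(text)) & RECIPE_KEYWORDS) / len(RECIPE_KEYWORDS)


def length_penalty(text: str, threshold: float = 0.5, target_words: int = 400) -> float:
    """Rank multiplier in (0, 1]: below 1.0 only for long pages that look like recipes."""
    n = len(tokens(text))
    if recipe_likelihood(text) < threshold or n <= target_words:
        return 1.0
    return target_words / n


short_recipe = "Ingredients: 1 cup flour, 1 teaspoon salt. Preheat oven, bake, makes 4 servings."
life_story = short_recipe + " my grandmother" * 500
```

Here `length_penalty(short_recipe)` stays at 1.0, while `length_penalty(life_story)` drops well below it because the same recipe is buried under a thousand extra words.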

Yes, this can all be gamed, and I'd expect there to be an arms race, but it will at least raise the bar and make spam content costlier to produce and require constant maintenance as Google's algorithms get better. Yet, right now, Google isn't even trying any of this, and why would they? Spam typically has ads and/or analytics on it, both of which can be Google's and thus benefit them. Why would they ever expend extra engineering effort (thus money) to ultimately earn less money if they're successful?

In addition, the ultimate counter-attack to profit-motivated spam would be to just detect & downrank what gets them paid - ads, affiliate links or analytics, as I explained in previous comments such as https://news.ycombinator.com/item?id=32434317. All of those often require things that would make them trivially detectable, whether it's ads that must be a third-party script so the ad network can detect fraud, affiliate links which ultimately must lead to a large affiliate domain such as Amazon and sometimes mandate legal disclosures that can be detected, or analytics that likewise rely on a third-party script.
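
A minimal sketch of that kind of detection, using only the standard library. The host lists are placeholders (`ads.example` and `analytics.example` are invented), treating every link to a big retailer as an affiliate link is a deliberate oversimplification, and the halving per signal is an arbitrary choice.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse

# Hypothetical host lists; real ones would be much longer and curated.
AD_HOSTS = {"ads.example"}
ANALYTICS_HOSTS = {"analytics.example"}
AFFILIATE_HOSTS = {"amazon.com"}


def host_matches(url: str, hosts: set[str]) -> bool:
    """True if the URL's host equals, or is a subdomain of, any listed host."""
    host = (urlparse(url).hostname or "").lower()
    return any(host == h or host.endswith("." + h) for h in hosts)


class MonetizationScanner(HTMLParser):
    """Collects which kinds of monetization signals appear in a page."""

    def __init__(self):
        super().__init__()
        self.signals = set()

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script":
            src = attrs.get("src") or ""
            if host_matches(src, AD_HOSTS):
                self.signals.add("ads")
            if host_matches(src, ANALYTICS_HOSTS):
                self.signals.add("analytics")
        elif tag == "a":
            href = attrs.get("href") or ""
            if host_matches(href, AFFILIATE_HOSTS):
                self.signals.add("affiliate")


def rank_multiplier(html: str, penalty_per_signal: float = 0.5) -> float:
    """Halve the rank (by default) for each distinct signal type found."""
    scanner = MonetizationScanner()
    scanner.feed(html)
    return penalty_per_signal ** len(scanner.signals)


monetized = (
    '<script src="https://cdn.ads.example/a.js"></script>'
    '<a href="https://www.amazon.com/dp/X">buy</a>'
)
clean = '<p>hello</p><script src="/local.js"></script>'
```

With these inputs, `rank_multiplier(monetized)` is 0.25 (two signal types) and `rank_multiplier(clean)` is 1.0.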


Google may be a bit limited due to their monopolistic position. Many of the more devastating things they could do would probably skirt pretty close to anti-competitive behavior.


There's Bing, etc. I could see issues where Google is unfairly prioritising their own properties, but if they're merely going after user experience (including downranking ads which hurts their bottom-line), I'm not sure where the antitrust argument is?

They can always offer it as an option that the user must explicitly opt-in, that way nothing is forced onto the users (yet everyone will enable it pretty quickly if the results end up better).


Well, if you have a 97%+ market share, I'd say you're a de facto monopoly.

Might fall within refusal-to-deal (which is a fairly nebulous concept), especially given they are also in the advertisement business. Anything that could be interpreted as blocking search results with competing ads to their own might come under scrutiny.


> Right now, while a sort of stubborn counter-culture exists in the fringes of the internet, it's just a shadow of what the internet could be.

I'm not sure it could ever be more than a counter-culture on the fringes of the internet, nor am I sure it should be. The popularisation of the web is a genie that cannot be put back in the bottle. And most people don't want what you're selling. I don't say that to be disparaging at all, as I personally love projects like this and Gemini, etc. But I think it's okay that the small web should remain small.

Related to this is the pervasive theory - you could call it a conspiracy theory - that the current state of the web has been thrust upon the masses against their will. On the contrary, it has been driven primarily by consumer demand. Behind every corporation peddling a toxic product is a million or more (often much more) consumers who want it. Popularising the small web would fundamentally change and most likely ruin it even if you somehow managed to keep perverse corporate incentives out (or maybe keeping the corporate incentives out means it can never really get popular).

I guess my point is that it is not a shadow of what it could be. I don't think it could be much more than what it is, which is fine.


> I'm not sure it could ever be more than a counter-culture on the fringes of the internet, nor am I sure it should be. The popularisation of the web is a genie that cannot be put back in the bottle. And most people don't want what you're selling. I don't say that to be disparaging at all, as I personally love projects like this and Gemini, etc. But I think it's okay that the small web should remain small.

I'm not really envisioning making the small websites big, but rather making the internet seem less small. Judging by the results you find on Google and so on, you might think there were only a few hundred websites, because that's all you ever see. There are millions.

> On the contrary, it has been driven primarily by consumer demand

Who, besides the advertisement industry, wants the state of the internet as it is today?

Even my hairdresser complains about how bad it's gotten, how pointlessly vapid the listicles are, how hard it is to find a reliable review. Show me, where is the demand for bait-and-switch click funnels, for email spam, for websites that take 45 seconds to load on an android phone only to have ads pop in between every sentence? Who is the person that wants to scroll through dozens of blog posts trying to find an answer to a problem, only to slowly realize it's just AI-generated word salad?

The internet, as it looks today, fails rather spectacularly to meet consumer demand. It is incredibly hostile not only to consumers, but to content producers as well. The only real winners are the middle men, the platform operators that have inserted themselves in the middle of this whole mess.


The narrative that "if it sells it means people want it" has been used for a long time (e.g. cigarettes). Of course, that simplistic take ignores (1) the role that advertising plays in manufacturing those desires in the first place, and (2) the fact that consumers often don't have a choice.

Relating to (2): everyone I know complains about the state of the modern Web (from my tech friends to my mum) one way or the other. They simply have no choice because, as mentioned several times in these comments, even tech people have an incredibly hard time extricating themselves from dark patterns and SEO-hell search results and surveillance and AI-generated listicles... let alone regular people!


For me the theoretical justification is to avoid rich-gets-richer multiplicative effects. That is, once you have many links, you have a high PageRank and you are much more likely to be linked and seen which self-perpetuates this problem.
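
A toy simulation of that feedback loop, as a sketch only. It is a simple urn-style model where each new link goes to a page with probability proportional to the links it already has (plus one); it is not PageRank itself, and the page and link counts are arbitrary.

```python
import random


def simulate_links(n_pages: int = 50, n_links: int = 5000, seed: int = 0) -> list[int]:
    """Hand out links one at a time, favoring pages that already have more links."""
    rng = random.Random(seed)
    counts = [0] * n_pages
    for _ in range(n_links):
        # The +1 gives pages with zero links a small chance of being picked.
        weights = [c + 1 for c in counts]
        target = rng.choices(range(n_pages), weights=weights, k=1)[0]
        counts[target] += 1
    return counts


counts = simulate_links()
mean_links = sum(counts) / len(counts)
top_page_links = max(counts)
```

Comparing `top_page_links` against `mean_links` for a few seeds gives a feel for how uneven the outcome gets even though every page started out identical.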

The same thing happens with journal publications. If you get a few citations early and you break through, you are likely to get many. Otherwise, near zero. Winner takes it all. For a deeper technical discussion see [1]. This effect tends to perpetuate weird biases. For example the Alzheimer's scam recently featured in HN [2].

HN tries to avoid this problem by giving some chances to new comments. AFAIK, Google or Google Scholar don't have a similar mechanism.

[1] The Fundamentals of Heavy Tails : Properties, Emergence, and Estimation. Section 6.3. https://adamwierman.com/wp-content/uploads/2021/05/book-05-1...

[2] https://news.ycombinator.com/item?id=32183302


Yes, there was a glorious past where almost no websites used popups asking you to subscribe to their newsletter, or limited your usage with a paywall, or had you jumping through hoops with VPNs to get another five free articles before that paywall kicks in, and so on.

That was just around the time where you still had that subscription for the dead-tree edition of the newspaper whose website so graciously allows you to read all articles. Or somebody else had that particular paid subscription, and you had one with that national magazine, also with a great website, that the other gal happens to enjoy reading.

You didn't block ads quite that aggressively, back then. The extensions just didn't work that well at the time. There were quite a few hit-the-monkey flash ads blinking away and playing sounds, but your preferred quality publications were mostly ad-free at the time. They must have been doing it for the joy it brings to be journalisting away without any commercial considerations, whatsoever.

You would gladly pay for good writing, but it just isn't happening anymore! Plus: why should you pay for reading the news when they all print stories on exactly the same events? Clearly, B copied the story from A, C got it from B, and A just repeated what C had written! Absolutely nobody did any work!

Each newspaper should report on its own sets of news and facts, otherwise where is the art? Where is the creativity? Once you find some outlet that has acceptable quality, you will gladly start paying again. It's hard to say what "acceptable quality" means in this regard, but it is really just the bare minimum, what any student newspaper editor gets right, and it is so sad to see that among a few thousand newsrooms, not a single one manages to clear that low bar. Someone should write a story about how unfortunate it is that in a group of tens of thousands of professional journalists, every single combination of <x> of them is below average.

Yeah, okay, there's maybe one or two that do important, high-quality work. As soon as they fix the layout in Brave on Mandrake Linux, you will consider subscribing. Unless they ever publish anything by that girl writer again, the bi** with the issues, you know? Then, you'll cancel the subscription and go back to reading their website with archive.is.



