> The spam site was checking for Googlebot IP addresses. If the visitor’s IP address matched as belonging to Google then the spam page displayed content to Googlebot.
>
> All other visitors got a redirect to other domains that displayed sketchy content.
Years ago, Google had an explicit policy that sites that showed different content to Googlebot than they showed to regular unauthenticated users were not allowed, and they got heavily penalized. This policy is long gone, but it would help here (assuming the automated tooling to enforce it was any good, and I assume it was).
More recently, Google seems totally okay with sites that show content to Googlebot but go out of their way not to show that content to regular users.
About 10 years ago, I was working on a site that served several hundred million non-crawler hits a month. Many of our millions of pages had their content change multiple times a day. Because of the popularity and frequent changes, the crawlers hit us constantly... crawlers accounted for ~90% of our traffic - billions of hits per month. Bing was ~70% of the crawler traffic and Google was ~25% of it. We noticed it because Bing quickly became very aggressive about crawling, exposing some of our scaling limits as they doubled our already significant traffic in a few short months.
I was working on the system that picked ads to show on our pages (we had our own internal ad system, doing targeting based on our own data). This was the most computationally intensive part of serving our pages and the ads were embedded directly in the HTML of the page. When we realized that 90% of our ad pick infrastructure was dedicated to feeding the crawlers, we immediately thought of turning ads off for them (we never billed advertisers for them anyway). But hiding the ads seemed to go directly against the spirit of Google's policy of showing their crawlers the same content.
Among other things, we ended up disabling almost all targeting and showing crawlers random ads that roughly fit the page. This dropped our ad pick infra costs by nearly 80%, saving 6-figures a month. It also let us take a step back to decide where we could make long term investments in our infra rather than being overwhelmed with quick fixes to keep the crawlers fed.
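For anyone curious what that looks like in practice, the cheap path is essentially just a branch on a crawler check before the expensive targeting call. A minimal sketch in Python (all names are hypothetical, not the actual system described above):

    import random

    # Hypothetical sketch: known crawlers skip the expensive targeting engine
    # and get a cheap random pick that roughly fits the page topic.
    CRAWLER_TOKENS = ("googlebot", "bingbot", "slurp", "duckduckbot")

    def is_crawler(user_agent):
        ua = (user_agent or "").lower()
        return any(token in ua for token in CRAWLER_TOKENS)

    def pick_ads(request, page, ads_by_topic, targeting_engine, n=3):
        if is_crawler(request.headers.get("User-Agent", "")):
            # Cheap path: random ads that roughly fit the page, no per-user targeting.
            candidates = ads_by_topic.get(page.topic, [])
            return random.sample(candidates, min(n, len(candidates)))
        # Expensive path: full targeting for real users.
        return targeting_engine.pick(user=request.user, page=page, n=n)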
This kind of thing is what people are missing when they wonder why a company needs more than a few engineers - after all, someone could duplicate the core functionality of the product in 100 lines of code. At sufficient scale, it takes real engineering just to handle the traffic from the crawlers so they can send you more users. There are an untold number of other things like this that have to be handled at scale, but that are hard to imagine if you haven't worked at similar scale.
Seems like a natural consequence of having "millions of pages", if you think about it? You might have a lot of users, but they're only looking at what they want to look at. The crawlers are hitting every single link, revisiting all the links they've seen before, their traffic scales differently.
I think you’re right. At first I thought “crawlers are actually creating large amounts of spam requests” but this is just the way a searchable web functions. The crawlers are just building the index of the internet.
Maybe Google needs to implement an API where you can notify it when a page on your site has changed. That should cut down on redundant crawls a lot, eh?
We very much wanted this! We had people that were ex-Google and ex-Bing who reached out to former colleagues, but nothing came of it. You'd think it would be in their interest, too.
The best explanation I can come up with is that a failure to notify them of a change makes them look bad when their search results are out of date. Especially if the failures are malicious, fitting in with the general theme of the article.
In 2021 Bing, Yandex, Seznam.cz, and (later, in 2023) Naver ended up implementing a standard where you can notify one search engine of a page update and the other participating search engines are also notified [1, 2, 3].
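For reference, the standard being described is IndexNow. A rough sketch of a change notification against the shared endpoint (assuming you already host the matching key file at your site root; the host, key, and URLs here are placeholders):

    import requests

    # One POST tells the participating engines which URLs on this host changed.
    payload = {
        "host": "www.example.com",
        "key": "your-indexnow-key",  # must match the key file served from the site
        "keyLocation": "https://www.example.com/your-indexnow-key.txt",
        "urlList": ["https://www.example.com/some-page-that-changed"],
    }
    resp = requests.post("https://api.indexnow.org/indexnow", json=payload, timeout=10)
    print(resp.status_code)  # 200/202 means the notification was accepted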
>The best explanation I can come up with is that a failure to notify them of a change makes them look bad when their search results are out of date. Especially if the failures are malicious, fitting in with the general theme of the article.
Should be easy to crosscheck the reliability of update notifications by doing a little bit of polling too.
You can have millions of static pages and serve them very inexpensively. Showing dynamic ads is fundamentally exposing an expensive computational resource without any rate limiting. If that was any other API or service it would be gated but the assumption here is that this particular service will make more money than lose and it obviously breaks down in this instance. I really don’t think you can say it’s about scale when what you’re scaling (serving ads to bots) doesn’t make any business sense.
Leaving the ads in was a business necessity because it eliminated the documented risk of being delisted by Google for customizing content for their crawlers. The company would have gone out of business if that happened permanently. Even if it only happened for a few days, it would have meant millions in lost revenue.
I still think that humans are very good at identifying other humans, particularly through long-form speech and writing. Sentient and non-sentient beings alike are very good at identifying members of their own species.
I wonder if there's some sort of "time" threshold for how long an AI can speak/write before it is identifiable as an AI to a human. Some sort of Moore's law, but for AI recognizability
I have never used Bing. I use duckduckgo though and they buy their results from Bing. At least they did in the past, I don't follow them closely enough to necessarily notice every possible change.
This seems very cannibalistic of their own business. That means somebody running Google or Microsoft (or really any web ads) only has a 10% chance to start with of getting served to an actual human (if they're not trying to block each other constantly).
And on the other side, that means every customer or ad placer, has to try and filter all the bots so people with actual credit cards and money will see the Google, TEMU, or FB ads (or others).
In some ways, almost feels like Microsoft is griefing online search by burying it under massive robot crawls. Like an ad DDOS.
They're serving first party targeted ads based on only their own data. If you're going to complain about that, it's close to saying that websites shouldn't be able to make money from advertising at all.
Very much this. It's a site/app that has probably been used by 80-90% of adults living in America over the last decade. It would not exist if these ads weren't targeted. I know because we knew (past tense because I'm no longer there) exactly how much targeting increased click-through-rate and how that affected revenue.
On top of that, they were ads for doing more of what the user was doing right then, tailored to tastes we'd seen them exhibit over time. Our goal was that the ads should be relevant enough that they served as an exploration mechanism within the site/app. We didn't always do as well as we hoped there, but it was a lot better than what you see on most of the internet. And far less intrusive because they weren't random (i.e., un-targeted). I have run ad blockers plus used whole house DNS ad blocking as long as I've been aware of them, but I was fine working on these ads because it felt to me like ads done right.
If we can't even allow for ads done right, then vast swaths of the internet have to be pay-walled or disappear. One consequence of that... only the rich get to use most of the internet. That's already too true as it is, I don't want to see it go further.
I have no problems with this (first party, targeted), as far as I can read and understand it.
In fact one of my bigger problems has been that Google has served me generic ads that are so misplaced they go far into attempted-insult territory (shady dating sites, pay-to-win "strategy games", etc).
> websites shouldn't be able to make money from advertising at all.
This is the case. Advertising is a scourge, psychological warfare waged by corporations against our minds and wallets. Advertisers have no moral qualms; they will exploit any psychological weakness to shill products, no matter how harmful. Find a "market" of teenagers with social issues? Show them ads of happy young people frolicking with friends to make them buy your carbonated sugar water; never mind that your product will rot their teeth and make them fat. Advertisers don't care about whether products are actually good for people, all they care about is successful shilling.
Advertising is warfare waged by corporations against people and pretending otherwise makes you vulnerable to it. To fight back effectively we must use adblockers and advocate for advertising bans. If your website cannot exist without targeted advertising, then it is better for it to not exist.
Think about what it would mean to not have any advertising whatsoever. Most current large brands would essentially be entrenched forever. No matter how good a new product or service is, it's going to be almost impossible to reach a sustainable scale through purely organic growth starting from zero. Advertising in some form is necessary for an economy to function.
The problem is, as was mentioned above by someone, all content has to be paid for. If there were no ads we wouldn’t have had TV and radio for the past few decades. 90% of the internet would disappear, and the only stuff left would be paywalled - i.e. only the rich could use the web.
I’m sure you try to avoid ads - I do too, they suck. But don’t pretend you don’t use a lot of websites that are not paid for with ads.
The internet began in 1969 and by 1992 was by far the largest network of computers and had exactly zero ads and zero paywalls. (The US government imposed a rule against commercial use of the internet to appease private businesses that didn't want competition from the internet. The rule remained in force till 1992.)
Also, you're currently using a very large non-paywalled site with no ads.
So, no, ads are not needed to have a nice internet available to all.
I don’t think you’re being intellectually honest. I didn’t have access to the Internet in my home in 1992, and the rest of the world didn’t either. I did pay for and have access to Compuserve forums. There was very little content back then. Certainly no huge video sites where you can learn practically anything, or hardly any of the good benefits we enjoy from being online today. If you loved the 1992 internet I can probably find an AOL disk to send you. And just because there is one ad free site we are both using hardly means the rest of the sites wouldn’t somehow disappear. YC is paid for by some rich folks who have made plenty of money that ultimately (though not exclusively) came from ads. Like it or not, ads are an economic necessity. If you have a better solution start a company that gives away free, valuable content and prove it.
>I don’t think you’re being intellectually honest.
Do you think I'm outright telling falsehoods? Which part do you think is false: that the internet had many millions of users in 1992? That the internet pre-1993 was completely non-commercial with absolutely zero ads and no paywalls?
The 1992 internet had email, mailing lists, newsgroups, Internet Relay Chat, massively-multiplayer online games (called MUDs) and places (mostly using the "anonymous FTP" protocol) where you could download free software like Linux and GNU utilities.
>There was very little content back then.
The newsgroups were absolutely huge in 1992: if you spent all day every day reading newsgroups, you could keep up with less than 1% of them. The same could be said of Internet Relay Chat and probably also of mailing lists (though I didn't subscribe to enough mailing lists to say that with 100% confidence).
Just because you never had access to it in 1992 does not mean that it is irrelevant to the topic of our conversation. AOL users had limited access to the Internet in 1992. They could, I think, send email to non-AOL users over the Internet, and 1992 I think is the year they gained access to the newsgroups (including, famously, the ability to post to them). But if in 1992 all you knew was Compuserve and AOL, you didn't know the Internet.
And again, one of the few rules of the internet (imposed again by the US government, which was footing the bill) was no commercial use. So for example there was a newsgroup called ba.jobs (the "ba" stood for "bay area") where employers could advertise job openings and employees could make posts announcing their availability for a job. But contractors (i.e., 1099 workers as opposed to W2 workers) were prohibited from making such a post because that was considered too commercial (in that an individual contractor is a lot like a small business and for such a contractor to use the internet to announce his availability was too much like a small business posting an ad).
>I didn’t have access to the Internet in my home in 1992, and the rest of the world didn’t either.
In 1992, most users of the internet got their access from their employer or their school of higher education. You could've bought access for $20 a month in 1992; it's just that the Internet was not being advertised, so you didn't know about it. (Also, if you were living in a rural area, you might've had to pay your telephone company long-distance charges for every minute you were connected.)
Actually, it is not just that the internet was not being advertised: the people running it actively discouraged journalists from writing about it. There was a senator named William Proxmire who was good at getting the press to repeat his accusations of wasteful government spending, and the internet was an easy target for him: there were, for example, academics of every department using the newsgroups to discuss ideas, and Proxmire could say (truthfully, but misleadingly) that the US government was spending taxpayer money so that professors could discuss <pick the most ridiculous things academics might discuss>. (Here's an example of a journalist losing his access to the internet in 1984 in part because he wouldn't stop writing about the internet (then called ARPANET): https://www.stormtiger.org/bob/humor/pournell/story.html)
So you see there was an availability bias at play in which advertising is loud and designed to get attention (of course) and it tends to drown out information that is not part of the advertising-dependent information-ecosystem. (And again, the people in charge of the infrastructure of the internet pre-1993 were even actively striving to avoid any publicity.) Particularly, hardly anyone knows nowadays that many millions of users were using the completely-noncommercial internet of 1969 - 1992. People tend to think that the internet was created in 1993 or that advertising-dependent companies were essential to its creation.
I don’t think you’re taking scale into account. Millions of internet users then vs billions now makes a difference. Generous hobbyists and some universities paid for those services back then. The “massive” in MUD was a few thousand simultaneous players, with mostly text and maybe limited graphics. I very much doubt any of them could/would have paid if their usage went up by 10,000 times, with the higher quality and expectations that we have today. Again, I challenge you to come up with a service for a hundred million people that is open to everyone and doesn’t require ads. I hate ads too - I’ll join your service if you can make it work.
Just for reference, I was there too. I started with a shiny 300 baud modem. To compare the old days to today and say they’re even comparable in terms of information, media, knowledge, access, gaming, entertainment … it’s not even close.
Earlier you wrote that "I did pay for and have access to Compuserve forums", and that "if you loved the 1992 internet I can probably find an AOL disk to send you".
Could you clarify whether you had direct access to the internet (the newsgroups, email, ftp sites, web sites, not mediated by AOL or Compuserv) before mid-1993? Also, if yes, how many hours did you spend on it? I ask because I would be surprised to learn that it is possible for someone with your opinions to have had extensive experience with the internet pre-1993 (and I go looking for surprises).
I remember seeing spyglass and using NCSA mosaic at work and school, and Compuserve from home. There was definitely stuff out there, I downloaded images, a song or two and some programs. I saw a very early version of (I think?) Windows 95 (or 3.1?) that could play different videos in different windows and was amazed (these were from disk, not the web). Used a sysadmin for a Netware network.
It was a really fun time. But the breadth of what we have now more than dwarfs what existed then. It’s not surprising - that was 30 years ago. I don’t see any way to get from there to here without a ton of money being spent. Some of it was spent by governments and individuals, but I’m guessing the bulk was by companies. Economic realities require those companies to get something for their investments - they’re not charities. Advertising is the major vehicle for that investment. I’ll bet we’d find radio and TV followed a similar historical trajectory.
I use uBlock and avoid ads because they’re irritating (and I feel like a hypocrite for doing it). I hate going to recipe sites for all the garbage you have to wade through to get to the recipe. So I get it. The web, at current scale, doesn’t and can’t exist outside of economic realities. Micro transactions might have been the solution, but they weren’t. Kagi has a great model (happy customer here), but everyone can’t afford to subscribe to everything.
> “if you all dropped dead”, “you smarmy parasitic prick”
Dude. I hope you’re just having a bad day. If this is your normal mode of discourse you should get some counseling. I say this from a place of goodwill.
What's a viable business model for web search other than ads (Google, Bing, DuckDuckGo, Naver, etc.) or paid search (Kagi)? If paid search is the only option left, is it okay that poor people can't use the web? Is it okay if poor people don't get access to news?
Oh, and they don't get to vote because voting day and locations can't be advertised by the government, especially in targeted mailings that are personalized with your party affiliation and location. The US Postal Service will also collapse, so those mailings can't go out, even if allowed. At least the rich can still search for their polling location on the web [<- sarcasm].
None of that is okay with me. More/better regulation? Yes! But our world doesn't know how to function without ads. Being absolute about banning ads is unrealistic and takes focus away from achieving better regulation, thereby playing into the hands of the worst advertisers.
> What's a viable business model for web search other than ads (Google, Bing, DuckDuckGo, Naver, etc.) or paid search (Kagi)?
Not my problem. Those companies, and any other with business models reliant on advertising, don't have a right to exist. If your business can't be profitable without child labor, your business has no right to exist. This is no different.
That 'policy' is still actually in effect, I believe, in Google's webmaster guidelines. They just don't enforce it.
Years ago (early 2000s) Google used to mostly crawl using Google-owned IPs, but they'd occasionally use Comcast or some other ISPs (partners) to crawl. If you were IP cloaking, you'd have to look out for those pesky non-Google IPs. I know, as I used to play that IP cloaking game back in the early 2000s, mostly using scripts from a service called "IP Delivery".
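For what it's worth, Google's long-documented way for site owners to verify a genuine Googlebot (as opposed to something just spoofing the user agent) is a reverse DNS lookup on the IP followed by a forward lookup to confirm. A rough sketch:

    import socket

    def is_verified_googlebot(ip):
        """Reverse-DNS the IP, check the hostname, then forward-resolve to confirm."""
        try:
            hostname, _, _ = socket.gethostbyaddr(ip)  # e.g. crawl-66-249-66-1.googlebot.com
            if not hostname.endswith((".googlebot.com", ".google.com")):
                return False
            return ip in socket.gethostbyname_ex(hostname)[2]  # must map back to the same IP
        except (socket.herror, socket.gaierror):
            return False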
Is it even well defined? On the one hand, there’s “cloaking,” which is forbidden. On the other hand, there’s “gating,” which is allowed, and seems to frequently consist of showing all manner of spammy stuff and requests for personal information in lieu of the indexed content. Are these really clearly different?
And then there’s whatever Pinterest does, which seems awfully like cloaking or bait-and-switch or something: you get a high ranked image search result, you click it, and the page you see is in no way relevant to the search or related to the image thumbnail you clicked.
For context, my team wrote scripts to automate catching spam at scale.
Long story short, there are non spam-related reasons why one would want to have their website show different content to their users and to a bot. Say, adult content in countries where adult content is illegal. Or political views, in a similar context.
For this reason, most automated actions aren't built upon a single potential spam signal. I don't want to give too much detail, but here's a totally fictitious example for you:
* Having a website associated with keywords like "cheap" or "flash sale" isn't bad per se. But that might be seen as a first red flag
* Now having those aforementioned keywords, plus "Cartier" or "Vuitton" would be another red flag
* Add to this the fact that we see that this website changed owners recently, and used to SERP for different keywords, and that's another flag
=> 3 red flags, that's enough for some automation rule to me.
Again, this is a totally fictitious example, and in reality things are much more complex than this (plus I don't even think I understood or was exposed to all the ins and outs of spam detection while working there).
But cloaking on its own is kind of a risky space, as you'd get way too many false positives.
Do you have any example searches for the Pinterest results you're describing? I feel like I know what you're talking about but wondering what searches return this.
As the founder of SEO4Ajax, I can assure you that this is far from the case. Googlebot, for example, still has great difficulty indexing dynamically generated JavaScript content on the client side.
I think they did this because lots of publishers show paywalls to people but still want their content indexed by Google. In other words, they want to have their cake and eat it too!
You'd think they could make fine money as neutral brokers, since everyone served their ads, and for a long period they did make money as semi-neutral brokers. But since, IDK, 2019 they have become more and more garbage. This is broadly part of the concentration of wealth and power you see everywhere else; I don't know the specifics, but you can see the result.
Sure I have my viewpoint. But I'm also genuinely interested in your viewpoint.
My viewpoint is that I don't buy the idea that there is a group (or groups) of people who have both the means (money) and ideas they made up themselves, and who use the money to push those ideas onto the passive masses, who are then brainwashed by these rich people.
I think the masses produce the ideas. Those ideas are then selected and amplified by all sorts of people leveraging all sorts of means driven by all sorts of motives.
In fact there are plenty of examples of populist leaders who are not rich. The fact that the US has the cult of the millionaire sometimes obfuscates that; for some reason, for populist leaders in the US to rise they have to be millionaires (or pretend to be) to begin with.
My point is that, sure, the moneyed class does play a role, but reality is much more complex than that and I don't really buy the idea that the world is "controlled" by a bunch of "supermen" who are both incredibly wealthy and also incredibly intelligent and play 4d chess.
I'm not sure you believe that, that's why I wanted to ask a question instead of implying anything for your position. But since you asked.
> I think the masses produce the ideas. Those ideas are then selected and amplified by all sorts of people leveraging all sorts of means driven by all sorts of motives.
> My point is that, sure, the moneyed class does play a role, but reality is much more complex than that and I don't really buy the idea that the world is "controlled" by a bunch of "supermen" who are both incredibly wealthy and also incredibly intelligent and play 4d chess
These don't contradict what I said at all. You are arguing with a straw man.
I'm willing to answer your questions, but I just didn't understand that last one. Anyway it sounds like we are probably in agreement. I recognize the world to be complex and that there are many parties with different interests. My point only was that Google is willing to support narrow and even inaccurate narratives at the behest of those willing to pay them lots of money.
That's not what I'm saying. My intent is not to defend rich people. Yes obviously most of them don't spend their time controlling the media but instead spend time showing off on their yachts.
My point is something else: I don't but the idea that there are two factions, the rich and the poor, and that all rich people have the same interests and thus are allies and that all poor people have the same interests and are allied (or so they should).
Sure, this view is partially grounded in realty and that's why Marx did come up with it and it's why it stuck to this day as sensible to so many people.
But I don't think it's true. I think it oversimplifies reality to the point that a spherical cow in comparison is anatomically accurate.
But it's worse than just being wrong. It actively stifles conversation. Any attempt to have a nuanced conversation about these topics ultimately resolves in an accusation of "you're defending the rich, just admit it". That's what turns a idea into an ideology. Ideologies are ideas with built-in self-defense mechanisms.
I wonder if Google trains its AI on paywalled data, that other scrapers don’t have access to but which those paywalled sites give full access for the Google bot to.
The thing that annoys me most is that sites are allowed to use the http referrer from Google to see what you're searching for.
That + spam sites spamming as many keywords as they can just mean whatever you search for 95% of the sites are spam after the first page.
Idk why we've let the Internet get like this. There's gotta be a way to sign off on real/trusted content. That's certainly not ssl certs. Could probably crowd source the legitimacy rating of a site or something.
That's another reason why people flock to the big names, reddit, youtube, etc. It's like McDonald's, people know that what they get this time will be exactly what they got before.
> More recently, Google seems totally okay with sites that show content to Googlebot but go out of their way not to show that content to regular users.
See also, pages behind Red Hat and Oracle tech support paywalls.
I’ve switched to Kagi a couple of months ago. Every once in a while I struggle to get good search results, but then I check Google and it’s not any better there. It’s not always the greatest at promoting the sites I like, but I’ve already started boosting and pinning various domains to tailor the results to my own preference.
Still using a lot of other google stuff including gmail and maps. Just not search anymore.
But also, for the past few months, I’ve completely stopped searching the internet. ChatGPT-4 does the job way more effectively and I don’t see why I would go back to searching the internet (assuming the chatgpt experience doesn’t get nerfed in some way).
I've been using ChatGPT and Perplexity with GPT4 as my replacement for DDG and Google. I never thought the day would come when I thought Google Search was going to be superseded. It's crazy times to me.
If some company makes a less sanitized but equally capable version of GPT4 then I could see it replacing Google.
But for now you must keep your searches to a very limited, sanitized, corporate, non-copyright infringing, non-adult set of knowledge. And it's impossible to know what that will be beforehand, which makes using the tools very frustrating.
For example, try searching for anything medical related. Even if you're clear that you not looking for medical advice, you're just looking for info, it won't give it (sorry, as an AI I can't give medical advice). I imagine this is very frustrating for medical students.
And yes, I'm sure that I could coerce it into responding. Pretend your my grandmother telling me about her old medical recipes or some such. But that's still too annoying to do as anything except for testing the boundaries of the tool.
Simply asking it, “What would you tell a medical student…” works around this. I can’t imagine how or why they’re bothering with this when a simple disclaimer would probably protect them legally. WebMD seems to do just fine.
Yeah but my point is that it's not a complete search tool while it makes you jump through hoops of indirectly asking "hypothetical" questions.
It needs to directly answer all the questions that are given to it. If it must finish out the post with a disclaimer like "I am not a doctor/lawyer, this is not medical/legal advice" or whatever, that's fine.
ChatGPT is like living in an information tunnel. It’s amazing but it doesn’t replace search for me at all. Irony is when it does search, because it’s obviously just doing a quick crawl it actually makes things worse as it treats whatever shit it finds as authoritative - which, as anyone working on RAG knows, is a whole world of problems on its own.
This is absolutely not to say that Google can be considered ‘good’ these days.
Yeah they merged browsing into ChatGPT and it worsened the experience. I dread seeing the browsing icon show up to a question as the ai will get dumber.
Exactly the same for me, I've found answers derived from chat GPT training to be way more useful than the browser search answers. Half the time it doesn't work, it ads a lot of waiting time and it provides answers way less comprehensive. I have used the custom "classic" GPT bot when I have wanted to avoid Bing search answers.
I've addressed this by adding a custom instruction to only search when necessary, when it can't give a good answer otherwise. Pretty much cuts the searching down to when I explicitly ask it to do so.
Same for me. GPT-3.5 is good enough and pretty fast for most basic queries. GPT-4 is great at more detailed queries.
Any time I do end up going to Google, I’m so disappointed by the search results that I just leave. The only thing its good for now is searching site:reddit.com
I've been using kagi for almost a year, I think. Before that it was startpage or DDG.
I also have the same experience where kagi doesn't find something I think it should, so I go try google. Holy hell is google bad. Shockingly bad. I genuinely can't believe how bad it is now compared to 10 years ago.
I have essentially given up on Google as a search engine; I use it as bookmark search for bookmarks I consistently fail to add to my list. 99% of the time I google something for a specific URL from a domain I know (wikipedia, arxiv. etc). The other 1% was Google searches, but now I just append :reddit.com from the get go, bringing down my genuine "Google search"es to approximately zero.
even a couple years ago. There is soooooo much AI generate trash nowadays like https://www.oggyboggydoop.com on the first page. How can google's SEO filtering be so bad?
Kagi is designed to show you what you ask for, and not for showing you the ads you're most likely to fall for. It simply takes your query and returns results matching it. That's really it. It's sad that "does what you ask" is a defining feature, but that's what kagi is.
They have advanced features like "lenses" that bias your results toward a specific topic like programming, research, forums. It also lets you add weight to certain domains. For example, I have Pinterest and Facebook totally blacklisted, and some small sites boosted in my results.
They also support advanced query syntax with double quotes, +/-, and other operators.
Which is to say, kagi is the standard for "search engine that works". Nobody else sells a search engine that just works and does what you tell it to, which is literally all I want out of a search engine. It's mundane and unsexy, but it works, it doesn't advertise at me, and it lets me get my work done faster.
I've trialed Kagi but it's not given me the best results, especially not in my native language or related to my local area. I still prefer to use Google or even DDG.
I’ve noticed an increasing number of sites that provide an insane amount of text to answer a simple question.
They almost always follow some sort of structure where the actual answer to your question is all the way at the bottom of the page.
Most of the content appears relevant on the surface, but when you actually read it, it’s completely generic junk. Stuff that a high schooler would use to fluff an essay to hit a word limit.
Blame Google. A few years back, they decided that “topical authority” was important, and a page that targeted as many keywords as possible was “better”. A bunch of SEOs published studies showing how pages with 2,000+ keywords ranked higher, and then the floodgates opened with every company fluffing up their pages with 2000 words of BS just to appeal to Google.
Many don't even have an answer. They simply conclude that they don't know, after having spent pointless paragraphs on filler. "Well there you have it" is a common expression I see.
Lately a lot of these seem to be actually generated with LLMs. You can usually tell from the high school essay structure, often ending with a paragraph along the lines of "in conclusion there are many benefits but also many drawbacks".
After 20 years of being taken for granted that search engines as we know them are equipped to solve the typical problems we throw at them, I wonder if the whole concept of an unsupervised web crawl as the input to a single purpose search engine will just die out.
When I think about my typical web queries across the past year or two, it seems more and more likely that I'd be better off replacing Google with several purpose-built systems, none of which search the "entire web" (whatever that even means anymore). Technical queries? Just search StackOverflow and Github directly. Searching for a local venue of any kind? Search against a dedicated places database where new entries have to pass at least a cursory scrutiny. (Arguably Google Maps or Yelp already serve this purpose today, but I'm not sure if they have enough vetting today). Medical question? Search across a few sites known to be trustworthy.
We have become accustomed to go to Google because it's more convenient to type in a movie title, "chinese restaurant philadelphia", "flights to miami 4/12/24" or "Error code 127 python" into the same single place, but something tells me we'd be better off if that one place made some LLM-assisted guesses of what kind of search it is, and then went to a specialized search that is curated. If we go back toward the DMOZ/Yahoo model of directories that humans curate, I wonder if we could even reverse the trend toward spam and clickbait that has been so lamented in recent years.
For me search would be greatly improved if I could selectively exclude entire domains when I come across them. I want to be able to, with one click, remove GeeksForGeeks from all my search results — forever. And then I want to be able to continue to add to that (once called) "black list".
Never, ever show me Pinterest when I do an image search.
I imagine my search results would improve quickly in short order.
Better still, aggregate those lists from all users and you can improve search for users that have not yet built up a black list.
On the surface this is a good idea, however this would turn wildly anticompetitive. Whether or not your site would have business on the web would be entirely dictated by whether or not you woul be correctly classified (or indeed classified at all) in this engine. if you wanted to start your own stackoverflow competitor for whatever reason, you would have a very hard time getting any traction. this is also true of current general purpose engines, but you still do stand a chance to be referenced well and hit high enough to still get traffic.
the yahoo model collapsed for this very reason. back when you went to more than 5 websites to look at screenshots of the other 4, the directories would not necessarly show you the latest thing, because it wasn't on the list of sites manually added to each directory.
i think the current problem with google isn't to do with spam. i think google has become complacent because their ads are on all the sites anyway, so the function of "maximize revenue per search" doesn't actually care if you find what you're looking for, because you will get shown google ads anyway, and will be coming back to google anyway. in fact, they probably get to show more ads by feeding you bad results, because then you're loading more pages. this didn't used to be the case when google search was on top of spam sites, but it doesn't feel like they're doing anymore algo updates to curb the current trend, and spam sites have caught on to what ranks higher in the results.
> if you wanted to start your own stackoverflow competitor for whatever reason, you would have a very hard time getting any traction. this is also true of current general purpose engines, but you still do stand a chance to be referenced well and hit high enough to still get traffic
Hmm... You started to backpedal but then persisted. In the today world, your SO competitor would have that (slim) chance to rank if you started getting links from sites like HN or from people on Twitter who matter and know about tech. This would give you some PageRank and then you'd start possibly ranking in Google (in theory. In reality, no you probably wouldn't rank for anything since you're competing with 1,000,000 spam sites including whole verbatim clones of every page on SO that Google can't even get under control)
If any directory would be worth using, it would be run by humans who would HAVE to look at each submission. They could also look at who's linking to it, and evaluate "Is this a backlink from like, a gibberish page on `prawns-01-blork.info` or from like, Joel Spolsky's Twitter account?" Yes, it would take a lot of work, but like, it would be creating a truly useful product that people might pay for. And we have examples of other professions where "just rubber stamp everyone who pays" is frowned upon, like building inspectors and journalists. It's a hard problem, but it's far from hopeless.
By limiting web search results to "a few known sites", you'd be expediting the death of parts of it.
The beauty of search engines (in theory) is that you can find something NEW. Keeping the "open web" out would just entrench and ossify the current players.
A directory wouldn't be there to exclude anyone actually producing content of worth. It would serve as a gatekeeper to keep out plagiarism, spam, and utter trash. And people could create networks of sites based on their own real-world webs of trust, which vouch for one another.
Personally I'd rather see a standard which allowed you to add as many directories as you wanted to what your search engine would metasearch across. This also avoids the political problem of "who decides what's trash" -- if you want to add a directory whose main deal is they'll add literally any site, you could. If you want to only add directories which don't allow any <insert hated party> leaning content, you could do that.
For me it started when they began to monkey with the search entries and "second guess" what I was searching for. My guess is that they lost track of unvarnished human interaction and things snowball from there. That's just my hunch. People gave up trying to actually rely on it. We've all learned that Google doesn't care.
Basically it used to be optimized to be a sharp knife but now it's optimized to be a safety knife.
IMO that particular aspect is what made Google stand out to the general audience. Most people don’t search by keywords, or optimize their queries to hint the engine. They just type a question and assume Google will make sense of what they meant. Google usually does that very well, and that high level understanding is not compatible with our keyword query expectations. To us that feels broken, but to the majority of people it’s working as designed.
Instead I think search results have been getting worse for everyone because of SEO. Companies want to optimize for number of ads viewed, not quality of content, thus quality goes down in favor of clickbait and keyword stuffing.
I don’t think there’s much Google can do here to resolve this issue. It’ll always be a game of cat and mouse between Google and companies using SEO to push more ads for less money.
Can we attribute the decline in quality to the decline in informative content proportional to garbage content?
Garbage content seems to be making massive gains year on year, while informative or high quality content has stagnated or even declined from data decay
I think part of the problem is that Google has a recency bias. Newer, more spammy, sources get priority over older ones - even if the older ones are of higher quality.
I would have thought it well known by Search Engine Journal that at intervals Google implements changes that in one way or another influence what kinds of sites are rated in which way... And that, at times of change, this may lead to very substantial changes in ranking, sometimes letting a lot of "irrelevant"/"lower quality" (...all this is subjective to some extent) results flow to the top for certain queries, even for a prolonged period of time... Back in the day these algo updates were quite the thing to monitor and discuss on certain SEO-related sites...
That said I only comment out of casual interest as I stopped using Google more than a decade ago.
This has nothing to do with changes in the algorithm. I've been in search for 20+ years, so I'm quite familiar with how Google works. ;) My article explains why it's likely happening.
TL/DR is that spammers are likely exploiting two loopholes.
1. Longtail keywords are low competition and may trigger different algorithms.
2. Some/many of the search queries the spam ranks for trigger the more permissive Local Search algorithm
Plus there are other reasons why those sites are getting through, which are discussed in detail in the article.
Yes. If you roll over the author at the top of the article, there is a link to various other social media profiles where the author has used this username.
What do you use now for all the various things? I cut google out for most things but still use their search from time to time (and of course have to use google docs but that's because the people I'm collaborating with are on google)
Startpage is a good wrapper for Google if you care about privacy. (Or it was, I haven't checked in many years)
DDG is worse than google, IMO. Bing works, I guess, but I trust Microsoft almost as much as I trust google.
At this point I've given up on Google. If I can't find it in kagi after a bit of effort, I'll either work around it or ask a person I know in the given field.
God, the internet sucks so much now. I miss the early 2000s :(
DDG has been much more aggressive about replacing my query with whatever their algorithm decides I meant. They support no advanced query operators, not even double quotes.
A search engine that ignores my query and shows me something else is not super valuable to me
Google has become unusable in many circumstances, and it rewards spam. It claims it doesn't, but it does, a lot of SEO strategies now revolve around spamming the search engine with articles, pages, etc. Not for useful content, but for linkbacks, internal linking, etc. It is especially bad for geo-specific SEO strategies, where you're trying to have different page sets for different regions. Basically, how it was when Google first started and was easily gamed. Now people are spinning up 100s of pages and articles using AI and just spamming it. It has gotten bad, but the worst part is that you have to do it now in order to compete for keywords.
Yeah. Ten years ago Google was fighting those strategies involving content farms and abusive SEO with things like the Panda update, etc. Now it seems they don't care anymore. Since low-quality SEO ranks over legitimate sites, it forces those sites to pay for advertisement. This is very sad.
Google has given up on organic search. I am pretty good at that SEO stuff, and I can't use Google anymore because I can fathom really fast why a page ranks the way it does. And none of it has to do with accurate, valuable information.
Google is a marketplace, and they let most "engaging" results that adhere to a certain content structure win.
By now most paid results offer more value for the users than the organic results. Because that's what Google wants. Click the paid, ignore the crap.
Yeah that's not how I feel today when I searched for a local buffet by name and accidentally tapped the first, sponsored result which was Golden Corral, yuck. Definitely not "more value" to me
I stopped using google and switched to Bing about a year ago after they started doing more with ChatGPT. For the most part I’m much happier with how it presents what I’m looking for. It’s not perfect, but when I compare to google the few times I’ve been frustrated, it’s not any better and has to do with the topic, not search engine.
I’ve been a DDG user for years now, so I guess a bunch of my results come from Bing.
I don’t generally compare to Google, so I can’t say for sure that the results are ‘as good’, but my experience sure as shit is better.
I search for a thing and I get a page with links. Usually the thing I want is in the first page.
Sometimes it isn’t, or I’m searching for something that I know is recent that Google probably has a later version of, so I just add the !g to the search and there I am at Google.
It’s great. It works. It’s not stressful or horrible or annoying. I recommend it.
You are playing up the noise and playing down the signal. Search, Gmail, and YouTube are far from spam. There are obviously many scenarios/URLs that contain spam, but all 3 of those products are overwhelmingly useful.
When "SEO" first became a trendy buzzword decades ago, I immediately thought "these are fraudulent efforts to artificially appear more relevant than you actually are to the search algo" - and then it became a multi billion dollar industry.
Wow, that's pretty impressive. I didn't experience a "Code Red" when I was at Google but this would certainly qualify. Bing is not affected so it is definitely something to do with rank injection. I am really really interested in the post mortem if it sees the light of day, although if it reveals exploits for ranking it will probably not be made public.
Well, the text is HTML entities escaping ASCII text. "People being end with a person. Like everything in OUNASS ProMo CODE Onas OuNaS oNass cOuPOn DiscOUnt NoON SiVvl non toyou NaMshi"
Here Sivvi, toyou, Namshi, noon, and OUNASS are all brands of shopping websites and you can see their logos in the image.
Clearly this is some sort of keyword spam, though it's hard to tell more than that from your screenshot. It's also not clear why they'd bother to use HTML entities... a bug in the spam code? Or perhaps exploiting some parser differential between different twitter systems? Who can say.
So this is pure speculation, but more people should be aware of parser differentials (same thing as that email thing the other day) so let me say what I mean...
Hypothetically say a website has an internal service to index posts for keywords for search, that just so happens to unescape HTML entities during keyword normalization due to a seemingly harmless bug.
Plus a second internal service to identify keyword spam that _doesn't_ do any HTML entity unescaping (because why would you?)
Then you could end up in a situation where a spammer uses HTML entities to avoid spam detection while still showing up in search results. They hope that the user ignores the nonsense text and just clicks their link based on the image (a list of big shopping brands in the middle east) instead.
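A toy illustration of that kind of differential, purely hypothetical, just to show the shape of the bug:

    import html

    BANNED_PHRASES = {"promo code", "coupon", "discount"}

    def spam_score(text):
        """Hypothetical spam checker: looks at the raw post text, never unescapes."""
        lowered = text.lower()
        return sum(phrase in lowered for phrase in BANNED_PHRASES)

    def index_keywords(text):
        """Hypothetical indexer: 'harmlessly' unescapes HTML entities while normalizing."""
        return set(html.unescape(text).lower().split())

    post = "OUNASS &#80;&#114;&#111;&#77;&#111; &#67;&#79;&#68;&#69;"  # "ProMo CODE" as entities

    print(spam_score(post))      # 0 -> sails right past the spam check
    print(index_keywords(post))  # {'ounass', 'promo', 'code'} -> still ends up searchable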
And right now I just tried a query from the article above, "Alabama Casinos", on Google Maps, and sure enough I see "Bovada casino" (an offshore, illegal-in-the-US casino) affiliate links in the third spot in the "locations" list.
Recently almost every YouTube ad I get is a poorly made deepfake of elon musk telling me about an amazing secret investment opportunity. Google is losing the plot
Most ads I see on YouTube are very dubious and poorly made. A reasonable amount are plain fraud. It is very frustrating going to "ad center" and reading about how the advertiser did not verify his identity.
All I get is the same damn advertisement for the Internet provider I already have. You’d think they’d be able to say “Google don’t send this ad to our customers.”
I wouldn’t be surprised if there was some kind of “annoyance” metric for the ad selection algorithm. The idea being: show a certain demographic the same annoying ad again and again, and they might end up paying just to get rid of ads. I’m pretty sure Spotify does this.
A couple of years ago I created a new gmail with a very long address. Didn't tell anybody, didn't sign up for ANYTHING. Just parked it and let it sit dormant. Within a DAY I had my first spam.
Google is an ad company and it's entirely reasonable to assume they are simply selling lists of their own email addresses. Same with Yahoo! Mail! Which! I! Ran! The! Same! Experiment! From! With! The! Same! Results!
Ample, but because it was one of the gmails I could create with an invitation generated by my own account (this was before you needed ID or anything but a link, BTW), it would reveal my real-life name, which is my "mother" gmail address. I am one of those assholes who got firstname.lastname@gmail.com as their address. I had a guy offer me $15,000 for my gmail.
Google should be prioritizing sites which have fewer backlinks as it's proof that they didn't cheat and are likely higher quality. Or just random ranking; only require a certain low threshold of backlinks to establish baseline relevance.
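Stated as a ranking rule, the "low threshold, then shuffle" version would look something like this (obviously a toy, not how any real engine ranks):

    import random

    MIN_BACKLINKS = 5  # just enough to establish baseline relevance

    def rank(results):
        """Keep pages over a low backlink threshold, then order them randomly
        instead of rewarding whoever accumulated the most links."""
        eligible = [r for r in results if r["backlinks"] >= MIN_BACKLINKS]
        random.shuffle(eligible)
        return eligible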
I mean, when we get public posts like this one where someone boasts how they’re spamming to get hits through SEO — https://x.com/jakezward/status/1728032634037567509, I can only imagine it’s happening in larger quantities behind the scenes.
Did they really close the 'loophole' though? Jake said on Twitter they were actually hit with a manual penalty. So doesn't seem like it's patched. Seems like if he didn't boast about it they'd still be doing well.
I’m actually curious, how did Google do that? The guy who did it did it in a very obvious way, but I’m assuming you could just schedule a lot of posts that would drop once a day, make the AI use different language structures and change the underlying AI model in general (e.g. switch between OpenAI, Mistral and whatever), and slow-drip submit the posts. How would Google know they’re “mass generated”?
The original poster of that Tweet (Jake) admitted they got a manual penalty. Also, clearly Google didn't fix it because if not Google wouldn't be 'overwhelmed' with this current spam attack going on. If you look into the attack it's mass generated absolute spam garbage pages on hundreds if not thousands of separate domains. So it is definitely not fixed.
The fact that Google puts so much focus on content makes it sound like the Google algorithm has been corrupted. It's almost like a secret society where you need to do a secret handshake in order to get access. In this case you need to know the right word patterns to use in your content to get access to high traffic.
YouTube search is problematic. I very often get videos at the top that are auto-generated spam with AI narration and random clips (often completely irrelevant to the subject). There are videos I know of that aren't shown even when I search with the title and channel name.
Google wants to be too relevant, to the point it's unusable.
No? Decreased search quality will reduce the number of searches people are doing, which reduces the amount of ads Google can show. Decreased searches can come not only from people switching to another search engine, but also from people using the web less. Why look up something on the web when you can watch TikTok for a few hours?
> No? Decreased search quality will reduce the number of searches people are doing, which reduces the amount of ads Google can show.
I’m willing to bet that >90% of Google Search users aren’t even aware that alternatives exist.
They are not suddenly going to stop using Google Search. There might even be a significant short-term increase in usage, for example, if they really need to go to page 5 to find the first relevant result.
That is very far from my experience with non-tech people. My friends and students always google something, and almost inevitably fail to find it. Then they do it again, and again, and again.
For over two years now I've watched two specific people deterministically fail to find stuff through G search, and yet they still start with it in an almost Pavlovian ritual. Sometimes they will desperately scroll down, click on a blatantly irrelevant result, and switch to facebook or some other bookmark aggregator of theirs to continue their brutally inefficient search process, thumb flick after thumb flick after thumb flick.
You're right, but you're thinking long term sustainability. That's not Google's current goal, they'd rather squeeze out as much profit as possible in a short term, then current shareholders can sell their stock to idiots who become bag holders.
I realized the other day that I haven't Googled anything in over a month. And I guess even then, and before Google search was thoroughly enshittified, it was mostly just a convenient way to get to StackOverflow via my search bar, but now it's just ChatGPT for most quick queries I have.
For responsible scientists and researchers who've disclosed to Google how this is being used to execute large-scale phishing attacks, only to see Google opt not to fix it, this news warms our hearts.
For a bunch of super smart people they sure run a shite service. I've received the same spam multiple times in the last few days. Marked as spam every single time. And each time they failed to identify the exact same message as spam.
It's incredible how brain dead Google have become.
I opened the article from a computer in the school library that didn’t have ad-block. It nearly crashed the system. I seem to remember times when SEJ was reputable; I cannot even describe what this is.
When I googled redactle the first site was some fan site. The second was one of your sites (anybrowser). The third was Reddit which linked directly to your sites.
Then came a pair of unlimited sites. Finally your .net site, so not terribly deep, and it also came in third in a way. All in all it's not ideal, but not 'all spammy advertising sites'.
There is (browser?) malware out there that hijacks the google search results to show junk instead of what you are actually looking for. At one point I got bitten by this, it only sometimes replaced the results so it was a bit subtle and took me a while to pin down.
I wonder how many people on HN are infected by such malware and don't realize it? A lot of the complaints about search results are clearly not this, but when someone complains of outright spam for reasonable queries, I do wonder...
Weird, last week when I googled it, all I got was listicles about "the best order to read WoT" and other blogspam. Just googled it now and it seems fine.
I took a series of screenshots but they all seem fine to me?
Wikipedia, Goodreads, Fandom.com and MacMillan Publishing; these all seem to be reasonable results. I could share the whole page if I could find a place to upload my screenshots (RIP imgur)
> Google's been 98% spam for well over a year now.
I find this hard to believe. How do you even measure for this?
I'd love to see a few more examples of searches you are making that show spam, because the example you gave provided me with the appropriate results. I almost suspect you are either being disingenuous or just have some malware on your computer.
"This search engine I've gone out of my way to not track my search, viewing or other habits and usage is showing me irrelevant ads! Fucking trash!"
They'll complain at the thought of paying for YT premium ("the internet should be free bro! Except my new SaaS calendar app, of course"), pirate Factorio, pay for kagi. A real eclectic bunch.
I see this all the time these days and always wonder how many people making these comments have their own monetized apps, or have done any work on monetization, etc.