That 'policy' is still actually in effect, I believe, in Google's webmaster guidelines. They just don't enforce it.
Years ago (early 2000s) Google used to mostly crawl using Google-owned IPs, but they'd occasionally use Comcast or some other ISPs (partners) to crawl. If you were IP cloaking, you'd have to look out for those pesky non-Google IPs. I know, as I used to play that IP cloaking game back in the early 2000s, mostly using scripts from a service called "IP Delivery".
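For the curious, the mechanics were dead simple: keep a database of known crawler IP ranges and branch on the requesting address. Here's a toy sketch in Python (Flask and the single hardcoded range are my illustrative stand-ins for those old "IP Delivery" scripts; don't actually do this, it's against Google's guidelines):

    # Toy IP cloaking sketch; the CRAWLER_NETS list is illustrative.
    # Real cloaking scripts shipped regularly updated IP databases,
    # including the partner-ISP addresses mentioned above.
    import ipaddress

    from flask import Flask, request

    app = Flask(__name__)

    CRAWLER_NETS = [ipaddress.ip_network("66.249.64.0/19")]  # a known Googlebot range

    def is_crawler(ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in CRAWLER_NETS)

    @app.route("/")
    def index():
        if is_crawler(request.remote_addr):
            return "Keyword-stuffed page served only to crawlers"
        return "Page served to human visitors"

The whole game was keeping that IP list current, which is exactly why those occasional partner-ISP crawls were such a problem.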
Is it even well defined? On the one hand, there’s “cloaking,” which is forbidden. On the other hand, there’s “gating,” which is allowed, and seems frequently to consist of showing all manner of spammy stuff and requests for personal information in lieu of the indexed content. Are these really clearly different?
And then there’s whatever Pinterest does, which seems awfully like cloaking or bait-and-switch or something: you get a high ranked image search result, you click it, and the page you see is in no way relevant to the search or related to the image thumbnail you clicked.
For context, my team wrote scripts to automate catching spam at scale.
Long story short, there are non-spam-related reasons why one would want their website to show different content to users and to a bot. Say, adult content in countries where adult content is illegal. Or political views, in a similar context.
For this reason, most automated actions aren't built upon a single potential spam signal. I don't want to give too much detail, but here's a totally fictitious example for you (sketched in code after the list):
* Having a website associated with keywords like "cheap" or "flash sale" isn't bad per se. But that might be seen as a first red flag
* Now having those aforementioned keywords, plus "Cartier" or "Vuitton" would be another red flag
* Add to this the fact that we see that this website changed owners recently, and used to SERP for different keywords, and that's another flag
=> 3 red flags: that's enough to trigger some automation rule, in my book.
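To make the fictitious rule concrete, here's roughly what it might look like in code (the keyword lists, the threshold, and the Site shape are all made up for illustration):

    from dataclasses import dataclass

    # Toy illustration of combining weak signals; none of these lists
    # or thresholds reflect any real system.
    SPAMMY_KEYWORDS = {"cheap", "flash sale"}
    BRAND_KEYWORDS = {"cartier", "vuitton"}

    @dataclass
    class Site:
        keywords: set[str]           # keywords the site currently ranks for
        previous_keywords: set[str]  # what it ranked for before changing owners
        owner_changed_recently: bool

    def red_flags(site: Site) -> int:
        flags = 0
        words = {w.lower() for w in site.keywords}
        if words & SPAMMY_KEYWORDS:
            flags += 1  # first red flag: bargain-bin keywords
        if words & BRAND_KEYWORDS:
            flags += 1  # second: those keywords plus luxury brand names
        if site.owner_changed_recently and site.previous_keywords != site.keywords:
            flags += 1  # third: recent ownership change plus a keyword shift
        return flags

    site = Site(
        keywords={"cheap", "flash sale", "cartier"},
        previous_keywords={"gardening", "tomatoes"},
        owner_changed_recently=True,
    )
    if red_flags(site) >= 3:
        print("queue for automated action")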
Again, this is a totally fictitious example, and in reality things are much more complex than this (plus I don't even think I understood or was exposed to all the ins and outs of spam detection while working there).
But cloaking on its own is kind of a risky space, as you'd get way too many false positives.
Do you have any example searches for the Pinterest results you're describing? I feel like I know what you're talking about but wondering what searches return this.
As the founder of SEO4Ajax, I can assure you that this is far from the case. Googlebot, for example, still has great difficulty indexing dynamically generated JavaScript content on the client side.
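The standard workaround, which Google has documented under the name "dynamic rendering", is to detect bot user agents and hand them a prerendered snapshot while regular visitors get the JS app. A minimal sketch of the idea (Flask, the UA list, and the snapshot helper are placeholders here, not our actual implementation):

    from flask import Flask, request

    app = Flask(__name__)

    BOT_UAS = ("Googlebot", "bingbot", "DuckDuckBot")  # illustrative, not exhaustive

    def prerendered_html(path: str) -> str:
        # In practice this snapshot comes from a headless browser or a
        # prerendering service; hardcoded here to keep the sketch short.
        return f"<html><body>Static snapshot of /{path}</body></html>"

    @app.route("/", defaults={"path": ""})
    @app.route("/<path:path>")
    def serve(path: str):
        ua = request.headers.get("User-Agent", "")
        if any(bot in ua for bot in BOT_UAS):
            return prerendered_html(path)
        # Regular visitors get the client-side app shell.
        return '<html><body><div id="app"></div><script src="/app.js"></script></body></html>'

Unlike cloaking, the point is to serve crawlers the same content users eventually see, just already rendered.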
Did they really close the 'loophole' though? Jake said on Twitter they were actually hit with a manual penalty, so it doesn't seem like it's patched. Seems like if he hadn't boasted about it they'd still be doing well.
I’m actually curious: how did Google do that? The guy who did it did it in a very obvious way, but I’m assuming you could just schedule a lot of posts to drop once a day, make the AI use different language structures, change the underlying AI model in general (e.g. switch between OpenAI, Mistral and whatever), and slow-drip submit the posts. How would Google know they’re “mass generated”?
The original poster of that Tweet (Jake) admitted they got a manual penalty. Also, Google clearly didn't fix it; otherwise they wouldn't be 'overwhelmed' by the current spam attack going on. If you look into the attack, it's mass-generated, absolute spam-garbage pages on hundreds if not thousands of separate domains. So it is definitely not fixed.
I get what you're saying, it's weird to interview him as a Google employee. But actually it would be really weird if they didn't include Danny Sullivan in the article in some capacity. Over the years, Danny Sullivan has been such an influential voice when it comes to Search and SEO. He was previously on the other side, not working for Google.
I own the domain of my last name. Several family members use firstname@lastname.com addresses.
I once went to get a new phone at Best Buy, and the employee needed my email address. I gave it to her (firstname@lastname.com) and she insisted that it was NOT my email address. She insisted that it MUST end in @gmail.com or @yahoo.com, something like that.
We frequently sign up for stuff online, and when we enter our email address the form won't let us sign up... we figure it's because the email address is too similar to our actual name, the name we've entered in the 'first name' and 'last name' fields (it happens to both me and my wife at least 2-3 times a year).
I have the same, firstname@lastname.com/uk/.co.uk/etc. My family name alone is an absolute pain in the arse for most British English speakers to spell when given verbally. To make matters worse, when I give people my email over the phone, I get the "what's it @?" dance, and then, once they finally get there, another 5 minutes getting them to spell my last name after the @. Some, despite all this, still never get it right.
My wife constantly (half-jokingly) reminds me of how much of a PITA I've caused her with my name (which she took), when her maiden name was so sophisticated and easy compared to my weird, unidentifiable, "foreign" (I'm British/English) one.
EDIT to add: I don't often have issues with forms, but I reserve that particular address for "important" family related things, the sort of account where I _know_ if I receive an email to it, I need to read it. Everything else I use a gmail for (as does my wife).
Whereas I made the opposite mistake of having firstname@outlook.com
I get ungodly amounts of spam, relentlessly, from everyone. Because anyone over the age of 50 seems to give it as their email to companies like Target.
Yeah, I've had issues with firstname@lastname.name, but only with terrible regex validation logic that thinks a TLD can't be 4 characters long. And some quizzical replies from people: "dot name? Is that new?" Yeah, I say. It's pretty new.
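The offending pattern is usually some variant of this (my guess at the common mistake, not any particular site's code):

    import re

    # Broken: caps the TLD at 3 characters, so .name, .info, etc. are rejected.
    BAD_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[a-zA-Z]{2,3}$")
    # Better: no upper bound on TLD length.
    BETTER_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}$")

    print(bool(BAD_EMAIL_RE.match("jane@smith.name")))     # False
    print(bool(BETTER_EMAIL_RE.match("jane@smith.name")))  # True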
I know of websites (including some of my clients) that are actively REMOVING copies of their sites from the Internet Archive.
There's a specific process for getting all of the content removed, and asking them to not archive the website.
For some ecommerce websites, they're removing copies from Internet Archive because there's pricing data that's getting archived.
I've also had clients remove copies because they've had a big problem with scrapers who are scraping the copies of the site in Internet Archive. They've been able to (mostly) stop scrapers on the site, but having archived copies of the site allows scrapers to scrape the Internet Archive.
Heh. What’s nuts is that, stripped of other copyright-bearing context, it’s my understanding that they’d have no legal leg to stand on (a policy one, maybe) in asking for pricing data to be removed. That data is supposed to be free (as in freedom) to anyone who finds a way to get ahold of it. Sucks that they’re killing the entire archive over something they have no actual claim to.
I agree, now that you pointed it out, there probably isn't a legal leg to stand on. IANAL, though--would be interested in hearing from an attorney about that.
Regardless, if you can prove you're the website owner, then Internet Archive will remove all of your content and stop archiving your site if you ask them.
Right, by IA policy they can ask to have the site taken down. What’s unfortunate is that they’re doing it over data they aren’t supposed to be able to restrict, and the rest of the site (which I doubt they much mind being on IA) is caught in the crossfire as the means of getting the part they care about taken down. Like, they probably wouldn’t bother, except that the archive happens to contain data that isn’t supposed to be restrictable anyway, but de facto becomes so if they take the whole archive of the site down.
Just FYI: the database used for site:domain.com queries is actually not the same database they use for live searches.
So you may see a certain number of pages with the site: command, but fewer of those pages (or none) may actually be indexed.
If you want pages indexed, put them in an XML sitemap file and make sure there are internal links to them on your site; external links from other sites really help, too. Third-party indexer tools help as well.
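For reference, a sitemap is just a small XML file in the sitemaps.org format. A quick Python sketch that emits a minimal one (the URLs are placeholders for your own pages):

    # Emit a minimal sitemap.xml in the standard sitemaps.org format.
    urls = [
        "https://example.com/",
        "https://example.com/products/widget",
    ]

    entries = "\n".join(f"  <url><loc>{u}</loc></url>" for u in urls)
    sitemap = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )

    with open("sitemap.xml", "w", encoding="utf-8") as f:
        f.write(sitemap)

Then reference it from robots.txt (a Sitemap: line) or submit it in Search Console so crawlers can find it.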
I have an app I use to monitor my clipboard and rewrite certain URLs when it sees them, placing the clean version back in the clipboard (being careful not to end up in an infinite loop).
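It's an easy thing to throw together; here's the gist in Python (pyperclip, a third-party clipboard library, and the tracking-parameter list stand in for my actual rules):

    import time
    from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

    import pyperclip  # third-party: pip install pyperclip

    TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}

    def clean(url: str) -> str:
        parts = urlsplit(url)
        query = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
        return urlunsplit(parts._replace(query=urlencode(query)))

    last = None
    while True:
        text = pyperclip.paste()
        # Only act on new clipboard contents that look like URLs; skipping
        # values we just wrote back is what prevents the infinite loop.
        if text != last and text.startswith(("http://", "https://")):
            cleaned = clean(text)
            if cleaned != text:
                pyperclip.copy(cleaned)
                text = cleaned
        last = text
        time.sleep(0.5)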
It sounds like they’re hosting at the same place (Gandi) where they registered the domain. This is absolutely not recommended, for security purposes: if someone gains access to your hosting account, you’ll likely lose all your domain names.
Transfer the domains over to a better more secure registrar. Then set up a server or VPS with as many mailboxes or email addresses as you need.
I personally prefer keeping my domain registrar separate from my DNS host, separate from my server host, and probably my email host separate from the others too, but on the other hand, you then have several different vendors that can ruin your day.
Using bundled services from your domain registrar is especially problematic though, because when you switch registrars, you usually lose those bundled services, even though you already paid for them. Often, there are similar services available at the new registrar, but there's a cost to switch, and it's much more difficult because the service provisioning is often tied to the domain process; service at registrar B won't be online until the domain is moved, and service at registrar A may be turned off immediately after the domain is moved, so you have no way to make an orderly transition.
Infrastructure best practices have gone out the window, haven't you heard? Most people who use AWS/$cloud_service use it for everything, best practices be damned. Many new projects start their work by thinking about how to scale, before making it simple and before having paying users.
Having a good plan for scaling is absolutely a great move. Changing fundamental architecture later isn't easy. Implementing it all immediately, however...
Sure, I agree, some sort of plan is a good idea. What I've seen many times though is engineers building systems for supporting 100k daily users while the product hasn't even found market fit yet, wasting lots of time on building complicated distributed systems way too early.
Is there not some way to undo that? I'm sure it would be a hassle, but hacking someone's account and transferring their domain name is a crime, and it leaves a very obvious paper trail. Seems like the registrars involved would be willing to reverse transfers under such circumstances.
And until that reversal is done, you've exposed all the users of your domains (including yourself) to security issues, potential data theft, and destroyed your own website's reputation.
I've been running a stolen domain name recovery service for a few years now. Even though hacking into someone's account and transferring the domain name to themselves or transferring it to another domain registrar is a crime, it's never prosecuted (when they come to us the first thing we have them do is file a police report).
The problem is that most domain registrars won't help get your domain name back. Many domains are stolen because the thief hacked the owner's email account rather than the domain registrar account itself (even though that's ultimately how they got access). Most domain registrars don't care at all, and won't help. And there are no current ICANN policies for dealing with stolen domain names. Even UDRP is not set up for dealing with stolen domains, although we were successful in getting one back via UDRP, since the business had been using the domain previously and we ended up claiming a 'common-law trademark'.
This is one reason why we've been so successful getting stolen domain names back for clients: we use some alternative methods, such as actually talking to people at the registrars involved and talking with the domain thief, to get domains back.
I'm asking because the GGP wrote "better more secure registrar". Since I'm using Gandi right now (and I don't care about their mailbox offer), I wanted to know if it was really insecure.
GGP said hosting, but I think they meant email hosting. Even if you keep your actual registrar account on @gmail or another third party, it's not recommended to handle your registrar, DNS, and email in the same place, since a compromise of any of them is likely to lead to compromise of the others (e.g. an attacker gains admin permissions on the website/backend and uses it to reset your email password and download your email inbox).
I'm sorry but I don't understand what you're saying. The sentence was literally "Transfer the domains over to a better more secure registrar." This is about domain names and registrars and it's implying that Gandi is insecure. Your point about putting your eggs in the same basket is a different point.
I was literally adding a 2FA authenticator app to my Twitter accounts that aren't Twitter Blue accounts when this happened. I thought it was something I did.
Good to know that I can rely on H/N to tell me what's really going on... and it's not "just me".