That 'policy' is still actually in effect, I believe, in Google's webmaster guidelines. They just don't enforce it.
Years ago (early 2000s) Google used to mostly crawl using Google-owned IPs, but they'd occasionally use Comcast or some other ISPs (partners) to crawl. If you were IP cloaking, you'd have to look out for those pesky non-Google IPs. I know, as I used to play that IP cloaking game back in the early 2000s, mostly using scripts from a service called "IP Delivery".
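For the curious, the mechanics were dead simple: keep a database of known crawler IP ranges and branch on the requesting address. Here's a toy sketch in Python (Flask and the single hardcoded range are my illustrative stand-ins for those old "IP Delivery" scripts; don't actually do this, it's against Google's guidelines):

    # Toy IP cloaking sketch; the CRAWLER_NETS list is illustrative.
    # Real cloaking scripts shipped regularly updated IP databases,
    # including the partner-ISP addresses mentioned above.
    import ipaddress

    from flask import Flask, request

    app = Flask(__name__)

    CRAWLER_NETS = [ipaddress.ip_network("66.249.64.0/19")]  # a known Googlebot range

    def is_crawler(ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        return any(addr in net for net in CRAWLER_NETS)

    @app.route("/")
    def index():
        if is_crawler(request.remote_addr):
            return "Keyword-stuffed page served only to crawlers"
        return "Page served to human visitors"

The whole game was keeping that IP list current, which is exactly why those occasional partner-ISP crawls were such a problem.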
Is it even well defined? On the one hand, there’s “cloaking,” which is forbidden. On the other hand, there’s “gating,” which is allowed, and seems frequently to consist of showing all manner of spammy stuff and requests for personal information in lieu of the indexed content. Are these really clearly different?
And then there’s whatever Pinterest does, which seems awfully like cloaking or bait-and-switch or something: you get a high ranked image search result, you click it, and the page you see is in no way relevant to the search or related to the image thumbnail you clicked.
For context, my team wrote scripts to automate catching spam at scale.
Long story short, there are non-spam-related reasons why one would want their website to show different content to users and to a bot. Say, adult content in countries where adult content is illegal. Or political views, in a similar context.
For this reason, most automated actions aren't built upon a single potential spam signal. I don't want to give too much detail, but here's a totally fictitious example for you (sketched in code after the list):
* Having a website associated with keywords like "cheap" or "flash sale" isn't bad per se. But that might be seen as a first red flag
* Now having those aforementioned keywords, plus "Cartier" or "Vuitton" would be another red flag
* Add to this the fact that we see that this website changed owners recently, and used to SERP for different keywords, and that's another flag
=> 3 red flags: that's enough to trigger some automation rule, in my book.
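To make the fictitious rule concrete, here's roughly what it might look like in code (the keyword lists, the threshold, and the Site shape are all made up for illustration):

    from dataclasses import dataclass

    # Toy illustration of combining weak signals; none of these lists
    # or thresholds reflect any real system.
    SPAMMY_KEYWORDS = {"cheap", "flash sale"}
    BRAND_KEYWORDS = {"cartier", "vuitton"}

    @dataclass
    class Site:
        keywords: set[str]           # keywords the site currently ranks for
        previous_keywords: set[str]  # what it ranked for before changing owners
        owner_changed_recently: bool

    def red_flags(site: Site) -> int:
        flags = 0
        words = {w.lower() for w in site.keywords}
        if words & SPAMMY_KEYWORDS:
            flags += 1  # first red flag: bargain-bin keywords
        if words & BRAND_KEYWORDS:
            flags += 1  # second: those keywords plus luxury brand names
        if site.owner_changed_recently and site.previous_keywords != site.keywords:
            flags += 1  # third: recent ownership change plus a keyword shift
        return flags

    site = Site(
        keywords={"cheap", "flash sale", "cartier"},
        previous_keywords={"gardening", "tomatoes"},
        owner_changed_recently=True,
    )
    if red_flags(site) >= 3:
        print("queue for automated action")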
Again, this is a totally fictitious example, and in reality things are much more complex than this (plus I don't even think I understood or was exposed to all the ins and outs of spam detection while working there).
But cloaking on its own is kind of a risky space, as you'd get way too many false positives.
Do you have any example searches for the Pinterest results you're describing? I feel like I know what you're talking about but wondering what searches return this.
As the founder of SEO4Ajax, I can assure you that this is far from the case. Googlebot, for example, still has great difficulty indexing dynamically generated JavaScript content on the client side.
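The standard workaround, which Google has documented under the name "dynamic rendering", is to detect bot user agents and hand them a prerendered snapshot while regular visitors get the JS app. A minimal sketch of the idea (Flask, the UA list, and the snapshot helper are placeholders here, not our actual implementation):

    from flask import Flask, request

    app = Flask(__name__)

    BOT_UAS = ("Googlebot", "bingbot", "DuckDuckBot")  # illustrative, not exhaustive

    def prerendered_html(path: str) -> str:
        # In practice this snapshot comes from a headless browser or a
        # prerendering service; hardcoded here to keep the sketch short.
        return f"<html><body>Static snapshot of /{path}</body></html>"

    @app.route("/", defaults={"path": ""})
    @app.route("/<path:path>")
    def serve(path: str):
        ua = request.headers.get("User-Agent", "")
        if any(bot in ua for bot in BOT_UAS):
            return prerendered_html(path)
        # Regular visitors get the client-side app shell.
        return '<html><body><div id="app"></div><script src="/app.js"></script></body></html>'

Unlike cloaking, the point is to serve crawlers the same content users eventually see, just already rendered.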
Did they really close the 'loophole' though? Jake said on Twitter they were actually hit with a manual penalty, so it doesn't seem like it's patched. Seems like if he hadn't boasted about it they'd still be doing well.
I’m actually curious: how did Google do that? The guy who did it did it in a very obvious way, but I’m assuming you could just schedule a lot of posts to drop once a day, make the AI use different language structures, change the underlying AI model in general (e.g. switch between OpenAI, Mistral and whatever), and slow-drip submit the posts. How would Google know they’re “mass generated”?
The original poster of that Tweet (Jake) admitted they got a manual penalty. Also, Google clearly didn't fix it; otherwise they wouldn't be 'overwhelmed' by the current spam attack going on. If you look into the attack, it's mass-generated, absolute spam-garbage pages on hundreds if not thousands of separate domains. So it is definitely not fixed.
I get what you're saying, it's weird to interview him as a Google employee. But actually it would be really weird if they didn't include Danny Sullivan in the article in some capacity. Over the years, Danny Sullivan has been such an influential voice when it comes to Search and SEO. He was previously on the other side, not working for Google.
I own the domain of my last name. Several family members use firstname@lastname.com addresses.
I once went to get a new phone at Best Buy, and the employee needed my email address. I gave it to her (firstname@lastname.com) and she insisted that it was NOT my email address. She insisted that it MUST end in @gmail.com or @yahoo.com, something like that.
We frequently sign up for stuff online, and when we enter our email address the form won't let us sign up... we figure it's because the email address is too similar to our actual name, the name we've entered in the 'first name' and 'last name' fields (it happens to both me and my wife at least 2-3 times a year).
I have the same, firstname@lastname.com/uk/.co.uk/etc. My family name alone is an absolute pain in the arse for most British English speakers to spell when given verbally. To make matters worse, when I give people my email over the phone, I get the "what's it @?" dance, and then, once they finally get there, another 5 minutes getting them to spell my last name after the @. Some, despite all this, still never get it right.
My wife constantly (half-jokingly) reminds me of how much of a PITA I've caused her with my name (which she took), when her maiden name was so sophisticated and easy compared to my weird, unidentifiable, "foreign" (I'm British/English) one.
EDIT to add: I don't often have issues with forms, but I reserve that particular address for "important" family related things, the sort of account where I _know_ if I receive an email to it, I need to read it. Everything else I use a gmail for (as does my wife).
Whereas I made the opposite mistake of having firstname@outlook.com
I get ungodly amounts of spam, relentlessly, from everyone. Because anyone over the age of 50 seems to give it as their email to companies like Target.
Yeah, I've had issues with firstname@lastname.name, but only with terrible regex validation logic that thinks a TLD can't be 4 characters long. And some quizzical replies from people: "dot name? Is that new?" Yeah, I say. It's pretty new.
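The offending pattern is usually some variant of this (my guess at the common mistake, not any particular site's code):

    import re

    # Broken: caps the TLD at 3 characters, so .name, .info, etc. are rejected.
    BAD_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[a-zA-Z]{2,3}$")
    # Better: no upper bound on TLD length.
    BETTER_EMAIL_RE = re.compile(r"^[^@\s]+@[^@\s]+\.[a-zA-Z]{2,}$")

    print(bool(BAD_EMAIL_RE.match("jane@smith.name")))     # False
    print(bool(BETTER_EMAIL_RE.match("jane@smith.name")))  # True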
I know of websites (including some of my clients) that are actively REMOVING copies of their sites from the Internet Archive.
There's a specific process for getting all of the content removed, and asking them to not archive the website.
For some ecommerce websites, they're removing copies from Internet Archive because there's pricing data that's getting archived.
I've also had clients remove copies because they've had a big problem with scrapers who are scraping the copies of the site in Internet Archive. They've been able to (mostly) stop scrapers on the site, but having archived copies of the site allows scrapers to scrape the Internet Archive.
Heh. What’s nuts is that, stripped of other copyright-bearing context, it’s my understanding that they’d have no legal leg to stand on (a policy one, maybe) in asking for pricing data to be removed. That data is supposed to be free (as in freedom) to anyone who finds a way to get ahold of it. Sucks that they’re killing the entire archive over something they have no actual claim to.
I agree, now that you pointed it out, there probably isn't a legal leg to stand on. IANAL, though--would be interested in hearing from an attorney about that.
Regardless, if you can prove you're the website owner, then Internet Archive will remove all of your content and stop archiving your site if you ask them.
Right, by IA policy they can ask to have the site taken down. What’s unfortunate is that they’re doing it over data they aren’t supposed to be able to restrict, and the rest of the site (which I doubt they much mind being on IA) is caught in the crossfire as the means of getting the part they care about taken down. Like, they probably wouldn’t bother, except that the archive happens to contain data that isn’t supposed to be restrictable anyway, but de facto becomes so if they take the whole archive of the site down.
Just FYI: the database used for site:domain.com queries is actually not the same database they use for live searches.
So you may see a certain number of pages with the site: command, but fewer of those pages (or none) may actually be indexed.
If you want pages indexed, put them in an XML sitemap file and make sure there are internal links to them on your site; external links from other sites really help, too. Third-party indexer tools help as well.
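For reference, a sitemap is just a small XML file in the sitemaps.org format. A quick Python sketch that emits a minimal one (the URLs are placeholders for your own pages):

    # Emit a minimal sitemap.xml in the standard sitemaps.org format.
    urls = [
        "https://example.com/",
        "https://example.com/products/widget",
    ]

    entries = "\n".join(f"  <url><loc>{u}</loc></url>" for u in urls)
    sitemap = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        f"{entries}\n"
        "</urlset>\n"
    )

    with open("sitemap.xml", "w", encoding="utf-8") as f:
        f.write(sitemap)

Then reference it from robots.txt (a Sitemap: line) or submit it in Search Console so crawlers can find it.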
I have an app I use to monitor my clipboard and rewrite certain URLs when it sees them, placing the clean version back in the clipboard (being careful not to end up in an infinite loop).
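It's an easy thing to throw together; here's the gist in Python (pyperclip, a third-party clipboard library, and the tracking-parameter list stand in for my actual rules):

    import time
    from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

    import pyperclip  # third-party: pip install pyperclip

    TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "fbclid", "gclid"}

    def clean(url: str) -> str:
        parts = urlsplit(url)
        query = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
        return urlunsplit(parts._replace(query=urlencode(query)))

    last = None
    while True:
        text = pyperclip.paste()
        # Only act on new clipboard contents that look like URLs; skipping
        # values we just wrote back is what prevents the infinite loop.
        if text != last and text.startswith(("http://", "https://")):
            cleaned = clean(text)
            if cleaned != text:
                pyperclip.copy(cleaned)
                text = cleaned
        last = text
        time.sleep(0.5)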
It sounds like they’re hosting at the same place (Gandi) where they registered the domain. This is absolutely not recommended, for security purposes: if someone gains access to your hosting account, you’ll likely lose all your domain names.
Transfer the domains over to a better more secure registrar. Then set up a server or VPS with as many mailboxes or email addresses as you need.
I personally prefer keeping my domain registrar separate from my DNS host, separate from my server host, and probably my email host separate from the others too, but on the other hand, you then have several different vendors that can ruin your day.
Using bundled services from your domain registrar is especially problematic though, because when you switch registrars, you usually lose those bundled services, even though you already paid for them. Often, there are similar services available at the new registrar, but there's a cost to switch, and it's much more difficult because the service provisioning is often tied to the domain process; service at registrar B won't be online until the domain is moved, and service at registrar A may be turned off immediately after the domain is moved, so you have no way to make an orderly transition.
Infrastructure best practices have gone out the window, haven't you heard? Most people who use AWS/$cloud_service use it for everything, best practices be damned. Many new projects start their work by thinking about how to scale, before making it simple and before having paying users.
Having a good plan for scaling is absolutely a great move. Changing fundamental architecture later isn't easy. Implementing it all immediately, however...
Sure, I agree, some sort of plan is a good idea. What I've seen many times though is engineers building systems for supporting 100k daily users while the product hasn't even found market fit yet, wasting lots of time on building complicated distributed systems way too early.
Is there not some way to undo that? I'm sure it would be a hassle, but hacking someone's account and transferring their domain name is a crime, and it leaves a very obvious paper trail. Seems like the registrars involved would be willing to reverse transfers under such circumstances.
And until that reversal is done, you've exposed all the users of your domains (including yourself) to security issues, potential data theft, and destroyed your own website's reputation.
I've been running a stolen domain name recovery service for a few years now. Even though hacking into someone's account and transferring the domain name to themselves or transferring it to another domain registrar is a crime, it's never prosecuted (when they come to us the first thing we have them do is file a police report).
The problem is that most domain registrars won't help get your domain name back. Many domains are stolen because the thief hacked the owner's email account rather than the domain registrar account itself (even though that's ultimately how they got access). Most domain registrars don't care at all, and won't help. And there are no current ICANN policies for dealing with stolen domain names. Even UDRP is not set up for dealing with stolen domains, although we were successful in getting one back via UDRP, since the business had been using the domain previously and we ended up claiming a 'common-law trademark'.
This is one reason why we've been so successful getting stolen domain names back for clients: we use some alternative methods, such as actually talking to people at the registrars involved and talking with the domain thief, to get domains back.
I'm asking because the GGP wrote "better more secure registrar". Since I'm using Gandi right now (and I don't care about their mailbox offer), I wanted to know if it was really insecure.
GGP said hosting, but I think they meant email hosting. Even if you keep your actual registrar account on @gmail or another third party, it's not recommended to handle your registrar, DNS, and email in the same place, since a compromise of any of them is likely to lead to compromise of the others (e.g. an attacker gains admin permissions on the website/backend and uses it to reset your email password and download your email inbox).
I'm sorry but I don't understand what you're saying. The sentence was literally "Transfer the domains over to a better more secure registrar." This is about domain names and registrars and it's implying that Gandi is insecure. Your point about putting your eggs in the same basket is a different point.
I was literally adding a 2FA authenticator app to my Twitter accounts that aren't Twitter Blue accounts when this happened. I thought it was something I did.
Good to know that I can rely on H/N to tell me what's really going on... and it's not "just me".