I work at Google in web search. It's great to see another example of a site indexed under HTTPS as it's a common question we see. Sadly, for such an important topic, there are many myths.
I didn't dig into this particular example, so a couple of things about the topic of this post:
1. As much as I'd personally love to see it happen, we currently don't give any ranking boost or demotion based on whether the site is HTTPS or HTTP. Regardless, it's a good thing to do for your users anyway, so please do it if you can.
2. Fellow Googler Ilya Grigorik, who's also on HN, and I gave a talk about HTTPS two weeks ago at I/O. We covered a lot of topics about how to deploy HTTPS in a secure and fast way, and also how to get Google's indexing algos to favor the secure site. It's here:
I had this debate on my last project: Does Google consider http and https on the same host to be separate sites? Assuming http://www.example.com/home and https://www.example.com/home return the same content, are they considered one page or two?
You should watch the I/O video about what to do here for optimal indexing. Briefly, redirect all variants to just the one, but there is a lot more you need to watch out for. It's all in the video.
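To make the "redirect all variants to just the one" part concrete, here is a minimal sketch in Python. The canonical scheme and host (`https`, `www.example.com`) and the function name `redirect_target` are assumptions for illustration only; the video covers more cases than this handles.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed canonical origin for this example; pick whichever variant you prefer.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.example.com"


def redirect_target(url):
    """Return the URL a 301 should point to, or None if url is already canonical."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if parts.scheme == CANONICAL_SCHEME and host == CANONICAL_HOST:
        return None  # already the one variant we want indexed
    # Keep path, query and fragment; swap only scheme and host.
    return urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, parts.path or "/", parts.query, parts.fragment)
    )
```

With this, `http://www.example.com/home`, `http://example.com/home` and `https://example.com/home` all map to `https://www.example.com/home`, so only one page is left for the crawler to index.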
So we have two websites on the same server (same IP, using SNI): siteA has SSL, siteB doesn't. Google will attempt to crawl https://siteB/ (which will return siteA's content! In a browser you would get a warning about the certificate being wrong). Since there are no links anywhere that point to https://siteB, why does Google crawl it? This is technically a configuration / Apache error, but Google should not be crawling https websites when their domain does not match the certificate.
This means that at some point Googlebot discovered https://siteB. It could simply have been a misconfigured CMS, or a bad sitemap, or an errant link one of your visitors shared on a forum, or something a previous owner of the domain did, or anything really. You may think there are no links to the site, and that may be true right now, but it's about something that Googlebot found in the past.
The correct fix is, as you say, to make sure the server doesn't respond to invalid certificate+site combinations.
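As a rough illustration of what "invalid certificate+site combination" means, the sketch below checks whether a requested hostname is covered by the names on a certificate. The function names and the `.example` hostnames are made up for the example, and the wildcard handling is deliberately simplified (left-most label only); a real server should rely on its TLS stack and virtual host configuration rather than code like this.

```python
def name_matches(pattern, hostname):
    """True if a certificate name (possibly '*.domain') covers hostname."""
    pattern = pattern.lower().rstrip(".")
    hostname = hostname.lower().rstrip(".")
    if pattern == hostname:
        return True
    if pattern.startswith("*."):
        # A wildcard stands in for exactly one left-most label.
        head, _, rest = hostname.partition(".")
        return bool(head) and rest == pattern[2:]
    return False


def cert_covers(hostname, cert_names):
    """True if any name on the certificate covers the requested hostname."""
    return any(name_matches(name, hostname) for name in cert_names)
```

In the siteA/siteB situation, a request for siteB over HTTPS would fail this check against siteA's certificate, and that is the case where the server should refuse to answer instead of serving siteA's content.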
The scary thing is that I started writing an article about this on the 28th, all due to a comment I was sent by an advertising agency stating that "HTTPS is only really for checkout pages". I nicely pointed out why they were wrong.
It was mentioned somewhere that if Google started giving priority to full HTTPS sites, there would be a mass scramble to convert web sites to support HTTPS. Isn't this what we want?
You can warn people all you like, but until it starts to hurt them, they won't listen.
It looks like some people are getting carried away with the SEO implications of this.
We're not implying that your site will get ranked higher than other sites if you have HTTPS. What we're saying is that if your site has both HTTP and HTTPS versions of the same content, Google will now return an HTTPS link. The biggest implication is that if you support HTTPS, most of your traffic will now be using HTTPS rather than HTTP.
I just went through this with some of my sites. Some of my URLs were showing up in searches with HTTP URLs, while others were showing up with HTTPS URLs. Digging deeper also showed both HTTP and HTTPS being indexed.
This isn't an HTTPS preference on Google's part. It's (effectively) a mis-configuration on the web site operator's part.
HttpWatch (and many other folks with web sites) should either be using 301 redirects or, in the many cases where that's not compatible with other things being done, using the canonical link element to indicate that the https content is just a copy of the http content.
So: Try setting your ‘canonical’ line properly in your web pages. Then this won’t happen.
Right now HttpWatch's http pages report their http as canonical. Their https pages report https as canonical. They effectively are listing their site twice with Google (and probably splitting/hurting SEO, but that's another story.)
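If you want to check your own pages for this, here is a small sketch using only Python's standard library. It pulls the `rel="canonical"` href out of a page's HTML and compares the http and https copies. The helper names are mine, and you would still need to fetch the two versions of the page yourself.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        rels = (attr.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attr.get("href")


def find_canonical(html):
    """Return the canonical URL declared in html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def same_canonical(http_html, https_html):
    """True only if both copies declare the same canonical URL."""
    a = find_canonical(http_html)
    b = find_canonical(https_html)
    return a is not None and a == b
```

A site set up like the one described above, where each scheme names itself as canonical, would get `False` from `same_canonical`, which is the "listed twice" situation.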
Is Google really favoring https sites? This could have unintended consequences like SEO guys suddenly demanding a public IP for their customer sites (many of which have no need for encryption I imagine) and causing a heavier IPv4 drain than expected. IPv6 just isn't here yet. Must be nice to be an SSL seller right now.
I find that if there is an https site, Google will just send me there, which is nice, but if they're changing their search algorithm for https that kinda sucks. I want my search terms to match up with the best content, not used as a reward system for implementing SSL everywhere.
The article is pretty short on facts. I don't think it's favoring anything. It just uses SSL if you have it. On the downside, I have noticed that connecting to sites has been slower than usual lately. The SSL handshake is still slow. Whatever happened to making all of this run faster?
Anyone else enjoy the irony of HN linking to the plain-text version of the httpwatch site?
That doesn't matter at all. XP doesn't support SNI, therefore every application which uses XP's SSPI libraries doesn't either. So IE6-8 and Safari on XP all don't support SNI.
Chrome on XP does support SNI but that is because they don't use XP's SSPI library for SSL connections (they use Mozilla's library NSS).
On a mixture of around 100 consumer facing UK sites (think small local businesses) that I have access to, Windows XP made up 3% of sessions in the last month, that's out of over 100k sessions according to Analytics.
Duckduckgo does something similar. They use the rulesets provided by HTTPS-Everywhere. Not sure how Google is doing it. Maybe if a site redirects to https, and also has HSTS configured, it would be pretty safe to return HTTPS links instead of HTTP ones as search results.
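To be clear, nobody here knows how Google actually decides this. Purely as a sketch of the guess above (redirect to https plus HSTS), the check could look something like this in Python. `parse_hsts` and `looks_https_only` are hypothetical names, and the parsing only covers the `max-age` and `includeSubDomains` directives of the `Strict-Transport-Security` header.

```python
def parse_hsts(value):
    """Parse a Strict-Transport-Security value into (max_age, include_subdomains)."""
    max_age = None
    include_subdomains = False
    for directive in value.split(";"):
        name, _, arg = directive.strip().partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        arg = arg.strip().strip('"')  # the value may be quoted
        if name == "max-age" and arg.isdigit():
            max_age = int(arg)
        elif name == "includesubdomains":
            include_subdomains = True
    return max_age, include_subdomains


def looks_https_only(redirects_to_https, hsts_header):
    """Heuristic from the comment above: http redirects to https AND HSTS is active."""
    if not redirects_to_https or not hsts_header:
        return False
    max_age, _ = parse_hsts(hsts_header)
    # max-age=0 tells browsers to forget the policy, so it does not count.
    return max_age is not None and max_age > 0
```

For example, a site that redirects and sends `max-age=31536000; includeSubDomains` passes the check, while one sending `max-age=0` does not.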
Yesterday I noticed that DDG was no longer taking me to the https version of youtube and I'm fairly sure it did before. It looks like Google search results don't take you to https youtube either.
We originally set up HTTPS on the WordPress blog just to demonstrate how it could be done without mixed content warnings. However, we never set up an SSL CDN, so over HTTPS all the content is served directly, while over HTTP some content comes from a CDN.
Looks like this will have to change now that all traffic from Google searches is going to come in over HTTPS.
I think this gives a poor image to the brand. There is a "Custom SSL Options for Amazon CloudFront" [1]. (I am now renewing my own domain certificates!)
The CDN nodes don't have a separate IP for http only and for http+https, so if you try https, you're hitting a service that wasn't prepared for that. The same thing happens if you virtual host lots of http sites on a single IP with one https site: everything is fine as long as nobody tries to do https, but if they do try, they get the cert for the one site that is doing https.
The CDN's servers provide the encryption, so it would make sense that the certificate is in their name. You can't do the encryption on the origin server, because the CDN needs access to the data to be able to cache it.
I understand that point, I know you can't use the certificate on another domain by design. I'm just curious why you wouldn't issue a certificate to your CDN signed with your domain as well. It's something that bothers me about npr.org especially as it creates problems with their API for member stations wishing to go fully SSL.
But it completely breaks the certificate meaning. Imagine the bad guy giving you a fake certificate called: cdn.badguy.com and explaining that because the CDN does the encryption you can trust this domain...
http://www.youtube.com/watch?v=cBhZ6S0PFCY
Happy to answer any questions here.