Google Has Given HTTPS a Huge Boost (httpwatch.com)
59 points by cleverjake on July 7, 2014 | 41 comments


I work at Google in web search. It's great to see another example of a site indexed under HTTPS as it's a common question we see. Sadly, for such an important topic, there are many myths.

I didn't dig into this particular example, so a couple of things about the topic of this post:

1. As much as I'd personally love to see it happen, we currently don't give any ranking boost or demotion based on whether the site is HTTPS or HTTP. Regardless, it's a good thing to do for your users anyway, so please do it if you can.

2. Fellow Googler Ilya Grigorik, who's also on HN, and I gave a talk about HTTPS two weeks ago at I/O. We covered a lot of topics about how to deploy HTTPS in a secure and fast way, and also how to get Google's indexing algos to favor the secure site. It's here:

http://www.youtube.com/watch?v=cBhZ6S0PFCY

Happy to answer any questions here.


I had this debate on my last project: Does Google consider http and https on the same host to be separate sites? Assuming http://www.example.com/home and https://www.example.com/home return the same content, are they considered one page or two?


Hi tootie

These are two sites. In reality, most secure sites have four variants:

http://www.example.com

http://example.com

https://www.example.com

https://example.com

You should watch the I/O video about what to do here for optimal indexing. Briefly, redirect all variants to just the one, but there is a lot more you need to watch out for. It's all in the video.
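As a rough illustration of "redirect all variants to just the one", here is a minimal Python sketch. It assumes the site picked https://www.example.com as its single canonical variant; that choice, the constant names, and the `redirect_target` helper are made up for this example and do not come from the video. It ignores ports, and it only computes where a 301 should point; the redirect itself would normally live in the web server config.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumed preference for this sketch; a site could equally pick the bare domain.
CANONICAL_SCHEME = "https"
CANONICAL_HOST = "www.example.com"
KNOWN_HOSTS = {"example.com", "www.example.com"}


def redirect_target(url):
    """Return the canonical URL to 301-redirect to, or None if `url` is
    already canonical or belongs to a host this site does not serve."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host not in KNOWN_HOSTS:
        return None
    if parts.scheme == CANONICAL_SCHEME and host == CANONICAL_HOST:
        return None
    # Keep path, query and fragment; swap only the scheme and host.
    return urlunsplit(
        (CANONICAL_SCHEME, CANONICAL_HOST, parts.path, parts.query, parts.fragment)
    )
```

With this, the three non-canonical variants all map to the https://www.example.com form, and the canonical one is left alone.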


But why?


So we have two websites on the same server (same IP, using SNI): siteA has SSL, siteB doesn't. Google will attempt to crawl https://siteB/ (which will return siteA's content! In a browser you would get a warning about the certificate being wrong). Since there are no links anywhere that point to https://siteB, why does Google crawl it? This is technically a configuration / Apache error, but Google should not be crawling https websites when their domain does not match the certificate.


Hi return0

This means that at some point Googlebot discovered https://siteB. It could simply have been a misconfigured CMS, or a bad sitemap, or an errant link one of your visitors shared on a forum, or something a previous owner of the domain did, or anything really. You may think there are no links to the site, and that may be true right now, but it's about something that Googlebot found in the past.

The correct fix is, as you say, to make sure the server doesn't respond to invalid certificate+site combinations.
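To make "invalid certificate+site combination" concrete, here is a small Python sketch of the kind of name check a browser does against the names in a certificate. The `cert_covers_host` helper and the `sitea.example` / `siteb.example` hostnames are hypothetical stand-ins for siteA and siteB. It is a simplification (no IP addresses, no internationalized names), not what any particular crawler or browser actually runs.

```python
def cert_covers_host(cert_names, host):
    """True if `host` matches one of the DNS names in a certificate.

    A leading "*." wildcard matches exactly one left-most label; this
    sketch ignores IP addresses and internationalized names.
    """
    host = host.lower().rstrip(".")
    for name in cert_names:
        name = name.lower().rstrip(".")
        if name == host:
            return True
        if name.startswith("*."):
            suffix = name[1:]  # e.g. ".example.com"
            prefix = host[: -len(suffix)]
            # The wildcard must stand in for exactly one non-empty label.
            if host.endswith(suffix) and prefix and "." not in prefix:
                return True
    return False
```

A server that only holds a certificate for siteA would fail this check for siteB, which is the case where it arguably should not answer over HTTPS at all.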


Are you in a position to request that these two videos be removed from the privileged access section of the GV Library? They're definitely the top two videos I wanted to watch from there, and it seems like better security practices are a win for everybody.

  * http://www.gv.com/lib/security-for-startups
  * http://www.gv.com/lib/advanced-security-for-startups


The scary thing is that I started writing an article about this on the 28th, all due to a comment I was sent by an advertising agency stating that "HTTPS is only really for checkout pages". I nicely pointed out why they were wrong.

It was mentioned somewhere that if Google started giving priority to full HTTPS sites, there would be a mass scramble to convert web sites to support HTTPS. Isn't this what we want?

You can warn people all you like, but until it starts to hurt them, they won't listen.


Google can easily help spread HTTPS by offering:

  * free SSL certs.
  * a free/low-cost HTTPS proxy / caching proxy for small websites.


especially for mobile app websites.


It looks like some people are getting carried away with the SEO implications of this.

We're not implying that your site will get ranked higher than other sites if you have HTTPS. What we're saying is that if your site has both HTTP and HTTPS versions of the same content, Google will now return an HTTPS link. The biggest implication is that if you support HTTPS, most of your traffic will now be using HTTPS rather than HTTP.


Am I the only one who finds it a little funny that the link in this post is to http and not https ? :)

https://blog.httpwatch.com/2014/07/07/google-has-given-https...


I just went through this with some of my sites. Some of my URLs were showing up in searches with HTTP URLs, while others were showing up with HTTPS URLs. Digging deeper also showed both HTTP and HTTPS being indexed.

This isn't an HTTPS preference on Google's part. It's (effectively) a mis-configuration on the web site operator's part.

HttpWatch (and many other folks with web sites) should either be using 301 redirects or, in many cases where that's not compatible with other things being done, using the canonical link element to indicate that the https content is just a copy of the http content.

So: Try setting your ‘canonical’ line properly in your web pages. Then this won’t happen.

Right now HttpWatch's http pages report their http as canonical. Their https pages report https as canonical. They effectively are listing their site twice with Google (and probably splitting/hurting SEO, but that's another story.)
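If you want to check your own pages for this, a minimal Python sketch using only the standard library could look like the following. `CanonicalFinder`, `find_canonical` and `same_canonical` are names invented for this example. It expects you to have already fetched the HTML of the http and https copies yourself, and it only looks at the first `<link rel="canonical">` it finds.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> in a page."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attr = dict(attrs)
        # rel can hold several space-separated values.
        rel = (attr.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.canonical = attr.get("href")


def find_canonical(html):
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


def same_canonical(http_html, https_html):
    """True when both copies of a page name the same canonical URL."""
    a = find_canonical(http_html)
    b = find_canonical(https_html)
    return a is not None and a == b
```

If `same_canonical` comes back False for the two copies of a page, you are in the "listed twice" situation described above.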

https://support.google.com/webmasters/answer/139066?hl=en

http://en.wikipedia.org/wiki/Canonical_link_element

http://googlewebmastercentral.blogspot.com/2013/04/5-common-...

http://www.mattcutts.com/blog/canonical-link-tag/


Is google really favoring https sites? This could have unintended consequences like SEO guys suddenly demanding a public IP for their customer sites (many of which have no need for encryption I imagine) and causing a heavier IPv4 drain than expected. IPv6 just isn't here yet. Must be nice to be an SSL seller right now.

I find that if there is a https site, google will just send me there, which is nice, but if they're changing their search algorithm for https that kinda sucks. I want my search terms to match up with the best content, not used as a reward system for implementing SSL everywhere.

The article is pretty short on facts. I don't think it's favoring anything. It just uses SSL if you have it. On the downside, I have noticed that connecting to sites has been slower than usual lately. The SSL handshake is still slow. Whatever happened to making all of this run faster?

Anyone else enjoy the irony of HN linking to the plain-text version of the httpwatch site?


I don't think they're changing the search algorithm, just linking to https rather than http if it's available for a particular site.

Regarding speeding up HTTPS, I run HTTPS on several sites and there isn't an issue with speed on either end.

This is probably what you are looking for though:

http://en.wikipedia.org/wiki/SPDY

http://www.chromium.org/spdy/spdy-whitepaper

http://www.chromium.org/spdy

For Apache: https://code.google.com/p/mod-spdy/


Why would they need a dedicated ip? Wouldn't they just need SNI?


WinXP marketshare is still around 25%. And the default browser on XP doesn't support SNI.


I'm guessing the default browser is IE7 (maybe even IE6?) for Windows XP.

The stat that matters is which browser they are using: IE7 is at less than 1%, and possibly near zero depending on your target market.

Edit: I see, it's a Windows XP issue rather than an IE issue.

http://serverfault.com/questions/389806/redirect-to-ssl-only...


That doesn't matter at all. XP doesn't support SNI, therefore every application which uses XP's SSPI libraries doesn't either. So IE6-8 and Safari on XP all don't support SNI.

Chrome on XP does support SNI but that is because they don't use XP's SSPI library for SSL connections (they use Mozilla's library NSS).


IE on XP does not support SNI, regardless of version (6, 7 or 8).


On a mixture of around 100 consumer-facing UK sites (think small local businesses) that I have access to, Windows XP made up 3% of sessions in the last month; that's out of over 100k sessions according to Analytics.


DuckDuckGo does something similar. They use the rulesets provided by HTTPS-Everywhere. Not sure how Google is doing it. Maybe if a site redirects to https, and also has HSTS configured, it would be pretty safe to return HTTPS links instead of HTTP ones as search results.
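Purely as a sketch of that guess (not a description of what Google or anyone else actually does), the "redirects to https and has HSTS" test could be written like this in Python. `parse_hsts` and `safe_to_link_https` are hypothetical names, and the parser only handles the `max-age` and `includeSubDomains` directives of the `Strict-Transport-Security` header.

```python
def parse_hsts(header_value):
    """Parse a Strict-Transport-Security header value into a dict.

    Returns None if no valid max-age directive is present.
    """
    max_age = None
    include_subdomains = False
    for directive in header_value.split(";"):
        name, _, value = directive.strip().partition("=")
        name = name.strip().lower()
        value = value.strip().strip('"')
        if name == "max-age" and value.isascii() and value.isdigit():
            max_age = int(value)
        elif name == "includesubdomains":
            include_subdomains = True
    if max_age is None:
        return None
    return {"max_age": max_age, "include_subdomains": include_subdomains}


def safe_to_link_https(redirects_to_https, hsts_header):
    """Heuristic from the comment above: the site redirects to HTTPS and
    sends an HSTS policy with a non-zero lifetime."""
    policy = parse_hsts(hsts_header) if hsts_header else None
    return bool(redirects_to_https and policy and policy["max_age"] > 0)
```

Note that `max-age=0` is how a site switches HSTS off, so the sketch treats it the same as having no policy.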


Yesterday I noticed that DDG was no longer taking me to the https version of youtube and I'm fairly sure it did before. It looks like Google search results don't take you to https youtube either.


Not sure if it's related, but there's something weird going on with YouTube only supporting RC4. https://productforums.google.com/forum/#!topic/youtube/hf7SD...


Is this mirrored anywhere? I'm having a hard time connecting to this site



Thanks!


Sorry for the poor response of the blog. We still had a connection monitoring tool running that got swamped by the Hacker News link.

The link should work fine now.


Quick question. Why don't you redirect to https for the blog?


We originally set up HTTPS on the WordPress blog just to demonstrate how it could be done without mixed content warnings. However, we never set up an SSL CDN, so all the content is served directly for HTTPS but uses a CDN for some content over HTTP.

Looks like this will have to change now that all traffic from Google searches is going to come in over HTTPS.


On a separate note: the www.ycombinator.com certificate is issued to cloudfront...


That's a pretty common thing that happens with CDNs. https://npr.org/ has a certificate for Akamai, for example.


I think this gives a poor image to the brand. There is a "Custom SSL Options for Amazon CloudFront" [1]. (I am now renewing my own domain certificates!)

[1] http://aws.amazon.com/cloudfront/custom-ssl-domains/


Can you go into why that is? I've never fully understood it myself.


The CDN nodes don't have a separate IP for http only and for http+https, so if you try https, you're hitting a service that wasn't prepared for that. The same thing happens if you virtual host lots of http sites on a single IP with one https site: everything is fine as long as nobody tries to do https, but if they do try, they get the cert for the one site that is doing https.
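A toy Python model of that fallback, in case it helps: the server looks up a certificate by the hostname the client asked for and, if it has nothing for that name (or the client sent no SNI at all, as with IE on XP), it hands back whatever its default is. `select_certificate` and the hostnames are invented for this sketch, and real servers differ in how they pick the default.

```python
def select_certificate(sni_name, certs_by_host, default_cert):
    """Pick the certificate a shared-IP server would present.

    `sni_name` is the hostname from the TLS ClientHello, or None when the
    client sent no SNI.
    """
    if sni_name is not None:
        cert = certs_by_host.get(sni_name.lower())
        if cert is not None:
            return cert
    # No match: fall back to the one certificate the server does have.
    return default_cert
```

So a request for a plain-http neighbour on the same IP ends up with the certificate of the one site that does https, and the browser shows a name mismatch warning.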


The CDN's servers provide the encryption, so it would make sense that the certificate is in their name. You can't do the encryption on the origin server, because the CDN needs access to the data to be able to cache it.


I understand that point, I know you can't use the certificate on another domain by design. I'm just curious why you wouldn't issue a certificate to your CDN signed with your domain as well. It's something that bothers me about npr.org especially as it creates problems with their API for member stations wishing to go fully SSL.


But it completely breaks the certificate meaning. Imagine the bad guy giving you a fake certificate called: cdn.badguy.com and explaining that because the CDN does the encryption you can trust this domain...


[deleted]


I don't think so. For example, look at this article: https://timnash.co.uk/building-cdn-ssl-cloudfront-certificat... they upload their own certificate to the CDN.


Ironically the link to this article is over http not https


Should I buy the SSL from CloudFlare? Are there any other places that let you put SSL on your website?



