- "...make SSL the underlying transport protocol, for better security and compatibility with existing network infrastructure." Won't this break many caching models?
- "...provides an advanced feature, server-initiated streams. Server-initiated streams can be used to deliver content to the client without the client needing to ask for it." Nice, push without comet-style hacks.
EDIT: more items that stand out
- "SPDY implements request priorities: the client can request as many items as it wants from the server, and assign a priority to each request"
- "Clients are assumed to support Accept-Encoding: gzip. Clients that do not specify any body encodings receive gzip-encoded data from the server."
- "The 'host' header is ignored. The host:port portion of the HTTP URL is the definitive host." I guess this supplants the need for SNI support. Edit: No it doesn't since SPDY sits above SSL. The host isn't known until the secure link has already been established.
It's just a community forum, but things like ACTA and the erosion of privacy that our government is gradually carrying out make me seriously question whether there are really any downsides to encrypting everything.
I'd originally made it available over SSL to help people access it from work without the proxies blocking it (that works very well, by the way), but now everything will be served over SSL.
As part of this I'm doing a review of the styles, images, javascript, etc to ensure that I perform as few requests as possible due to the lack of caching.
The software I'm using hasn't really been built with request counts in mind. The view seems to have been "it's a one-time hit and then it's cached", whereas in the SSL world you're doing well if you get a cached-once-per-session model.
It also complicates things on the back end, when you think about software load balancing and pass-through requests.
All good stuff though, I get to learn about how to make this stuff work and scale at the same time as providing a great feature (privacy and security) to my users. Traditional models are worth breaking to gain these benefits.
Given that IPv6 is a good thing(1), any technology change that brings forward the timeline for IPv4 address exhaustion is also a good thing.
It is human nature that people don't do things that sound like hard work until they have to, and the continuing use of IPv4 with NAT and other hacks falls firmly into that camp.
(1) Every device becomes addressable again - I remember when it was normal to assume that devices could be reached directly, and would be firewalled if required. That led to a much greater number of people running services from their machines. From the perspective of a startup, the idea that a client can run data services without some horrible NAT-related hack is really interesting.
These all sound like a step in the wrong direction to me:
The reason HTTP has been so successful is its simplicity. HTTP, with its human-readable format, is not the most efficient protocol, but that is what keeps it simple. Things like header compression seem to me to miss the point.
SSL encryption will break all caching, and add latency in the form of extra round trips.
gzip encoding has minimal value on media data types (images, movies) that represent the majority of the data.
This just sounds like a whole load of complexity for a relatively small one-off hit.
> SSL encryption will break all caching, and add latency in the form of extra round trips.
It won't break all caching, just intermediate caches. True this does raise bandwidth usage and initial page load latency (assuming static content is not already cached on the client), but to say it breaks all caching is off.
That said, static image serving is not going to benefit from this new protocol all that much. It seems, like much of Google's other work in this area, to be focused on improving the performance of web-based applications.
> It is designed specifically for minimizing latency through features such as multiplexed streams, request prioritization and HTTP header compression.
They specifically say their goal is to minimize latency to make web-based applications more responsive. They aren't interested in making you able to download movies faster; what they want to do is make the web more responsive so that they can develop more apps for the web that would traditionally have required a desktop client.
> It won't break all caching, just intermediate caches.
As far as I know, Firefox (and maybe other browsers) does not store content delivered via https on disk by default, which means the browser's own cache is affected too. Sure, the default can be changed, but it's an extra security risk - if it weren't, the default wouldn't be set that way.
Impressive, how Google are taking on challenges that pretty much no other company could possibly address... replacing email... enhancing HTTP, etc...
They're the only company with the means to take on such massive, unprofitable projects, and at the same time enough street cred that everyone won't immediately assume they're trying to make the web proprietary. The fact that they tend to open up these things from the get go helps, too.
They're the only company with the means to take on such massive, unprofitable projects, and at the same time enough street cred that everyone won't immediately assume they're trying to make the web proprietary.
They are hardly the only company doing this. Microsoft Research do a ton of pretty visionary / far-reaching work, much of which has nothing to do with MS-proprietary platforms. For example, the VL2 architecture for data center networks: http://sns.cs.princeton.edu/2009/10/new-datacenter-networks/
Another example: Simon Peyton-Jones and Simon Marlow, who make up a huge chunk of the power behind Haskell and specifically GHC, are Microsoft researchers. (Although Haskell and GHC are not Microsoft projects, they get paid by Microsoft to work on them.)
The Simons publish their work freely, and go to a lot of effort writing their papers well so that their work spreads beyond Microsoft. And as far as I know, their code is all open source.
Any idea if there are patents pending on any of this IP? My understanding is that if the patent was applied for before the paper was published, the paper would not be considered prior art and would not invalidate the claims.
As for licensing, Microsoft owns said IP and can re-license it on a whim.
Not all of Microsoft is evil, but they, as a company, cannot be trusted to behave in a civilized manner.
I don't think there's really a big ethical difference between MS, and other big companies like IBM and HP. Each company does a lot of research that benefits the community at large; they also need to recoup that investment via patents, technology transfer and the like. Trusting IBM, for instance, any more or any less than Microsoft is a little naive.
I must disagree. Even if IBM has a chequered past, Microsoft has a very nasty recent history of anti-competitive and plain unethical behaviour under the current administration. I agree IBM cannot be trusted, but claiming someone should trust Microsoft because IBM is no better (a disputable allegation in itself) is a logical failure.
I am not a patent lawyer, but is there any evidence that Microsoft is even trying to protect this particular IP?
AFAIK, the Simons publish research papers freely, and GHC is licensed under what amounts to a BSD3 license (by the University of Glasgow... not Microsoft).
Microsoft are long past the days when they were evil. It's just that, unfairly or not, if they tried to do this people would go through the license with a fine-tooth comb in the certain expectation of traps.
I wonder if Microsoft will one day redeem themselves to the public like IBM did?
I don't regard the shameful approval of OOXML as an ISO standard, and all the questionable things Microsoft did to accomplish it, as a particularly nice thing to have done. A lot of national bodies were formed specifically for that purpose and were never heard from again. As it stands, the damage to ISO seems irreversible.
Also, there are the empty "Linux infringes 4 bazillion of our patents" threats (empty, because if they weren't they would have acted on them) and the failed attempt to sell some of those patents to trolls who would go after Linux users. The fact that it failed (because an uninvited group won the auction and donated the patents) makes it no less evil.
No. Being evil is in their soul.
They may be less evil now - I can agree on that. But that has far more to do with them being far less relevant now than they were in the 90s than with any particular change of heart.
From my perspective, they are developing their web browser at the absolute minimum rate necessary to avoid a precipitous decline in its market share - not fast enough to avoid a decline, but not so slow that it would vanish quickly, and certainly not fast enough to add value to internet standards or web application performance. I'll only come around to your view when they stop dragging their feet in hopes that Silverlight hits an inflection point.
I think the important thing to note is that Google is the only large company whose business model depends almost completely on the success of the Internet.
This is why we see them at bat for Net Neutrality as well as efficiencies within communication protocols.
Microsoft, Apple, Oracle, HP, IBM . . . those companies make their livings in other ways.
The only company comparable to Google in this way would be Yahoo. Their inability to keep engineering talent means their contribution to The-Internet-As-Platform is not as significant.
While I agree with you generally, MS, Apple, Oracle, HP, IBM, or any other personal-computer or server-related business would either not be as successful, or their products not as widely used, if there were no internet.
I.e., imagine a personal computer market with no way for computers to communicate with each other: would users still buy computers as much as they do now, or would it be just a small niche market?
Even though there are several big players whose business models depend quite a bit on the internet, no other tech company does as much to improve the internet as Google does. The icing on the cake is that Google opens up and shares most of these projects with everyone for free.
These other companies get more value from the internet existing and being useful. They don't get much value when people use the internet more. Google does. In a sense, Google competes with other ways people spend their time, like television.
I love that Google is focusing on how to speed up the web, but they are missing the most obvious thing they could do that would have a dramatic impact on the speed of the web:
Update their PageRank algorithm to take into account website speed, and then publish this as a fact.
The idea would be that they would have GoogleBot measure the latency, and overall page load time and rank sites higher that respond faster, and slow sites lower.
One of the problems with PageRank is that most of how they measure site quality is a mystery, but making it plain that speed is used in ranking would bring website performance to the front of everyone's minds. They could use similar metrics as YSlow and PageSpeed use, and tell everyone to optimize their sites using those tools as a starting point.
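To make that concrete, here's a rough sketch in Python of the kind of measurement and adjustment I mean. The thresholds and the 30% penalty are numbers I made up purely for illustration; obviously this has nothing to do with how PageRank is actually computed.

```python
import time
import urllib.request

def measure_load_time(url, timeout=10):
    """Fetch a URL and return the total time to download the body, in seconds."""
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        resp.read()  # crude: downloading the full body stands in for "page load time"
    return time.monotonic() - start

def speed_adjusted_rank(base_rank, load_time, fast=0.5, slow=5.0):
    """Scale a (hypothetical) base rank by a speed factor.

    Pages at or under `fast` seconds keep their full rank; pages at or over
    `slow` seconds lose 30%; everything in between is interpolated linearly.
    All constants here are invented for illustration.
    """
    if load_time <= fast:
        factor = 1.0
    elif load_time >= slow:
        factor = 0.7
    else:
        factor = 1.0 - 0.3 * (load_time - fast) / (slow - fast)
    return base_rank * factor

# Example: a page that takes 2 seconds to load keeps only part of its base rank.
print(speed_adjusted_rank(base_rank=1.0, load_time=2.0))
```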
This is a horrible idea. Yes, it would result in increased focus by some people on speed, but probably the same people who focus on "SEO". So now content relevancy has yet another adversary in the battle for SERPs. There's a LOT of really incredible, original content out there written by amazing people who do it in their spare time and don't give two shits about their SEO or the speed of their server, and now that content is going to slip further down the rankings because it takes 2 seconds to load instead of 100 ms? No thanks.
You are correct that SEO people would latch onto the idea and start producing faster sites. But so will many legitimate sites that want to provide a good user experience. Is that really so bad?
I think that site speed does correlate with a better user experience. A site that loads quickly shows that the person who made it cares about my experience more than a bloated site that takes forever to load.
I will admit that it's certainly not as strong an indicator as the content, inbound links and other factors that are rumored to be part of PageRank, but speed does matter. All other things being equal, I would much rather visit a site that loads in 2 seconds than one that loads in 20 seconds.
You had better start using it, then. Be it an official part of PageRank or some emergent property of their crawlers, I've seen very good evidence that page load speed has a strong influence on natural search rank.
Regardless, Google is doing this to push at the technical limitations on apps like GMail or Google Reader, not Joe Shmo's burger shop.
That was my thinking too. With PageRank Google is trying to find a way to measure the perceived quality of the user experience for each page. Except there's no way to do that directly, so they look at indirect methods, like counting the number of inbound links. This is just another indirect measurement to add to (what I'm sure is) hundreds of other factors they use to rank a site.
IMHO a site that cares about the user experience would have been optimized to load quickly.
Heck, even if Google told people speed was important, and didn't incorporate any speed measurements into PageRank, I think it would have a positive effect.
They already do that, in a sense. Google's webmaster tools show you how quickly googlebot crawls your site; you can actually adjust it higher, but I'm sure their crawl-rate recommendation is based on how much the algorithm thinks your site can reasonably handle.
So, in theory, the faster your site responds, the more pages googlebot can crawl without crushing your servers, and the more people find your site when searching - giving more people the opportunity to visit your site because it's fast.
Unless they want to make the web faster for their properties, which SPDY would do. They can only optimize their own sites up to a point and then they need to make the whole web infrastructure more efficient to realize any more speed gains, across all clients.
Except your network connection to a site is going to be drastically different from Google's. Case in point: my datacenter peers directly with Comcast, so I get <7ms pings from home. Google gets around 35ms.
It would be like comparing the drive to a particular location using completely different starting points.
Plus, people would start doing all sorts of tricks to boost their ranking. Serving static, cached files for Google and giving everyone else a slow, dynamic site. It's way too easy to abuse.
Google just needs to figure out a reasonable baseline latency to subtract - for that, there's ping, pinging the previous host on the route, latency to other hosts with nearby IPs, GeoIP plus a database of baseline latencies to sites in the same area, etc.
And the tricks will be dealt with the same way as now: if you're caught doing them, you're blacklisted. But this will be less of an issue in this case: if you've taken the trouble to set up caches that you can direct Google to - why not direct your actual users to them too?
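A rough sketch of that subtraction, using the time to complete a TCP handshake as a crude stand-in for ping (a real implementation would use actual pings or a latency database, as above):

```python
import socket
import time
import urllib.request
from urllib.parse import urlparse

def tcp_rtt(host, port=80, timeout=5):
    """Rough network baseline: time to complete a TCP handshake (a stand-in for ping)."""
    start = time.monotonic()
    with socket.create_connection((host, port), timeout=timeout):
        pass
    return time.monotonic() - start

def server_attributable_time(url):
    """Total fetch time minus the network baseline, i.e. roughly 'how slow is the server'."""
    host = urlparse(url).hostname
    baseline = tcp_rtt(host)
    start = time.monotonic()
    with urllib.request.urlopen(url, timeout=10) as resp:
        resp.read()
    total = time.monotonic() - start
    return max(total - baseline, 0.0)

# A crawler in a well-peered datacenter and one far away should now get much
# closer numbers for the same site than raw load times would give them.
print(server_attributable_time("http://example.com/"))
```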
But you're assuming location is all that matters. Certain network providers, no matter how close they are to you, just have shitty, overloaded networks.
The cache trick is that Google doesn't need to see dynamic content as much as the user does. Example: Any site that's based on user selection to determine customization. I might need to see recommended content that's based on a very complex algorithm that takes a while to generate the page. Whereas Google just needs to see a page. Cache for Google, generate from scratch for everyone else.
Google's crawlers are located all over the place, and especially for a latency score they should hit the site from a few different locations, as well as at different times of day.
The cache trick is the same for not-logged-in users as it is for Google. If you're showing the same page to more than one user (and you should be, for your not-logged-in users), you should cache it and show everyone that. If you show Google a version that's a few hours old (one you wouldn't want users to see), you'll be punished under the current rules.
If you're only showing content to logged-in users, Google can't see anything, anyway.
Regarding the use of SSL by default, they had better require the SSL mode that sends the hostname before the crypto negotiation (SNI). Otherwise that's a lot of IP addresses required for virtual hosting!
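For reference, SNI is exactly that: the client puts the hostname in the (cleartext) ClientHello so the server can pick the right certificate before encryption starts. On the client side it's a single parameter; example.com here is just a placeholder:

```python
import socket
import ssl

hostname = "example.com"  # placeholder host
context = ssl.create_default_context()

with socket.create_connection((hostname, 443)) as sock:
    # server_hostname is what populates the SNI extension in the ClientHello,
    # letting one IP address serve certificates for many virtual hosts.
    with context.wrap_socket(sock, server_hostname=hostname) as tls:
        print("negotiated with certificate for:", tls.getpeercert()["subject"])
```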
Even so, the current certificate issuing process is lengthy and expensive. Requiring SSL would be a major hit to the mom-and-pop type websites that just want a simple webpage on their own domain for their business.
They'd now have to pony up for an SSL cert, which at least doubles the website ownership cost, as well as potentially go through a very complex process to get the certificate issued.
It also completely eliminates the possibility of them having a free website on their own domain, unless an easy and free method to obtain an SSL cert becomes available.
The gain to them in these circumstances is negligible, if they aren't doing any e-commerce.
One imagines djb envisioning a single roundtrip for encrypted HTTP, built on top of DNSCurve.
You have the public key via DNS, you send an encrypted request and receive a verifiable encrypted response that includes content and begins the multiplexed channel.
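Something shaped roughly like this, perhaps - using PyNaCl purely as a stand-in for the crypto, with both keypairs generated locally for the demo (in reality the server's public key would come from DNS and the client's would ride along with the first packet). A sketch of the idea, not the DNSCurve wire format:

```python
# pip install pynacl -- used here only as a stand-in for the crypto primitives
from nacl.public import PrivateKey, Box

# Pretend these arrived out of band: the server's public key via a DNS record,
# and an ephemeral client keypair generated for this connection.
server_key = PrivateKey.generate()          # (server side, generated here for the demo)
client_key = PrivateKey.generate()

# Client encrypts the HTTP request directly to the server's public key.
client_box = Box(client_key, server_key.public_key)
request = client_box.encrypt(b"GET / HTTP/1.1\r\nHost: example.com\r\n\r\n")

# Server decrypts, handles the request, and answers over the same authenticated channel.
server_box = Box(server_key, client_key.public_key)
print(server_box.decrypt(request))          # the plaintext request
reply = server_box.encrypt(b"HTTP/1.1 200 OK\r\n\r\nhello")

# Client verifies and decrypts the reply: one round trip, encrypted both ways.
print(client_box.decrypt(reply))
```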
I'm impressed that they're able to get such good performance (and latency) while still running the traffic over SSL. For example, average page load time over SSL is 1899 ms ("SPDY basic single-domain / SSL"), compared to 3112 ms for plain HTTP -- they don't show HTTPS, but presumably that would be even slower.
Am I missing their explanation of how a SPDY-capable client and server discover and make use of SPDY instead of HTTP?
For this to take hold in the wild, it has to be possible for me to run a combined HTTP+SPDY server, and for a client connecting to it to automatically make use of SPDY if it is capable. This must happen without user intervention (i.e. we don't expect people to type spdy:// instead of http://).
It seems like this should be possible. Perhaps there could be a special server response that instantly "upgrades" a TCP session from HTTP into SPDY. But I'm not seeing anything about that in the docs; is this part of the SPDY plan (yet)?
You could browse to the page the regular way, then load a Javascript or Flash library which would use SPDY for all further communications. In your typical web app the user experience could be visibly improved.
You can already do this with Flash today, and it would be much easier for Google to push it into the DOM than to touch the transport protocol.
It doesn't - but one could imagine adding support for raw sockets in the DOM. It's easier to change things at the document level rather than at the discovery and transport level.
You could easily just have the server send an additional HTTP header in all its responses. Old clients would ignore it, and new ones could detect it and start a SPDY session.
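Something like this, say - the header name is made up for illustration, it's not anything from the SPDY docs:

```python
import http.client

# Hypothetical negotiation: the server advertises SPDY support via an extra
# response header on ordinary HTTP responses. Old clients ignore it; new
# clients remember it and switch protocols on the next connection.
ADVERTISEMENT_HEADER = "X-Alternate-Protocol"   # invented name for this sketch

def check_for_spdy(host):
    conn = http.client.HTTPConnection(host, 80, timeout=10)
    conn.request("GET", "/")
    resp = conn.getresponse()
    resp.read()
    advertised = resp.getheader(ADVERTISEMENT_HEADER)
    conn.close()
    return advertised is not None

if check_for_spdy("example.com"):
    # A SPDY-capable client would reconnect here and speak SPDY instead of HTTP.
    print("server advertises SPDY; upgrade on next connection")
else:
    print("plain HTTP only")
```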
I've a bad feeling that this could make the web layer more complex. Also, the encryption bit may not be a good idea if you are not Google...
In general I'm happier with web standards. Google doing research on this is great, but it's not good if they unilaterally push solutions that will hit the mass market anyway simply because they are Google.
The whole web, the idea of HTTP APIs and so on, is built on the fact that HTTP may not be perfect but is trivial to use, implement, and so forth. Please don't break that.
I wouldn't be so afraid of Google being able to unilaterally effect change without consensus. Even getting adoption for their Chrome browser seems to be an extreme uphill battle.
A: No. SPDY replaces some parts of HTTP, but mostly augments it. At the highest level of the application layer, the request-response protocol remains the same. SPDY still uses HTTP methods, headers, and other semantics. But SPDY overrides other parts of the protocol, such as connection management and data transfer formats.
Interesting parts:
- "...make SSL the underlying transport protocol, for better security and compatibility with existing network infrastructure." Won't this break many caching models?
- "...provides an advanced feature, server-initiated streams. Server-initiated streams can be used to deliver content to the client without the client needing to ask for it." Nice, push without comet-style hacks.
EDIT: more items that stand out
- "SPDY implements request priorities: the client can request as many items as it wants from the server, and assign a priority to each request"
From the protocol document: http://dev.chromium.org/spdy/spdy-protocol
- "Content-length is not a valid header"
- "Clients are assumed to support Accept-Encoding: gzip. Clients that do not specify any body encodings receive gzip-encoded data from the server."
- "The 'host' header is ignored. The host:port portion of the HTTP URL is the definitive host." I guess this supplants the need for SNI support. Edit: No it doesn't since SPDY sits above SSL. The host isn't known until the secure link has already been established.