One of the interesting take-aways from the story is that what your users "say" they want and what they actually do are two different things. I wonder if this further proves the diminishing value of old-fashioned surveys in the context of web usability. Beyond the very interesting eye-tracking tools, being able to run live A/B tests seems very effective -- especially if users never even know they are being tested.
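To illustrate the "never even know" part: a live A/B test just buckets each user deterministically on the server side and logs what they do. A minimal sketch, assuming a hypothetical experiment name and user-ID scheme:

    import hashlib

    def assign_variant(user_id: str, experiment: str,
                       variants=("control", "treatment")) -> str:
        """Hash user+experiment so each user always lands in the same bucket."""
        digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
        return variants[int(digest, 16) % len(variants)]

    # Same user, same experiment -> same variant on every request.
    print(assign_variant("user-42", "results-per-page"))

The user just sees a page; the "survey" is the click logs.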
Yeah, I think this approach makes a lot of sense (and it is, in a way, another instance of "more data beats better algorithms"). It's also similar to some of the themes Paul Buchheit talked about in his Startup School talk -- you want to listen to your users, but don't just blindly do what they ask for.
An interesting comment, particularly wrt the "natural language" search engines trying to out-algorithm Google:
"The learning curve on search is really fast," she said. "People go from 'Where can I get spaghetti and meatballs in Silicon Valley' to 'italian food san jose' really fast," she said.
Why is it hard to believe? You are searching billions of web pages. A typical index of millions of pages takes up gigabytes and fits in RAM on a commodity server. To run one search really fast over billions of pages, you need a distributed index roughly three orders of magnitude larger than what fits on a single machine.
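To make that concrete, here's a minimal scatter-gather sketch. The shard count and the fake per-shard RPC are illustrative assumptions, not Google's actual architecture:

    from concurrent.futures import ThreadPoolExecutor

    NUM_SHARDS = 1000  # ~three orders of magnitude beyond one machine's RAM

    def search_shard(shard_id: int, query: str) -> list[tuple[float, str]]:
        """Stand-in for an RPC to one index server holding 1/NUM_SHARDS of the corpus."""
        return [((hash((shard_id, query, i)) % 1000) / 1000.0, f"doc-{shard_id}-{i}")
                for i in range(3)]

    def search(query: str, k: int = 10) -> list[tuple[float, str]]:
        # Scatter the query to every shard in parallel, then gather and merge.
        with ThreadPoolExecutor(max_workers=64) as pool:
            partials = pool.map(lambda s: search_shard(s, query), range(NUM_SHARDS))
        merged = [hit for part in partials for hit in part]
        return sorted(merged, reverse=True)[:k]  # global top-k by score

    print(search("italian food san jose")[:3])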
Google found that when the results increased to 30 per page ... it took about twice as long to display the longer results list for the user.
I'm surprised that increasing from 10 to 30 results increased total latency by ~100%. I would have expected that other factors (one-time costs like session establishment) would dominate the overall latency, and the marginal cost of fetching 20 additional query results would be fairly small.
The clue is probably in the 700-1000 machine interaction mentioned in the story. If they have to return three times as many results (30 vs. 10), they're probably interacting with a lot more machines in their cloud.
Perhaps -- but even if that is true, interacting with more machines is presumably a trivially parallelizable operation, so it shouldn't double the overall response latency.
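To put a rough number on that intuition, here's a toy model where the frontend fans the query out and waits for every shard, so end-to-end latency is the max over shards rather than the sum. The exponential latency distribution and the parameters are made-up assumptions:

    import random

    def query_latency(fanout: int, mean_ms: float = 50.0) -> float:
        """With full parallelism, latency is gated by the slowest shard."""
        return max(random.expovariate(1 / mean_ms) for _ in range(fanout))

    # Tripling the fan-out adds well under 3x latency; the max over more
    # machines only creeps upward (the tail-latency effect).
    for fanout in (300, 900):
        avg = sum(query_latency(fanout) for _ in range(1000)) / 1000
        print(f"fanout={fanout}: ~{avg:.0f} ms")

So under these assumptions, querying more machines in parallel makes latency creep up with the slowest shard's tail, but it doesn't by itself double the response time.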