I would not be surprised if Google still has the data. I'm not sure how Google handles things internally, but it needs to pull up results fast. They might have 4 billion pages with the word "water" in them, yet only make a tiny portion of that available. If I type "Hot water", Google looks at the subset of pages containing "Hot" and the subset containing "Water", and it must quickly pull the pages that have both words. So the number of pages in the "Water" and "Hot" subsets must be small enough to be merged/intersected quickly. There are other things that could be done to speed it up, but I think you get the main idea.
However, what I am getting at with that simple example is that for searches to be quick, Google keeps these lists small. So there is limited space due to time constraints, and Google must decide what is relevant enough for the available portion of their index.
However, that does not explain why other search engines don't have trouble with older sites/links. I suspect it's more of a business decision than a technical one.
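To make the "Hot water" example concrete, here is a minimal sketch of a toy inverted index with a merge-style intersection. This is only an illustration of the general idea, not a claim about how Google does it; the `pages` data and the names `build_index`, `intersect`, and `search` are made up for the example.

```python
from collections import defaultdict


def build_index(pages):
    """Map each lowercased word to a sorted list of ids of pages containing it."""
    index = defaultdict(set)
    for page_id, text in pages.items():
        for word in text.lower().split():
            index[word].add(page_id)
    return {word: sorted(ids) for word, ids in index.items()}


def intersect(a, b):
    """Intersect two sorted id lists in O(len(a) + len(b)) steps."""
    i = j = 0
    out = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            out.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return out


def search(index, query):
    """Strict AND search: a page must contain every query word."""
    lists = [index.get(word, []) for word in query.lower().split()]
    if not lists:
        return []
    lists.sort(key=len)  # start from the shortest list to keep the work small
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result


# Hypothetical toy data, purely for illustration.
pages = {
    1: "hot water bottle",
    2: "cold water",
    3: "hot sauce",
    4: "water is hot today",
}
index = build_index(pages)
print(search(index, "Hot water"))
```

The cost of `intersect` grows with the combined length of the two lists, which is the point of the comment above: the longer the per-word lists, the slower the merge.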
Intersections are another thing that Google search doesn't do properly anymore. If I search for something like lkasdfjer samsung galaxy s8 it just gives me matches for samsung galaxy s8 and ignores the first word. When I do searches like this, I do it for a reason and don't want matches that lack some of the search terms.
Not even this is sufficient any more. They now have a "verbatim" search, but I think even then some terms can be ignored -- terms which are not conventional "stopwords" like the.
I don't see that on the LHS. It would be nice if there's a link to it, something like https://www.google.com.au/search?verbatim=true that I could bookmark. Edit: or somehow set as the browser's default search engine.
Maybe "verbatim" is the same as putting quotes around every word. The verbatim search seems a bit tedious to activate: search for something, press "Tools", then "All results" and pick "Verbatim" in the drop-down. Although once activated, it stays activated for subsequent searches.
That helps. Even if you go to the advanced search, https://www.google.com/advanced_search, and use the box "all these words", you need to quote the words for it to take you seriously. I didn't give a good example, since there are no matches for that search (except reflections back to this discussion), but in other cases there are legitimate results that only appear after pages of invalid results.
I think that's a good result, yea? If so, I'm glad it helped! Typically I use the double quotes to search for specific code/error strings and then use non-quoted words in the query to help filter the results to specific context, like the app name or topic. Others suggested the "Verbatim" option.
It's probably architected to err on the side of giving you something over giving you nothing, because the common use-case for <not-in-index>+hit+hit+hit is "not-in-index term is a typo," not "not-in-index term is an intentionally-crafted attempt to zero the results."
Seems reasonable when you put it that way, but as you said, you are unsure how Google handles their indexes. I doubt we ever will know unless your signature is on an NDA.
Google does not need to pull up ALL results fast. It only needs to return 10 results quickly.
That's not relevant to the article, which says that the results are not available AT ALL. (Although as of my posting the two articles seem to be available again.)
True, but Google is not searching their entire index for those. A simple linear search takes O(N) time, so for a word that occurs billions of times, Google is not going to go through that entire list. They might use some clever hashing and sorting to jump around. However, when trying to intersect two keywords, they either have to pre-generate the intersection or make the data sets they are intersecting small enough to get those 10 results quickly.
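For what it's worth, the "jump around" idea can be sketched like this: walk the shorter list, binary-search into the longer one, and stop as soon as 10 matches are found. This is just a sketch under the assumption that both lists are sorted by the same id and that the first matches are the ones you want; `first_k_matches` is a made-up name, and nothing here is known to reflect Google's actual implementation.

```python
from bisect import bisect_left


def first_k_matches(short, long, k=10):
    """Return up to k ids present in both sorted lists, stopping early."""
    out = []
    lo = 0  # everything before this position in `long` is already ruled out
    for doc_id in short:
        # Jump ahead in the long list instead of scanning it linearly.
        lo = bisect_left(long, doc_id, lo)
        if lo == len(long):
            break  # the long list is exhausted, no more matches possible
        if long[lo] == doc_id:
            out.append(doc_id)
            if len(out) == k:
                break  # enough results for the first page
    return out


# Hypothetical posting lists: "water" is much more common than "hot".
water = list(range(0, 1_000_000, 2))
hot = list(range(0, 1_000_000, 3))
print(first_k_matches(hot, water, k=10))
```

Even so, this only helps when matches show up early. If the two words rarely co-occur, you still end up walking most of the shorter list, which is where pre-generated intersections or smaller lists come in.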