Google Memory Loss (tbray.org)
1377 points by AndrewDucker on Jan 15, 2018 | 539 comments



I've noticed this many times too, particularly recently, and I call it "Google Alzheimer's". What was once a very powerful search engine that could give you thousands of pages containing nothing but the exact words and phrase you searched for (yes, I've tried exhausting its result pages many times, and used to have much success finding the perfect site many dozens of pages deep in the results) has seemingly degraded into an approximation of a search engine that has knowledge of only very superficial information, will try to rewrite your queries and omit words (including the very word that makes all the difference --- I didn't put it in the search query for nothing!), and in general is becoming increasingly useless for finding the sort of detailed, specific information that search engines were once ideal for.

To add insult to injury, if you do try to make complex and slightly varying queries and exhaust its result pages in an effort to find something you know exists, very often it will decide you're a robot and present you with a CAPTCHA, or just ban you completely for a few hours (solving the CAPTCHA just gives you another, and no matter how many you solve it keeps refusing to search; but they probably benefit from all the AI help you just gave them, what bastards...).

Google had the biggest, most comprehensive index for many years, which is why it was my sole search engine. Now I'm often finding better results with Bing, DuckDuckGo, Yahoo, and even Yandex, but part of me is very worried that large and extremely valuable parts of the Web are, despite still being accessible, simply "falling off the radar".


> has seemingly degraded into an approximation of a search engine that has knowledge of only very superficial information, will try to rewrite your queries and omit words (including the very word that makes all the difference...)

I think the biggest irony is that the web allows for more adoption of long-tail movements than ever before, and Google has gotten significantly worse at turning these up. I assume this has something to do with the fact that information from the long tail is substantially less searched for than stuff within the normal bounds.

This is a nightmare if you have any hobbies that share a common phrase with a vastly more popular hobby, and is especially common when it comes to tech-related activities. I use Linux at home, and I program VBA at work. At home Linux is crossed out of most of the first few pages, and I just get a ton of results about Windows, and at work VBA is crossed off and I get results about VB6 and .NET.

Completely. Useless.

I can only imagine this has something to do with their increasing reliance on AI, and the fact that the AI is probably incentivized to give a correct response to as many people 'above the fold' as possible. If 95% of people are served by dropping the specifically-chosen search term, then the AI probably thinks it's doing a great job.

It seems like the web is being optimized for casual users, and using the internet is no longer a skill you can improve to create a path towards a more meaningful web experience.


This same AI effect can be seen in the Android keyboard, where _properly_ spelled words will be replaced after typing another word or two because it's been determined to be more likely what you want. It's infuriating.


It actually does this with such consistency that I think it's a very specific mistake. For example, I frequently swipe "See you soon." It always, always renders as "See you son", which I then have to manually retype. Sometimes twice. I don't have a son, and I'm not a blind old cowboy who jocularly refers to any random person as "son". I honestly just want to type "soon", for the love of... anyhow, this is an ongoing, totally inane battle of wills with my phone.

I think what's happening here is that there's a very impressive and sophisticated heuristic for predicting the probability of what you want to type by looking at the frequency of what you have typed in similar contexts. It uses its state-of-the-art AI to evaluate the context and build an array of candidate words, along with their respective probabilities. I suspect it is very accurate as it does this. Then it sorts the array by probability and pops the top element into the predictive text input.

Alas -- per my pet theory anyways -- it sorts like this:

  candidateWords.sort((a,b) => a.probability - b.probability);
Rather than:

  candidateWords.sort((a,b) => b.probability - a.probability);
...which is how a two-character diff can turn a brilliantly helpful AI into an ultimately annoying damnit-I-need-to-smash-something antagonist.
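
To make the pet theory concrete, here's a toy, runnable version (the candidate words and probabilities are made up; this is obviously not GBoard's actual code):

  // Hypothetical candidates for the swiped input "soon", with made-up probabilities.
  const candidateWords = [
    { word: "soon", probability: 0.72 },
    { word: "sown", probability: 0.21 },
    { word: "son",  probability: 0.07 },
  ];

  // Buggy sort: ascending by probability, so the *least* likely candidate lands on top.
  const buggy = [...candidateWords].sort((a, b) => a.probability - b.probability);
  console.log(buggy[0].word);    // "son"

  // Intended sort: descending by probability, so the most likely candidate is suggested.
  const intended = [...candidateWords].sort((a, b) => b.probability - a.probability);
  console.log(intended[0].word); // "soon"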


When a system tries to do something automagically and makes a mistake, it is very frustrating, especially because, to allow seamless large changes, hide competitive details, or make the UI more "streamlined", such systems rarely give users options to tune the results. A system that gives controls to the user and expects them to tweak their own experience is so much better in my opinion, except in the metrics of first-time usage (or first-time-since-major-change), when those controls look like information overload and make the system seem like something that must be learned before it can be used.

And yet, when the latter inevitably breaks on an edge case, users can try to fix it themselves. They don't hit a wall of frustrated "I can't do anything", they hit a challenge that they are empowered to try overcoming. They already know what they want and can set things that way, rather than trying vaguely to teach a system (machine learning, hardcoded heuristics engine, department of humans making seemingly unconnected changes to a GUI with each passing version and no obvious plan) to understand their desires.

I miss the days when users were seen as intelligent professionals who are willing to change settings, create and re-dock an assortment of toolbars to every edge of every screen/window to suit their daily tasks, read a manual (or at least search the integrated help entries) to overcome problems. Rather than "busy" phone users who just want to complete a task with minimal time spent learning and get back to posting on facebook or whatever, and who accept the automagical solution because adequate results instantly are somehow considered better than great results with some work.

Ugh, that whole block of text just kept growing; I had better leave and go ramble/rant at trees or clouds or something elsewhere.


There was a time when autocorrect on phones was moderately useful. That time has passed. One of the first things I do with a new phone now is turn off autocorrect completely. It doesn't bother me to manually correct my own mistakes, but it bugs me a lot to have to correct the mistakes of the freaking robot that's supposed to be saving me time.


> I frequently swipe "See you soon." It always, always renders as "See you son"

This works fine for me with GBoard. Are you drawing a little circle on the o to indicate you want the double letter?


Wow, I... didn't know that gesture was a thing. Thank you! It seems to help a bit! I'm now getting a 50/50 son/soon ratio. That said: when I manually type "soon" -- definitely with a double "o" -- it still autocorrects to "son" on the first try. So, going to keep my pet theory intact.

The weird thing is that if I type nothing at all, the contextual predicted "next word" on GBoard is actually very good -- I wasn't praising it for comedic effect. But it really does seem like there's a sign error in a sort function which kicks in after you start typing.


If this comment is what teaches me I've been expected to do that, I'm going to throw my toys out of the pram.

I've specifically googled for instructions on how I might be expected to use the swipe-style keyboards, and turned up nothing.


As far as I know it's in the tutorial they insist you do upon first enabling swiping

edit: Don't actually see a tutorial in the app. Maybe I'm confused with another app such as Swype, but the same technique seems to apply to all.


I'm pretty sure Swype did explain it, and I know SwiftKey has this gesture as well (though I don't remember if it was ever explained to the user or just assumed they'd remember it from Swype).


The default keyboard on my Android didn't have any such tutorial. I did look in the app and online.


Swype taught me, it has helpful tips and I still use their keyboard to this day


Swipe-style keyboard? How does that affect spelling? So confused - I get how it changes to "predict," anticipate or whatever the AI engineers say, but what I don't understand is changing a real word to a non-word. That's not intelligence. That's something else, and I don't know what they get from changing a word to a misspelled and nonexistent one, other than eventually driving the human race insane. I'm going to be a Luddite.


Er, "son" and "soon" are both real dictionary words. Swipe-style keyboards use an internal dictionary to find the most likely match to the swiped pattern, and have methods for adding new words (mine for example automatically adds any tapped-out words after you hit space).


Very likely, every time you type "soon" and it corrects to "son", the probability of the change gets increased. So, the more it errs, the more wrong it will be in the future.

As for it only replacing after you start writing: the probability of "son" must grow faster than that of "soon" after you type "so". If there were such a huge bug, you would be seeing a new word every time, not always the same one.


Try switching between languages. I work in English, my wife is Italian, my colleagues are french speaking, I am Spanish and have Catalan friends, and I live in Germany. I gave up on predictive keyboards long ago.


You don't even need foreign languages to run into this problem. English is the only language I speak fluently, but I'm Australian. I'm also a programmer, and I have some american friends. So, depending on context sometimes I spell 'color' (programming or talking to Americans), and sometimes I spell 'colour' (talking to Australian friends and family). Same with behaviour / behavior, favour / favor, etc. The context for which spelling I decide to use is complicated. In the same document I might name the `getColor` function, but describe it as getting the colour of a pixel. I might have two chat windows open side-by-side with different people and in each window spell the same words differently.

All my devices insist on shaming me (or autocorrecting me) for one of those spellings. At this point it feels like a complete gamble which I'll get corrected on. I'm just slowly getting used to correcting the autocorrect. :(


When you said this it reminded me that CSS allows colors to be either ‘gray’ or ‘grey’, which I’m glad about, because then I don’t have to fumble until I pick the right one. Though I’ve learned to type hex colors (especially greyscale colors) intuitively now, so I usually use those a lot more than typed-out grays.


SwiftKey seems to manage three languages (Finnish, Swedish, English) simultaneously pretty well. No need to explicitly switch language either, it figures out the current language as you type.


Yes, SwiftKey works almost perfectly for me too in a three-language scenario (Bosnian, English, Dutch).

The only thing it is confused about is the letter "i", which means "and" in Bosnian, and SwiftKey often capitalizes it where it shouldn't. Probably happens because I often mix Bosnian and English in the same message (instead of translating technical terms).


I have 3 languages in my GBoard. I've just arrived from a Spanish-speaking country, and now when I try to write in Portuguese it still completes with Spanish words. Sure, they are very similar languages.


Ditto on the SwiftKey recommendation: I have German, English and French always at my disposal.


I'm in the same boat: I'm Italian and speak Spanish a lot with friends, and English is my daily language. The latest Google keyboard has helped, but it's not nearly as good as T9 was.


It's all the keyboards. Swype used to work great, but now even when it actually gets words right, if a sentence has a second word that could plausibly have been one of two similar words along the swipe path, it will just straight up replace both, rendering the sentence completely incomprehensible. Who are these people that don't correct their sentences until the second incorrect word?


I second that. 5 years ago I could swipe a whole message blindly with no errors. Now I have to correct every second word.

I'd love a feature to disable all that deep learning and AI and just use the algorithm they originally had (proximity of where you typed to words in the dictionary). That worked so much better.


I'm glad it's not just me that's seen Swype getting progressively worse! Either you have to really emphasize what letters you want, or you give up and type it out. I'm about finished with it.


Wow. Thanks. I thought it was just me!

I had a Galaxy S3 and was a heavy user of Swype. My friends marveled at how fast I could type with it. It was perfect! I recently changed my phone to an S8 and Swype became unusable. It gets almost every second word wrong, so much that I'm thinking of disabling it entirely :(


Speaking of virtual keyboards I always liked (and still use) the one from Blackberry and I never had such problems (you should be able to install Blackberry keyboard on any Android phone). I switched to Google Pixel from iPhone once I found out I can use Blackberry keyboard there.


Oh man, Android does that too? This awful, terrible, no good fake "AI" behavior on iOS was one of the many quality issues with modern iOS that was making Android look more attractive.

iOS keyboards have been getting worse with every update, probably as more and more engineers feel the need to make a mark, or are required to fix bugs, and they spoil it. The simple and predictable statistical model of the original iOS was better than what we have now. So much of iOS was better back then, IMHO.


On BlackBerry if it autocorrected to something you didn’t want, one press of delete would revert to your original word. iOS makes you a) rekey the whole thing and b) will probably try to change it again. I can only assume that no one (or moons, it literally just tried) who works on this uses it themselves!


I always thought Blackberry had the best autocorrect experience. My old Blackberry would occasionally surprise me by catching something I wouldn't think it would know about, and after some initial tuning never ever frustrated me. It only enhanced my experience.


I’m on an iPhone now but in many ways it’s a step backwards from my old BlackBerry Pearl, let alone the Bold I got next


Call me a crazy person, lately I've been thinking long and hard about going back to the BB (BB Bold 9930) and am willing to make all the sacrifices that come with it for a lot of the frustrations with smartphone OSes listed in this thread. Maybe I'm getting older and my demands on what I expect from a phone are beginning to normalize and simplify: Text, Calls, Email, a browser for sports scores and reading news articles on the train.

Honestly it's very probable that I'll only ever keep a smart phone around as a music/podcast/audiobook device.


I never used a BB, but I did have a bunch of "dumb" cell phones, and I miss them. Making calls on them was easy: Flip open, press Talk, dial, Success! Now it's: Turn the phone on (which takes approximately 60 seconds on my Samsung because it can't see keystrokes until it finishes trying to connect to wifi), hit Home button, wait 3 seconds for that to work, hit Phone icon, wait 2 seconds, hit keyboard button, wait 1 second, dial phone waiting 500 ms between each key, press Call icon, wait 20 seconds for call to go through, hit speakerphone button (because as often as not the cheek sensor fails to work and my cheek disconnects the call), Success!

As the saying goes, smart phones are just pocket computers with shitty phones attached.


I think you need a better phone. I have none of these problems on my Samsung Galaxy Note 4.


This is a Galaxy S6. I'd probably get better performance if I didn't keep it in low-power mode, but I have no choice because the quest for thinner phones means the battery only lasts 2 hours in regular-power mode.


I am actually looking at prices now. The last model of Pearl can be had “new” for £85... only question is the battery...


Wish I knew what bb 9930 was but I am considering becoming a Luddite



On iOS, one press of delete and what you originally typed will show as a suggestion.


That’s only when you actually catch it mid stream.

Once you hit space and start a new word it’s game over.

Frequently happens at the worst moment when you’re trying to mash out a complex explanation in a rushed fury.


That's not true for this situation. For the "deep" replacements more than a word back that we're talking about, and also other changes that happen without any user notification via blue text popups, there is no easy way to fix without the long process of moving the cursor.


For a replacement? Or for the next word?


4 years ago was a very different time. Software quality in general continues to drop. I’m not sure if that’s due to increasing incompetence in the software engineering workforce (unlikely, but possible), malice (more unlikely), or apathy (most likely).

When wages don’t grow over 10 years, what incentive is there to write the best software you can?


I suspect the cause is a little more subtle (and terrifying).

The majority of people don't give a shit. Correct spelling and obscure searches are not even on their radar, it's not a part of their reality. Don't let the comments here fool you -- it is a very specific, picky, technical crowd that frequents HN.

The voice of "those who care" has always been a minority, although it used to matter more, simply because people who care and worry and try to do a good job tend to have more power and money (conscientiousness is a great predictor of success), and so businesses cater to them more. Now that everything seems to be turning more uniform, more global, more binary, more equal, that voice is marginalized (good thing? bad thing?) -- you're seeing the effect of a hoi polloi stampede.

So it's not the fault of "incompetent programmers" -- it may be a trickle down effect of our social incentives and economic trade-offs.


Yes, this is the real reason. It's a variation on the Tyranny of the Majority.

https://en.wikipedia.org/wiki/Tyranny_of_the_majority


> The voice of "those who care" has always been a minority

To add to this, the people who care most strongly have probably switched to alternative, open-source software solutions. This leaves the remaining group with less "care" on average, so fewer would complain. Kinda like evaporative cooling.


> 4 years ago was a very different time. Software quality in general continues to drop. I’m not sure if that’s due to increasing incompetence in the software engineering workforce (unlikely, but possible), malice (more unlikely), or apathy (most likely).

I think another dimension is how deployability has changed.

Before... When you wrote and shipped software, getting your software out was a big problem, a big deal. This also meant that if you shipped a bug, shipping an update would be equally expensive (for you and your customers), and the amount of goodwill you lost would be quite tremendous.

Now everyone has an app store, always-up-to-date apps, and whatever else is usually "in the cloud" somewhere. The time of people installing applications in a normal desktop context, with installers and IT administrators handling updates once every second year, is surely long gone.

With that kind of change, and an increased focus on delivering early, doing proper QA is no longer something which is rewarded in the market.

Who cares if you made a bug-free, awesome service, when you did it 6 months after someone else shipped a similar but buggy service which everyone is already using? They have established a user base and as such already have social momentum and lock-in.

What do you have to offer which is not only fantastic enough to make some bother migrating, but also so amazing that these people will also go convert their friends and families? "Less bugs" alone is not going to cut it.

Basically, taking the time to deliver quality software these days is increasingly something you get punished for in the market-place.

The result? We get shit like this and we can only blame ourselves.


This will only get fixed when software starts to get liability and refunds.

If enough people start suing or asking back for their money, companies will surely improve their QA.


I find it more reasonable to assume that you can only improve a very specific function so much before your "improvements" turn to "pointless sidegrades".


Well said. I have seen a lot of sidegrades over the past 10 years!


I don't know how easy it is to switch keyboards on iOS, but it's a great idea on Android.

I use Minuum - looks whacko at first but lets me use way more of the screen when replying, and it is really accurate!


Easy but wonky; SwiftKey is probably the best.


you can turn off autocorrect


It would be better if you could turn off some autocorrect features and leave others. It has a lot of facets, and autocorrect is great when it works.


And they can't even fix a simple typo like this after years of usage - "th8s 8s example tex6". Come on Google, you made that UI, you know that i and 8 are next to each other, you have a database of correct words and probably of typical errors and typos, wtf?! (You even know how to correct "wft" to "wtf" and can correct a simple word-with-number typo.)


Yes! Those one-off errors used to be fixed by most autocorrect systems. Seems like whoever wrote the new AI-based systems didn't make a checklist of existing features and attempt at least parity before switching over. It's embarrassing.


I don't know; I'm pretty happy that I don't have to type apostrophes any more. I can just go on typing words like "ill" and "wed" and it'll figure out after a few more words that those should be "I'll" and "we'd".


What the parent comment is talking about is something more extreme and I've noticed it too. It sometimes changes prior words that are valid after you have moved on to the next word. It's not correcting the word you just typed, it's correcting a previous word without any sort of feedback like you get for normal corrections- it's going backwards and changing earlier words, and then you try and fix it but the exact same correction applies automatically again over and over.

Unfortunately I can't remember any examples at the moment, it's just something that happens to me every so often. They're really irritating though, because they aren't well expressed in the current autocorrect UI (which works on the current word) and it doesn't seem to get the hint when you go back and correct it, so it keeps applying it over and over.


No, that's what I was talking about too—it needs the context of (the part of speech of) the next word to figure out, in the cases I mentioned, whether I actually intended the (correctly-spelled) word "wed" or the (correctly-spelled) word "we'd". It doesn't change it until after you hit space to commit the word that comes after the "wed" input.


Ah, ok. I'm pretty sure I've had it happen with more than just contractions though, like with common phrases. The real problem is that it's really hard to undo the correction. I need to start keeping better track of it so I can file a bug report... it makes it really difficult to type certain combinations of words.


If I type "its" and "it's" correctly, they are regularly changed to the incorrect spelling. If I don't bother, they are not corrected. The only way to get correct "its" and "it's" is to go back and fix them after auto-correct has screwed them up.


It's probably because the spelling is corrected based on a machine-learned model, whose corpus is likely to contain many instances where "its" and "it's" were swapped.


Which means the corpus is broken. But regular people rarely care about correct spelling, in my experience, and so I doubt corpus maintainers will care either...


You're imagining that people who make NLP corpora actually vet the text going into them? I dream of a world where people can be convinced to care that much. I'm not even talking about the scenario you suggest of filtering for proper word usage, I'm talking about filtering at all.

The corpora used for popular word embeddings are full of weird nonsense text (in the case of word2vec) or autogenerated awfulness like spam and the text of porn sites (in the case of fastText) or both (GloVe). And most people who implement ML don't care how their data is collected.


I mean, nobody expects the engineers to manually read through everything, but if the quality of the input text is significant for the quality of the autocorrect (or whatever other application you're using machine learning with), you kind of have to make sure the input is pretty good... You could, for example, choose datasets which are expected to contain mostly correct grammar and spelling (such as Wikipedia, books, etc.) rather than datasets which are expected to contain mostly incorrect grammar and spelling.

Or don't use a machine learning model. I honestly don't care, just don't automatically turn a correct "its" into an incorrect "it's".


Wouldn't late-edition books with only corrected text be better -- proofread, edited, proofread, edited, ...? Google has millions of them that they've assumed copyright of. Surely there's enough text there. Do they really just use random website text?? Nearly every news story I read has errors, and they have style guides, trained writers, editors, etc.

Do publishers sell their published text as a mass for use in AI/ML? Like 1000 books, no images or frontispiece, etc., possibly jumbled by sentence/para/page.


That is one of the most draining things I've heard recently. Sigh. Are these open projects where someone could in principle improve them?


The best open corpus project I know of is OPUS: http://opus.nlpl.eu/

They say they welcome contributions; I don't know if they just mean new sources of text, or if this includes code for filtering or fixing their existing ones.


I've been trying to figure out how to disable that! Anyone have any ideas? Google search turned up no results, but after this thread I shouldn't be surprised.


I use Swift keyboard on Android, never had these kinds of issues, its predictions are remarkably good.


And it's guaranteed not to have postdictions?


I've never seen it changing words after you have started on the next word.


Nope, the autocorrect works until you press space. It doesn't go back and "fix" the word you're already finished with.


that's not Android keyboard. That's Google keyboard. Solution: use a different keyboard. I've used the AOSP keyboard since I started using Android. Sure it's made by google but it does what I say rather than what some proprietary idiot robot thinks I think. While you're at it, use the send feedback function to let them know you hate it.


> that's not Android keyboard. That's Google keyboard. Solution: use a different keyboard. I've used the AOSP keyboard since I started using Android.

How do I try the AOSP keyboard? I have a Google Pixel and it looks like my only built-in option is Gboard. I'm guessing they removed it.


Its annoying autocorrection tendency to choose 'fir' instead of 'for' frustrates me. People almost never use the word 'fir', but use the word 'for' often. It would be nice if you could blacklist words you want it to never choose.

There is SwiftKey, on the other hand, which does those kinds of annoying corrections a couple of times, remembers your choice, and does them no more. It's been a long time since I've seen a 'fir' with SwiftKey.


I've definitely noticed that but I'm not sure the reason is so innocent


That's simple to turn off. At least on my Samsung.


Please teach me. It's driving me crazy.


On my S8 keyboard the configuration menu can be opened using the gear icon appearing in the keyboard. In the configuration menu there should be a bunch of options under "Smart typing" including 'Predictive text' which can be turned off.

This is in the european market, though, I don't know if it's configured differently for different markets.

On google the query "samsung turn off keyboard autocorrect" provides links such as https://www.androidcentral.com/how-turn-and-autocorrect-sams... which may or may not be relevant to you.


Here's the problem. I already turned off all those options long ago, but the annoying behavior remains. To be clear, I don't mind the keyboard predicting things. It's only the retroactive changes that bother me.

Someone else suggested Swift keyboard, so I'm going to give that a try.


I was amazed by the Gboard thing though. It usually picks the correct phrases for me.


I think the biggest irony is that the web allows for more adoption of long-tail movements than ever before, and Google has gotten significantly worse at turning these up. I assume this has something to do with the fact that information from the long tail is substantially less searched for than stuff within the normal bounds.

Google wants you to be mainstream now. If everyone thinks the same and wants the same things -- even if contextually as a member of one of a couple of hundred disparate "marketing cohort" categories -- it will be far easier to target advertising to you. It's in Google's interest for you to conform now. Be easy to categorize. Be easy to predict. So think the same as the members of your peer group, so they can sell hyper-targeted advertising to other corporations. (Have you noticed that social media tends to motivate you to conform?) Google has no use for the long tail anymore -- no use for quirky and inscrutable scenes and subcultures. Instead, it now has the cultural power to transform you (the product) into an even better product.

Remember: 1. If you're not paying for the service, you are not the customer. You are the product. 2. Given sufficient quantity and concentration, Power corrupts. Always.


I see quite the opposite incentive for Google. If you are a very eccentric individual and they know those quirks, they have a huge competitive advantage in targeting ads to you vs. some bulk radio broadcast ad etc.


> I see quite the opposite incentive for Google. If you are a very eccentric individual and they know those quirks, they have a huge competitive advantage in targeting ads to you vs. some bulk radio broadcast ad etc.

However, the observation of this article, and my observation as well, is that Google isn't currently capable of parsing very individual quirks. Rather, Google is able to place you into one of a number of highly conformist boxes. They don't have to understand you as an individual. They just have to 'box' you more effectively than their competitors.

There is nothing in the market or otherwise emergent in the nature of data and such categorization which fundamentally motivates Google to be able to parse anyone's quirks or understand the essence of a scene or artistic movement. If Google can gain a competitive advantage by creating a number of honeytrap doppelgangers which draw people away from the long tail and sequester them into un-creative, imitative, and highly conformist boxes, then so much the better for them.

https://www.youtube.com/watch?v=R9OHn5ZF4Uo

http://lesswrong.com/lw/l8/conjuring_an_evolution_to_serve_y...

In much the same way, I find that recommendation engines come up with annoying pale imitations of bands/musicians I like. I also wonder why authoritarianism seems to spread so effectively across social media, and why certain authoritarian movements seem to get such ready support from within Google and various social media companies. It's because, as a product, conformist/authoritarian screechers are more easily herded, replicated, categorized, and packaged than real individuals who think for themselves and apply principles.


If you liked Radiohead, you must also like Coldplay!


"I assume this has something to do with the fact that information from the long tail is substantially less searched for than stuff within the normal bounds...It seems like the web is being optimized for casual users"

In other words, the internet is becoming more of a collective and caters less to the individual.

If you have a special interest, then who cares? You should just adopt more normal hobbies. If you have a unique political viewpoint, then get over it and join one of the major parties. If you are oppressed, then it's fine as long as 95% of the people are content (don't worry, we'll carve out a few protected classes so that the pictures still look diverse).


It is much worse. It contextually averages everybody, so if 95% of the people are oppressed, but always in different ways, it also doesn't matter.


> It seems like the web is being optimized for casual users, and using the internet is no longer a skill you can improve to create a path towards a more meaningful web experience.

No. _Google_ is being optimized for casual users, and using _Google_ is no longer a skill you can improve to create a path towards a more meaningful web experience.


Yes. The problem seems to be compounded by Google's intention to correct spelling, and (for the long tail searches) to assume you're really just looking for whatever everyone else is searching for.


Yes! You need to beat Google with a stick to enable searching for the furce awakens (that's not a typo) poster which Disney itself produced. Used to be that quotes around a phrase stopped this typo correcting but no longer.

BTW. this is my go-to example for the "piracy is a service problem" -- this is 100% Disney IP for the fans of two billion dollar movies and you can't buy it as a poster. https://images.moviepilot.com/image/upload/c_fill,h_470,q_au... So I went on eBay, found a custom poster printer service and got it printed and shipped for 12.48 USD: https://i.redd.it/x4mmmvayunbx.jpg I would've been glad to pay double, triple for an official version but no. You just can't buy it.


Huh? I just searched for "the furce awakens". And got:

    Showing results for "the force awakens"
    Search instead for "the furce awakens"
And hitting that got me the Zootopia stuff. So?


CHX is still right in what they say. The point here is that Google overrides people's searches based on some random fuzzy logic and whatever you've searched before, regardless of whether you are logged in or not.


To be fair, >90% searching for "furce" probably meant force. For the rest it's one click more.


I can understand "firce" and "fprce" or even "f9rce" but "furce" is, on a standard QWERTY keyboard, two keys away so more unlikely to be a typo.


Do Google do spelling correction based on letter locality on the expected user keyboard? Never seen any corrections that would suggest that, often wondered why not.


There used to be a service that would, given a query, search eBay for listings that matched it or common misspellings based on nearby keys. I wonder if any of that logic has made its way into modern autocorrect explicitly or if it would be gathered implicitly through studying what users actually correct.
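
Something like this toy sketch could generate the nearby-key misspellings such a service would search for (the partial QWERTY adjacency map is purely illustrative; I have no idea how that service, or any autocorrect, actually does it):

  // Generate candidate misspellings by substituting QWERTY-adjacent keys.
  // The adjacency map is partial and only for illustration.
  const adjacent = { o: "ip", i: "uo", e: "wr", a: "sq", r: "et" };

  function nearbyMisspellings(word) {
    const results = new Set();
    for (let i = 0; i < word.length; i++) {
      for (const sub of adjacent[word[i]] || "") {
        results.add(word.slice(0, i) + sub + word.slice(i + 1));
      }
    }
    return [...results];
  }

  console.log(nearbyMisspellings("force")); // [ "firce", "fprce", "foece", "fotce", "forcw", "forcr" ]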


But this is the wrong way around. At least make it an option to offer the typo correction instead of forcing it on you. Think of how much money Amazon made with one click. Two clicks is twice the clicks. Focus on fast and relevant searches not trying to guess what I meant. It's great to have a typo correction on offer but let me decide whether I want that to be default -- I don't. I know what I am searching for.


I agree that having that option would make sense. But arguably the default should be correction, because most users are probably making typos.


Fine, but I am logged in; if it were an option I could fix it once and carry on.


Yes, I totally agree.


It doesn't help that it seems every other project out there is trying to name itself after common words in the lexicon. I dare you to try finding information on the Box library.


This one? http://www.wiltshire.gov.uk/librarylocations.htm?act=show&li...

It seems that context-sensitive search is a curse as well as a blessing. Wikipedia at least offers you disambiguation pages; perhaps a search engine should let you pick a "domain" to prioritise search in.


Didn't Google use to do this? One of the search engines definitely did - they'd group results into subject areas.


https://ccse.lbl.gov/BoxLib/

Is it this thing at the very top of the search results?


Fwiw my first page had nothing to do with anything computer related, except for the last result, which was how to make dialog boxes in Windows.

First result: https://littlefreelibrary.org

Search: Box library


Moreover, it was neither of the 2 Box libraries I was referring to. So that makes 3.

https://www.box.com/

https://pypi.python.org/pypi/python-box/


> the AI is probably incentivized to give a correct response to as many people

I'd bet it's probably trained to maximize clicks on ads...

Just like YouTube's algorithm is trained to maximize viewing time (and thus ads you see), instead of showing you videos you'd actually enjoy more.


> trained to maximize viewing time (and thus ads you see), instead of showing you videos you'd actually enjoy more.

How would YouTube quantify how much enjoyment you're having, if not by tracking viewing time?

As an aside, when watching youtube on my TV, I don't think there's a way to thumbs up or down videos. Even on my PC, when the video is in fullscreen, there's no way to thumbs up or down either.


Let's not forget that Google is an ad company, operating a search engine for that purpose.

I bet long-tail queries, while capable of carrying very targeted ads with a high CTR, are just too rare and thus less profitable. Likely many more people have basically no clue and formulate queries approximately, and very few query precisely in improbable ways, so getting the majority to the results they "meant" gives better financial results.

To tell the truth, you still can enter the "verbatim mode" using a menu, and try to find that improbable cluster of words. It's an advanced feature now, requiring a bit of digging, but it's there.


I'm not sure I've got a fully formed concept here, but I'm throwing it out in case someone finds it interesting.

Re: long-tail movements and switching contexts between work and home, I wonder if the better answer isn't much better user or persona management. Every person is interested in more than just one thing, and can conceivably be looking for the same word in two contexts. Down below, a multi-lingual speaker has given up on predictive keyboards.

What if we could enable users to switch personas/contexts as intuitively and easily as people code-switch in real conversations? Setting up the profiles would be messy and cumbersome at first which probably kills the idea dead in my hands. I'm not knowledgeable enough about psychology or machine learning to figure out if that could be solved automatically.


One of the difficulties I see with this approach is to identify personas.

It would be fine if it was just a work + home persona. I feel, though, that it would end up like work local branch + work parent company report + work programming + home family member + home personal hobbies + home grandparent's health.

For context, I already have two IDs for work and private stuff; I still hit a lot of barriers on Google search and ended up on DDG, using the location switch when needed.


A bit on the snarky side I realize, but someone at Facebook has probably already dug into this. Whether or not they'll admit it or discuss it publicly is far less likely.


> I can only imagine this has something to do with their increasing reliance on AI, and the fact that the AI is probably incentivized to give a correct response to as many people 'above the fold' as possible. If 95% of people are served by dropping the specifically-chosen search term, then the AI probably thinks it's doing a great job.

There has to be an additional reason though, otherwise they could've just put the AI-"enhanced" results above the fold and appended the actual accurate results below, for the people doing research.


You can add a plus (+) before a word you require to be in the search results, e.g. "+VBS array loop". You can filter out a word by prepending a minus (-). You can surround exact phrases in quotes (") and you can even allow synonyms by prepending a tilde (~).

Example query: +vbs array ~loop "magic number"

If Google does not find anything, it will remove some parts of the query so that you get at least some results.


Unfortunately Google dropped + when they introduced the Google+ social network. Nuff said, Google+ is a ghost town nowadays, but the + operator hasn't returned yet.

And even a quoted "search term" doesn't work consistently anymore; Google thinks it knows better and semi-randomly omits words and shows results no one asked for.


Didn‘t they drop the + operator when they launched Google+? I thought now you have to surround your term with „“ instead of +?


Yes, you're right, they did this a looong time ago... BUT, sometimes Google still insists on showing you results that don't contain the word or phrase that you've explicitly requested must be in the results - utterly infuriating!


You now have to go into Search Settings -> Verbatim. Unfortunately you’d have to do this for each search session as it doesn’t persist.


In Chrome, I've set up Google Verbatim as a search engine and made that the default address bar search. Here is the URL format: {google:baseURL}search?{google:RLZ}{google:acceptedSuggestion}{google:originalQueryForSuggestion}{google:searchFieldtrialParameter}{google:instantFieldTrialGroupParameter}sourceid=chrome&ie={inputEncoding}&q=%s&tbs=li:1


Especially when that word is 'not'.


Yeah, you are right; https://support.google.com/websearch/answer/2466433?hl=en

They even dropped tilde :(

But you might be able to achieve the same things with their advanced search; https://support.google.com/websearch/answer/2466433?hl=en


That stopped working some time ago and it's now extremely frustrating, as the main article describes. I'm sure I read that putting the search in quotes is like the old +, but it doesn't seem to be; it just gives priority to that search term.

I feel this is more a case of Google thinking it knows better.

I have a similar problem with ebay, if I'm looking for an "HP Z430" then I'll get pages of other items where Z430 isn't even in the description.

I can see some value in searching for alternative spellings and related items but there should still be the ability to be exact in your search terms.


This is [one of] the dark side[s] of the "adaptive" or "personalized" Web. The more adaptive and less deterministic, the less collaboratively and reliably useful it is, even if it happens to be better at serving attention-monster momentary gratification of the meme du jour.

The reliance on advertising revenue models means that all such Web properties morph into being essentially adversarial attention traps against users.

I would pay $50/month in a heartbeat for access to a no-ads, deterministic, guts-opened-with-API Google type engine (even if rate limited at that price to some high but human volume of usage).


> pay $50/month in a heartbeat for access to a [...] Google type engine

Never. And definitely not at USD 50 per month. That is a huge amount for this, although I suspect you're a pretty rare customer and/or exaggerating. Broadband or Mobile service, Satellite TV, etc. all have packages that cost about this much. Even a magazine subscription is a fraction of this amount, Netflix is only about 5-10 dollars a month, isn't it? I could see people paying 10% of the Netflix (ad free TV) charge for an ad free search engine maybe so USD 0.50 to 1.00 per month, 1% of your suggestion...


What annoys me the most is that Google seemingly no longer allows searching for phrases at all. In the past I could search for a specific phrase like this..

  "It never rains on a Wednesday in Rockshire"
..and Google would return websites containing that exact phrase. That no longer works. Nowadays the quotation marks are largely ignored as far as I can tell. This is super annoying because quite often I am really looking for a website/websites containing a specific phrase.

It seems they internally switched to indexing single words only. So if you search for a phrase Google will instead return websites containing (some of) the words in your phrase in no specific order and maybe not even next to each other.

I think I understand why: Indexing / searching based on words is massively easier and massively less resource intense than a system which can search for specific phrases.

Furthermore, the type of searches Google wants/expects you to do, e.g. "best hairdryer", works well enough -- in fact works better -- if you only search for single words and then filter / organize the results using AI / using information you have collected about the user.

EDIT: I was wrong. It still does work for many phrases, just not the ones I tend to search for. See below.

EDIT 2: I actually have no idea what is going on there. I know that searching for exact phrases doesn't work for me like it used to but I have no idea why..


Quote marks still work.

Google tries to find your phrase, and fails, then tries to give you results with some of your words missing without telling you.

If you search for "It never rains on a" with quotes it will only show you pages with that exact phrase. If you search for that without quotes it will show you a messy group of results based on what it thought you meant.


No, they don't but I know what you are referring to.

Most of the time I get the result:

  No results found for [phrase]
  Results for [phrase] (without quotes)
..but the thing is that I know websites containing [phrase] exist. Often many of them and they aren't "dark web" either. Google used to be able to find them but no longer is.

This gets more confusing because sometimes it does work. Namely if you look for phrases which are popular search queries e.g. if you search for..

  "hit me baby one more time"
..it will work amazingly well. But for other phrases it won't work at all and you will get the no results reply instead. It used to work for all phrases independent of their search popularity.

But yes, my original post was wrong. It does work for a lot of phrases.. but not for others.

I guess Google's engine dynamically optimizes things, and only indexes often searched phrases.

EDIT: Okay, this assumption was wrong too. I did some experiments to confirm my theory and the results showed that it is wrong..



This is what it looks like to me: https://imgur.com/zre6HdT

Probably this is because such pages do not exist, except for this page. Or at least DDG thinks so: https://duckduckgo.com/?q=%22It+never+rains+on+a+Wednesday+i...


Indeed a very nice way to prove this thread wrong :-)


I had also wondered whether its ability to match exact phrases has degraded. It would make sense, since keeping individual words on a distributed index is a lot easier than keeping a long phrase. But I had no way of telling, not having benchmarked it in the past.


Can they not index bigrams or trigrams, then chain together index hits? E.g. "It never rains in december" would hit on "It never rains", "never rains in", and "rains in december". Any result that hits on all of the indexes is not guaranteed to hit on the entire phrase but it would be a good candidate for the top result. The longer the phrase, the more likely a candidate hitting all necessary index phrases would match the exact phrase. This would at least put a limit on how large the indexes need to get.

Further, if they retain copies of the full text in their database they could do a filtered scan of the documents that hit on all subphrases to guarantee exact match. I could see that having too much of a performance impact at scale though.
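
A rough sketch of that trigram-chaining idea (a toy in-memory index; nothing here reflects how Google actually stores or queries its index):

  // Index every 3-word window per document, then intersect the documents that
  // contain all trigrams of the queried phrase. Survivors are candidates for an
  // exact-match scan of the stored text, as suggested above.
  function trigrams(text) {
    const words = text.toLowerCase().split(/\s+/).filter(Boolean);
    const grams = [];
    for (let i = 0; i + 2 < words.length; i++) grams.push(words.slice(i, i + 3).join(" "));
    return grams;
  }

  function buildIndex(docs) {
    const index = new Map(); // trigram -> Set of doc ids
    docs.forEach((text, id) => {
      for (const g of trigrams(text)) {
        if (!index.has(g)) index.set(g, new Set());
        index.get(g).add(id);
      }
    });
    return index;
  }

  function phraseCandidates(index, phrase) {
    let candidates = null;
    for (const g of trigrams(phrase)) {
      const hits = index.get(g) || new Set();
      candidates = candidates === null
        ? new Set(hits)
        : new Set([...candidates].filter(id => hits.has(id)));
    }
    return candidates || new Set();
  }

  const docs = [
    "it never rains in december around here",
    "it never rains but it pours",
  ];
  console.log(phraseCandidates(buildIndex(docs), "it never rains in december")); // Set { 0 }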

In any case the dumbing down of Google search over the last few years is immensely frustrating to me.


> Google would return websites containing that exact phrase. That no longer works.

Er, what? No search engine finds "It never rains on a Wednesday in Rockshire". Results for "It never rains on a Wednesday" with exact matches:

Google - 6 pages

Bing - 3 pages

Yandex - 0 pages

Duck Duck Go - 3 pages


That was just a random example of a phrase. I made it up on the fly.


Not sure what you expect then.


He isn't expecting the specific phrase "It never rains on a Wednesday in Rockshire" to turn up any results. He used it as an example of the kind of phrase which, even though it exists somewhere on the web, won't be found with Google.


> Er, what? No search engine finds "It never rains on a Wednesday in Rockshire".

Now:

Google: https://imgur.com/a/tbOzE

Yandex: https://imgur.com/a/LB2qx

DuckDuckGo: 0

Bing: 0


Well now it does. Strangely, for Bing this thread is third on the list but does not show any of the searched for text in the text blurb like the other suggestions.


I also get very angry at Google trying to stop me from making advanced queries. Even if I only chain a couple of Google's very limited operators together it shows me captchas and after a while the captchas do not stop. I keep solving them and it just wants more.

Lately I've also been noticing that the behaviour of some Google search operators is broken: "something" "otherthing" is not treated as an AND, and "something" OR "otherthing" is not treated as an OR. Google shows me the results it wants. I recently tried to research FreeBSD and Meltdown (I tried many times: "FreeBSD" "Meltdown", "FreeBSD" AND "Meltdown", etc.) and almost no result involved both the terms FreeBSD and Meltdown. The interesting thing was, Google did not say there were no results matching my criteria; it kept showing me popular IT news, Linux news etc. It was extremely frustrating.

The only operators that work are site, date and filetype. The logical operators do not work reliably or do not work at all.

If you're searching for something popular google finds it best, but if your search patterns are deviant google ignores you and even thinks you're a bot and refuses to service you.


I'm enormously frustrated that Google has been killing off advanced-search tools.

The loss of '+' is annoying, particularly since quoted and non-quoted mixtures are unreliable. Searching on "freebsd" "meltdown" might have solved your problem, but it's too unpredictable to be sure. My experience suggests that Google is doing something with site-level search, such that a site with only 'freebsd' won't appear but a site with 'freebsd' and 'meltdown' on different pages still might.

'-', meanwhile, seems to simply be disabled some of the time.


I'm primarily a DDG user, and I get annoyed that it often drops terms from my query as well. However, if you prefix the term with a +, such as `+freebsd +meltdown`, it will reliably keep that term in my query and show me what I want. So my queries often get overwritten, but at least I can reliably override it.


I'm glad I'm not alone in thinking this way. I wish I could use the Google of 10 years ago. Pre-personalization, pre-deep-learning RankBrain, just the reliable and consistent search engine where complex queries will return the result I want, if it is there.

Today, if I'm searching for something unpopular or specific, I usually get frustrated. You would expect the opposite to happen as the size of the web should increase over time.


Using the Google of 10-years-ago on the Internet of 10-years-ago, or using the Google of 10-years-ago on the Internet of today? Because I imagine doing the latter would just result in an endless wasteland of SEO-optimized content leadgen pages.


The good thing is SEO optimized content is relatively trivial to distinguish from the snippet, and other search engines like DuckDuckGo feel far more reliable and consistent without being overrun by SEO junk.

You do have a point however.


Not really, not for most people, especially for farm sites like eHow. If you ask how to make a tequila, you'd get an SEO-optimized eHow site instead of say, an authoritative page of the world's top tequila expert.

If DDG was so good, people wouldn't need to use !g for tail queries so much. Too many anecdotal claims every time these issues come up, and no objective quality evaluation.


I mainly use !g for the "stupid" queries, actually. The ones where I would actually prefer Google's AI second-guessing me. Also "local"-type searches. I don't voluntarily give Google my location[0] but if I type the city name too, it works fine. Actually I just tried and DDG does it just as well and gives a map and address too, so I may go for that next time.

Except for lyrics, DDG is quite good at that and often even presents the proper "zero click result" straight away (though usually the lyrics are cut off at a point and I still need to click).

On the other hand, the vast majority of my DDG queries are !bangs for other sites, because I know what site will have the page I'm looking for. Usually !w for Wikipedia (and the other wikipedia stuff like !wnl and !wt), after that probably one of the image searches !gi/!yi/!bi, then !imdb, !discogs and !whosampled. Oh and occasionally !hn, of course :)

I believe that DDG would have had a lot harder time getting as successful as it is today if Google had retained its old "AND" search engine behaviour (as explained above, keywords used to have an implied "AND" between them).

[0] I only log in to these types of big "social" things using a private tab, for GMail and to get my personal YouTube suggestions and subscriptions. It's a bit of a hassle to use the 2FA Authenticator code every time after I closed my browser, but it's worth Google not tying everything I search to my account, or getting "bubbled".


> If DDG was so good, people wouldn't need to use !g for tail queries so much.

When I resort to !g Google usually returns nothing interesting either and the most promising links are usually marked as visited, since I already clicked them from DDG.


I'd venture that most people are happy with the eHow results. Those wanting more depth may have to look through 1-2 pages of results, but I don't think most people want/need "tequila expert" level depth when searching for that. I'm not saying this is correct or "desirable," just saying that's one justifiable explanation for this behavior.


SEO penalization is orthogonal to the search algorithm.

The simplest solution: eliminate SEO'd pages from the index before applying search to it.


Something tells me the solution of removing "bad" pages isn't as trivial as you seem to imply it is here...


The important point is that it's orthogonal to search.

You can make SEO-penalization as complicated as you want without affecting search. The only thing search should be able to deal with is a SEO-penalty which is just a number.


Is your point that SEO-penalization is just a sorting implementation concern? I think Google does far more than that - perhaps removing/banning sites or categories of content that are deemed to be gaming their system.


No, I was replying to what derefr said.

And removing/banning sites can be done by setting the SEO-penalty to infinity.

I'm not trivializing anything. All I'm saying is that the concerns can be separated.
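
As a toy illustration of that separation (purely hypothetical scoring, nothing like Google's actual ranking): the ranker only ever sees one penalty number per document, computed by whatever SEO-detection machinery you like, and a penalty of infinity is a ban:

  // Rank results using relevance adjusted by an independently computed SEO penalty.
  function rank(results) {
    return results
      .map(r => ({ ...r, score: r.relevance / (1 + r.seoPenalty) })) // Infinity => score 0
      .filter(r => r.score > 0)                                      // i.e. banned sites drop out
      .sort((a, b) => b.score - a.score);
  }

  console.log(rank([
    { url: "expert-page",  relevance: 0.90, seoPenalty: 0 },
    { url: "content-farm", relevance: 0.95, seoPenalty: 4 },
    { url: "banned-spam",  relevance: 0.99, seoPenalty: Infinity },
  ]));
  // => expert-page (0.90), then content-farm (0.19); banned-spam is gone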


Either would be pretty cool :)


I wish I could go back to the Google of 1999. PageRank was magic before people learned to game it. The results were incredibly good.


PageRank was magic before people learned to game it

Even then, before Google started going after them by filtering results, the SEO spam sites were pretty easily recognisable and ignorable as you scrolled through the results. Now, they still show up in droves (try searching for service manual PDFs and you'll instantly see what I mean) but you hit the end of the viewable results far too soon to find the useful stuff buried in later pages.

In other words, Google's ranking now seems to be "good SEO'd sites > spammy SEO'd sites > everything else", and cuts off results before getting to that third category, when that third category should ideally switch places with the second and maybe even the first.


RankBrain

Amusingly enough one of the definitions of "rank" is, according to Wiktionary, "having a very strong and bad taste or odor"... as in the smell of a decaying brain. How fitting. I almost wonder if it's deliberate. If they called it BrainRank (like PageRank), the adjectival meaning seems to be emphasised less.


The emphasis would be incorrect if the term were reversed - RankBrain is a Brain for Ranking like PageRank is a Rank for Pages.

No doubt there is someone in Google, however, who has a project for ranking AIs called BrainRank for their own amusement.

Anyway, your observation is also an amusing interpretation.


PageRank is named after its inventor, Larry Page. It's just another case of Nominative Determinism!


An excellent point, had totally forgotten :)


I think part of this has to do with Google's actual customer base: AdWords users. The goal of AdWords is to have every single search be worth the maximum possible value in an advertising sense. So if one person a month searches for "Office desk drawer repairman Concord California", they do not want to service that very detailed long-tail search; that would minimize value for themselves and maximize it for the advertiser. They want to treat that search in a vague sense so they can display more ads for their customers and charge for each display and click.


You know, that actually makes a lot of sense. Recently, I attempted to bid on some long-tail keywords in AdWords for some targeted ads, but unfortunately they don’t have the “search volume” to qualify.


You mean you cannot bid on keywords with very low search volume?


Yes, AdWords will prevent you from targeting low-volume/long-tail combinations and recommend higher-volume, vaguer and simpler keyword combinations.


That seems to confirm a feeling: Google (et al.) transformed the internet into the new TV.


It’s a bit funny and sad how Google has both managed to seem overly clever to the point of being clumsily useless, and at the same time, continues to offer the original search experience that people pine for but no one knows or can be bothered with: add “intext:” before keywords and it’s the experience you’re wishing for.


I just tried a few of my old "dry" queries (as in, things I've always wanted to look for but can't seem to get good results) and didn't notice much difference... Furthermore, after a total of 3 "Next Page"'s, I got IP-banned with a CAPTCHA.

I've had similar experiences recently with "site:" and the other colon-operators, so not entirely unexpected, but still immensely infuriating. It's almost like any real attempt to find what you're looking for, if it's rare, will result in punishment. :-(


I'm starting to feel like the web is being de-optimized for nerds & super-users


Nothing wrong with that. You generally shouldn't optimize applications for power users. Software should be usable.


Good software is when both kinds of users are pleased. As Alan Kay said "Simple things should be simple, complex things should be possible.". But it's probably the hardest thing in UX design to make something that is simple and powerful at the same time - most things sadly end up being only one of the two.


I agree, but I didn't say that good software doesn't please its users. I said you generally shouldn't optimize for power users and that software should be usable.


It's likely you are using "privacy" plugins in browsers that obfuscate who you are to Google. "A friend" of mine has the same issues with them (constantly harassed like they're a bot).


Never heard about that before; thanks for sharing.


> will try to rewrite your queries and omit words

This has been bugging me too the last couple of years. Sometimes I would actually prefer an empty answer to one where various words are missing. Empty can mean an idea is still unexplored.

I've started sending feedback to Google every time this happens. Maybe it'll make a difference if more people do it?


Searching using “Verbatim” can be helpful.

What’s particularly infuriating is when searching with verbatim still ignores keywords.


I still remember the day that stopped working. I decided that was the day Google died.


Is that when the + command was overridden in search to refer to G+ instead of 'this is a mandatory search item'?

I am still shocked they ever thought that was a good idea, or that surrounding quotes was an acceptable replacement.


No, when that was changed you could at that time just quote a single word to get the old +word behavior.


Years ago when I interviewed for a Software Engineer position at Google, the person I was matched to eat lunch with between the interviews was from the Search team.

I asked him these exact questions. He said, the last time he checked, quoting a single word to mark it mandatory worked for him and that he definitely would know if it didn't.

I didn't insist much at the time but I knew he didn't know what he was talking about, and it made me lose hope that this feature would ever come back working like it used to.


Are you saying the quote-a-single-word thing doesn't work for you?


It has not been a guarantee for several years, now. At best it seems to be a slightly firmer suggestion.


Do you have some example searches?

I believe you, just never seen it myself. Perhaps I switched some obscure setting on years ago.


None that are guarantees, unfortunately. It's maddeningly inconsistent from day to day. But it isn't that rare, either. I only perform 10~20 searches using the quote feature a day, and I hit a case where it's ignored several times a week.


Doesn't work consistently for me.


I've reported that at least once.

After that it started working again for me within a couple of days and has been more or less working as expected (hmmm. That's what I thought at least) until recently.

Frustratingly, they refuse to send any feedback, so I just had to try again.

Yes, and I've also gotten the capchas.


What's more infuriating is it using synonyms of "verbatim" that actually don't mean the same thing as verbatim in a particular technical context but match an avalanche of noise. I've yet to find a way to turn off synonym matching.


I've also noticed Google omitting the only relevant and crucial keyword in my search over the last year. The first result will consistently have that keyword crossed out.


> solving the CAPTCHA just gives you another, and no matter how many you solve it keeps refusing to search

They even have a patent on that: https://www.google.com/patents/US9407661


I fear that something like PageRank used to work well just because back then many individuals had personal web pages with well curated links. It seems like this has slowly gone away, and now Google et al. have to resort to increasingly more hand-tuned AI extravaganza to get the search to work decently well at all.


Ironically this sort of "links section" probably dropped out of favour in large part because you could find those pages easily through Google, which knew which ones were good because they were linked in a lot of peoples' links sections...


Also, it gets ever harder to keep up over time, as good links either go away or deteriorate in value.

Perhaps not coincidentally, both the Yahoo Directory and DMOZ have been entirely shut down.


I'm so glad I'm not the only person thinking that. Over the last year and a half or so I've started scouring forums and reading studies on Google Scholar rather than searching the web. And it's gotten worse and worse each year.

I've been lazy and so acclimatized to the UI that I haven't changed, but I plan to now. It's especially difficult finding medical information. Mind boggling how 6-7 years ago it used to be so much better.


The worst is when it gives you results containing synonyms of your query words (and even highlights them in the little description under the link). Like, dude, I used this word for a reason.


All this is true, and yet when I use Google Chrome's search autocomplete (which is partly server-side and partly client-side), it will usually fill in exactly the query that will get me what I want (rather than what query I think will get me what I want) before I even start typing it, using some combination of searches I've done before and the indexed text of open Chrome tabs.

Google by itself isn't very good, but Googling from the Google Chrome address bar these days is something else.


I wonder if what you're seeing is a result of the Internet growing, not specifically Google degrading. Which is to say, as the indexable dataset grows, you'd expect the individual indices to turn up more and more heterogeneous results over time.


Nice theory, but it's Google degrading. They actually removed features for more accurate searching. Or they arbitrarily ignore them--all the people in this thread saying that "word" now functions like +word used to, are mistaken (or didn't use it much in the past decade) because often it does not.

And you didn't even need to use +word for most words, because it would only give you results that hit all the keywords in the query. You only needed the + operator to include "stop words", a relatively short list of common words that Google would filter out.
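
For anyone who never used it, a toy sketch of that old behaviour (the stop-word list and documents are made up):

    # Toy model of the old behaviour: every keyword is required (implicit AND),
    # except common "stop words", which are dropped unless forced back in with "+".
    STOP_WORDS = {"the", "a", "of", "how", "to"}  # illustrative, not Google's list

    docs = {
        1: "how to repair an office desk drawer",
        2: "office furniture catalogue",
    }

    def old_style_search(query):
        terms = []
        for word in query.lower().split():
            if word.startswith("+"):          # +word keeps a stop word
                terms.append(word[1:])
            elif word not in STOP_WORDS:
                terms.append(word)
        # Implicit AND: a page matches only if it contains every remaining term.
        return [doc_id for doc_id, text in docs.items()
                if all(t in text.split() for t in terms)]

    print(old_style_search("how to repair drawer"))  # -> [1]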

Also, given that personal webpages are way less popular than they used to be, the Internet might not even be growing that fast, at least not the parts that contain the type of random useful info that we used to be searching for. They can't index Facebook (not very deep), or most of the popular mainstream services that people write their thoughts into.


The fact that other engines are able to offer up results tells me this is a Google thing, not the size of the internet.


... unless there's something Google offers up reliably that those other engines miss. Then what you're observing is heterogeneity of returned results due to the dataset being too large for any one index to cover it fully.


It could be the internet (content) growing, or it could be the search engine changing, but to me, the most salient change over 20 years is the internet user base changing. At the risk of sounding elitist, the users of the internet have become a lot dumber on average. Search engines therefore grew dumber relative to the one that was tuned to people who were early adopters and used the tool for more productive purposes.


With respect: yes, that does sound elitist. I would avoid assuming early use cases were more productive than subsequent average use cases (for one thing, it risks defining "productive" as "stuff I like to do," not via some more objective metric like "revenue generated," "human needs satisfied," or "questions answered").


On the objective side, most revenue today is from advertisement and ad-generating content. Search engines make it easy to find ads, essentially. It's not about the content itself any more. So there is that.


Yandex is pretty good but I haven't found a use case for DDG yet. Whenever I set it for "privacy" reasons (not that I really believe them), I have to go back to Google to get reasonable results. I can confirm the memory loss, though, ego-surfing confirms it for me. Much of my older stuff is gone and all the top links for my name are related to my current work. (So in my case, it's kind of beneficial.)


That's because you're not using DDG's !bang syntax to its full power. I have it as my default address bar search engine so I can redirect straight to Wikipedia (using !w), Discogs or whatever of the 1000s of !bang abbreviations are defined.

Just the amount of times where you know the most useful (and probably top) result on Google will be Wikipedia makes DDG worth it. Saves a click. But for me it saves a click very often.

If I need Google occasionally I just append !g, but it's just one of the many other places I use !bangs for finding stuff.


But browsers already have that functionality built in, why go through DDG for that? (well, Chrome and Firefox do, not sure about others)


If others = Opera, then yes, they came up with that first :)

I've been using those kinds of keyword search shortcuts for years before switching to DDG by default.

The realisation was that having 100s (maybe 1000s already) of easily-guessable search shortcuts pre-programmed with DDG's !bang syntax is way more useful than having to configure 10-20 of them by yourself (and forgetting most of them when you reinstall a browser). Definitely worth the extra "!" keystroke and redirect :) [especially given DDG doesn't track you via that redirect].
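
Under the hood it's about as simple as you'd hope — roughly this (a tiny illustrative subset; the URL templates are approximate and DDG ships thousands of them):

    from urllib.parse import quote_plus

    # A handful of !bang shortcuts (approximate URL templates).
    BANGS = {
        "w":  "https://en.wikipedia.org/wiki/Special:Search?search={}",
        "g":  "https://www.google.com/search?q={}",
        "hn": "https://hn.algolia.com/?q={}",
    }

    def resolve(query):
        """Return a redirect URL if the query starts with a known !bang."""
        first, _, rest = query.partition(" ")
        if first.startswith("!") and first[1:] in BANGS:
            return BANGS[first[1:]].format(quote_plus(rest))
        return None  # fall through to a normal search

    print(resolve("!w nominative determinism"))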


I concur - and then I ran across a Google aficionado who alerted us to a policy change where paying to move up the rankings was cutting others out, with the paid-for results not labeled as ads. Is that the ultimate fate of all good things when they get too big - corporatization? I fear more that the people who should be creating the next Google have been numbed and hypnotized into a state of flux, where everything is acceptable.


You fail CAPTCHAs and get flagged as a robot. Are you sure you're not a robot?

I have the same problem.

To pass Google CAPTCHA you have to perform like an average human. This is different from getting it right. Once I started being lazier (e.g. the back of a sign isn't a sign, a picture that only contains a storefront in the distance doesn't contain a storefront), I have had greater success rates.

It's like Google is training me to be dumber.


An image containing the back of the sign doesn't contain a sign. A sign is the front. The back is just a flat surface, not a sign.


But it's part of a sign and all things that are part of a sign, as long as they are attached to the sign, are signs. ;)


I've noticed the same and if you look across internet and search properties it's trending the same way.

I'm not certain yet, but I think it has to do with systems relying more and more on giving a high weighting to popular or frequent searches. In other words, these systems are filtering content by how frequently it is searched for and attenuating returns which have low frequency.

This makes sense from a machine learning perspective. If I want to build a system which returns a search quickly, then it will be biased towards pathways with strong weights. So in the end, systems would be biased against outlier searches and very specific terms which have low or weak search pathways. Effectively the system is getting really fast and good at giving you the most popular return, rather than a precise return.

The major problem there is that over time the search space will atrophy, much like memories do, and will kill off pathways which have low frequency. It's unclear if this is good or bad long term for a search engine, because it will remain popular for the majority of users, so long as their search terms and desired results live within the same space. In other words, we're creating a less diverse and more homogenous space by virtue of giving higher weights to more common thoughts/searches/desires.
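
A crude illustration of that bias (all numbers invented; it just shows how blending popularity into the score buries precise-but-rare results):

    import math

    # Each candidate carries a relevance score and how often its query
    # pathway has been exercised recently (hypothetical numbers).
    candidates = [
        ("vba-excel-macro-tips", 0.95, 12),       # precise but rarely searched
        ("vb-dot-net-basics",    0.60, 250_000),  # vaguely related, very popular
    ]

    def serving_score(relevance, popularity, popularity_weight=0.5):
        # Mixing in log-popularity quietly starves the long tail.
        return relevance + popularity_weight * math.log10(popularity + 1)

    for name, rel, pop in sorted(candidates,
                                 key=lambda c: serving_score(c[1], c[2]),
                                 reverse=True):
        print(name, round(serving_score(rel, pop), 2))
    # vb-dot-net-basics 3.3
    # vba-excel-macro-tips 1.51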


I think the interesting question is: if you optimize the design to return rarely-accessed results reliably, what feature that we take for granted does that design sacrifice?


To me it's obviously speed. Reducing the search space or cache or however you want to define it is the best way to increase speed.

When milliseconds count that's important.


This is sad because humans are much much slower than computers and even 1 whole second is nothing to us. At least I am quite willing for Google to take a second or more to give me more thorough results. A query that takes 900 milliseconds is better than the same one taking 150 milliseconds but returning poorer results. The additional 750 milliseconds are virtually unnoticeable in human time and a small sacrifice to make for excellent search coverage.

The only place where every millisecond may count is the case of automated queries running into the thousands and millions. I'm not aware Google even allows something like that and it's a corner case anyway. Human typed search is the majority use case.


2 seconds is a human patience threshold. Google aims for 500ms (750ms is very noticeable).

https://www.webdesignerdepot.com/2016/02/how-slow-is-too-slo...


Presumably if Google spend 150ms of processing time typically, they don't want to spend 900ms (I'm ignoring transit times, etc.) on one query as they could get 6x as many ad impressions for that processing cost.

They don't even need to do better than the competition, only well enough to stop customers leaving despite having to do 4-5 searches.


> will try to rewrite your queries and omit words (including the very word that makes all the difference --- I didn't put it in the search query for nothing!)

I've always double quoted the specific term I'm looking for. That usually bypasses this, i.e. searching for "foo" will look specifically for foo and not food or what have you.


Those captchas are the worst. Google asks me to answer one whenever I make search queries in the middle of the night.


The weird thing is: Google created two CAPTCHAs. One to digitize books, and a shockingly awful one they use for their own services. Seriously, the one they use for their own services is complete garbage and obnoxious to even attempt to deal with.


Aren't DDG and Yahoo using Bing?


DDG is now also using Yandex. Not sure which one is better though, I find complex queries still better answered at google (luckily that's quick via !g)


What is the modern search engine that represents best what Google did at its functional peak?


The timing of this correlates with the sharp rise and unavailability of DDR4 memory...?

Coincidence? No...


very superficial information, will try to rewrite your queries and omit words (including the very word that makes all the difference --- I didn't put it in the search query for nothing!)

This is the algorithm deciding that its way results in more ad revenue for Google. What do you do? You repeat the query again and they show you a second lot of ads.

But let’s be honest if Google could show you as many ads without returning any search results at all, they’d do that in a heartbeat.


I believe it's not that direct. It's optimising for what most people search for and how to keep them using Google most. That in turn leads to more ad revenue.


Ads? What ads? I see no ads.


I actually forgot there are ads in Google's result pages. It's been ages since I last saw them; uBlock is the first extension I install on a fresh Firefox.


Yes, one tends to forget about ads ;)


Ah yes, forgetting... I feel we should maybe commemorate the atrocities of the past in some way... Like a National Remembrance Day for the Fallen Ads, where we observe two minutes of ... loudly over-compressed, slick, fast-talking sales pitches trying to make you feel bad about your life without their product.

Oh, the nostalgia :) Actually, thinking about it, I can get nostalgic and enjoy viewing certain ads sometimes, provided that they're over 10 years old. It's that whole icky feeling of being manipulated even while believing you know how they do it (except they do it worse), those grabby greasy fingers reaching into my brain. After about a decade, ads lose that power and they just become quaint. Like the "Jazz Solo Design" from the 90s (image search it and you'll recognize it immediately).


I've convinced myself that this happens in gmail / hangouts history search too. It'll very confidently tell you that here are the only six results for your search term going back to the beginning of time, but if you go and manually dig up something that you know is there from ten years ago, then all of a sudden there are seven results the next time you search for the same term.

I haven't done this methodically, and I can't prove that this is happening, but it's infuriating nonetheless.


This definitely happens. I have one email that is about 5 years old that I reference once or twice a year, often enough that it is a suggested search term. Recently Gmail has been unable to find it and I resorted to starring it. It is literally the only starred email I have, but I can no longer search for it.


This is actually very worrisome for me. I use Gmail as my personal store of weird information, from Wifi passwords to the account number of that service I only use every 4 years. I just send myself an email with obvious terms to search for in it and the relevant information. It's super convenient and I've never had it fail... yet, apparently.


Has happened to me too. Actually, the Android gmail app is even worse on searching. I often have to launch a browser and search via the web interface because the app returns no results.


My experience with the Android App is that search seems to be local only. So if the email is older than the retention policy or handled elsewhere then it's unlikely to show up there. Emails deleted on the desktop are notoriously invisible on mobile.


This is true in their iOS app as well. I've given up searching in the app and just open up the full web site when I need to find something.


I've had this for Chrome history as well. There have been multiple times where I'm sure I've browsed a site with some keyword in the title and it just doesn't show up in search. I don't tend to have a clue about the time window it would be in either so I can't go looking for it, so I can't prove it.


Chrome history may as well be useless and/or developed by the same people who created Reddit search.


I feel that it is intentionally bad so people don't realize how much Google knows about them.


Meanwhile their image recognition gets better and better. For those of you who use Google Photos backup, try a keyword image search in Google Drive sometime of your untagged photos ("beach", "face," etc.) You'll be creepily surprised on what Google is indexing, even against what they claim they don't (try some sketchier words).


I can one up that: I was living in Dubai a few years ago and have a number of photos of fancy cars I could never even dream of affording. If I search for “Lamborghini” or “Rolls Royce” it gives me the photos of those cars. I’ve never tagged them and I’m not an Android user, so they aren’t reading my messages.

https://imgur.com/caz8D2Y

I even have four photos I took at night, in burst mode, as a Bugatti Veyron zoomed past, and yes, it can recognise those...


That's not a sneaky feature; it was the primary selling point when Google launched the new Photos product in 2015.


All modern gallery apps have some rudimentary photo recognition, but I haven't found any but Google's that will allow you to search terms like "topless/nude" and find accurate results. I would never store photos like that with Google, but I've confirmed it works, and with how many promotions Google has run offering free photo storage with their latest phones there are undoubtedly thousands and thousands of unwitting users who have sensitive photos not just automatically backed up in some Google server, but categorized. Just imagine if these servers were to be hacked and that information was conveniently pre-arranged for extortion.


I understand image recognition searching for generic terms like "cars", but my point is this can even recognise brands. And it only returns photos matching that brand, so it isn't replacing "Rolls Royce" with "cars" to do the search.

I guess it makes sense to do this, given that most of what they do is based on selling ads; I'm just surprised it is this accurate.


iOS photos app does this as well.


It does generic terms like "cars", but it doesn't work for specific brands (at least on my 5S).


Samsung gallery app does this as well. You can even search for stuff like "selfie".


Maybe every photo using the front camera is a selfie?


Didn't think of that obvious solution!


Maybe, but it's also incredibly poor at the same time. When I search my photos for "dog" I get many many pictures of cats. But that's sort of understandable, since they are both 4-legged animals, right? Well, then I don't know why searching for "dog" also brings up pictures of birds I have in my google photos. It's great about 90% of the time, and the remaining 10% it's hilariously and completely wrong.


I searched for "paper" today and one of the results was a toilet :P But yes I agree, it has been getting better.


Was there toilet paper in the picture? :P


Just had a look, turns out there was haha. Guess that's why then.


Where has Google claimed they don't index certain terms?


Chrome is much better than Edge in this regard. I have a website bookmarked and tagged with a very rare specific word. In edge, typing the keyword will never show up my bookmark. Instead it shows crap from around the world. Firefox and chrome do the right thing, my bookmark is the first suggestion.


The address Bar in Firefox is great for this. I fire off a fraction of the searches I used to after switching.


I remember not too long ago a colleague of mine gave me address to internal monitoring system that we use. I tried to find it few minutes later in chrome omnibar by typing almost exact url i.e. page was aaa.bbb.com so I typed aaa bbb. It gave me nothing, just search suggestions. I'm guessing chrome does it on purpose. The less browser history they search in chrome and show back to you the more you'll go to google web search and that's ad money for them.

That's why I'm back on Firefox after quantum release. I hope mozilla never, ever, ever does something like this but I remember seeing something similar on nightly once. It gave you search suggestions first, that redirected you to google, with option to disable it in settings.


I think Chrome history might be limited in time. Something like 3


Something like 3 months. I forgot a word apparently.


I'm just glad I'm not the only one!


I have experienced this and it is one of the main reasons I still keep an imap client set up. There are certain emails I need to be able to find and gmail does not find them. Claws Mail does. Personally this is a minor annoyance with personal email, but I would hope their commercial offering does not have this behaviour.

Another really interesting thing I've noticed in gmail relating to search is that the number of matches for a given search is approximate, which makes perfect sense if they're using some kind of probabilistic data structure. However, when the correct number of matching emails does become known, because you have gone to the end, the result is not cached even client side. This gives a weird effect when combined with pagination: you go back a page, and the number of matches changes to the estimate again despite the fact the actual number is now known.
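
That matches how an estimate-first counter would behave. A sketch of the effect (the estimator here is a simple extrapolation from a sample, purely illustrative, not Gmail's actual data structure):

    class ResultCounter:
        """Report an estimated match count until the exact figure is known."""

        def __init__(self, matches_in_sample, sample_size, corpus_size):
            self.estimate = round(matches_in_sample / sample_size * corpus_size)
            self.exact = None  # only learned after paginating to the very end

        def count(self):
            # If the exact figure isn't cached, fall back to the estimate --
            # which is why the number "changes back" when you go up a page.
            return self.exact if self.exact is not None else self.estimate

    counter = ResultCounter(matches_in_sample=3, sample_size=10_000, corpus_size=1_000_000)
    print(counter.count())  # 300 (estimate)
    counter.exact = 287     # discovered by walking every page of results
    print(counter.count())  # 287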


Startup idea: a service that will let you search your inbox. Aka google for searching.

Seriously, this is egregious. You rely on your email provider to accurately search your inbox - some emails are important business, tax, and legal documents that are relevant for years, even decades. Or at least be fucking transparent about the fact that you are not really searching all emails. I know Gmail is a free service and in the T&C you agreed to (figuratively) sell your soul but this has huge real-life implications.


> Startup idea: a service that will let you search your inbox. Aka google for searching.

I don't mean to pick on you but picturing the perspective behind this comment is very funny and a little sad to me.

grep is almost 40 years old. It is free software, fast, and doesn't share your data with anyone. Small knowledge of the file structure of MIME enables more advanced search. This is all without mentioning desktop-based email clients.
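
For anyone tempted: even the Python standard library is enough to do this against a local mail store (a minimal sketch assuming a Maildir at ~/Maildir; plain-text parts only):

    import mailbox
    import os

    def search_maildir(path, needle):
        """Yield (date, subject) for messages whose text/plain body contains needle."""
        needle = needle.lower()
        for msg in mailbox.Maildir(path, create=False):
            parts = msg.walk() if msg.is_multipart() else [msg]
            for part in parts:
                if part.get_content_type() != "text/plain":
                    continue
                body = part.get_payload(decode=True) or b""
                if needle in body.decode(errors="replace").lower():
                    yield msg.get("Date", ""), msg.get("Subject", "(no subject)")
                    break

    for date, subject in search_maildir(os.path.expanduser("~/Maildir"), "tax return"):
        print(date, subject)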

Reading your comment, I can only picture some web-page javascript-based track-you-and-show-ads 15-employee company whom you give your email password so they can connect to another service and make high-latency queries on your behalf.

Shows how far we've come?


Firstly, that startup idea was overt irony aimed at Google :). Secondly, my guess would be that maybe 0.001% of Gmail users are familiar with the command line and regular expressions. Not everybody is a coder, and that is not necessarily bad.


Apple has the opposite problem. On my version of OSX, Spotlight searches in the Finder return email results too, which dramatically worsens the signal-to-noise ratio. You can turn it off, but this requires you to enter your search term as a formula every time--there's no way to make it the default. If I want to search my email, I'll switch to the Mail program and search there. I neither want nor need to search my mail in the Finder.


> I neither want nor need to search my mail in the Finder.

Just exclude Mail from Spotlight search in Settings.


Gmail is not just a free service; G Suite costs money, and if this affects corporate email it definitely has real-world implications.


If it does not affect corporate email, it does not have real world implications?


Not necessarily, he didn't say "if and only if"


Thunderbird (and I'm guessing most of the offline, true email clients) has this built-in.


It's funny, in discussions about webmail vs local email client, people often say "why would I want a local client anymore, webmail is all I need".

Well... this is why you might want it. Your data under your control. Your choice of tools.

If you're using GMail and Google decides to turn GMail to crap, well, bad luck.


This has happened to me with labels before, too. I'll do a search for all things that have a label and are in my inbox, and then archive them. I'll then go back to my inbox, and see that it missed something with that label. If I then repeat the search, I get zero results, even though it has the label, is in my inbox, and I can go back and find it. It's extremely frustrating.


100% Agreed. Gmail doesn't seem to want to find stuff that is there.


Gmail also drops sent mails occasionally for no apparent reason. They don't even bounce back, they simply disappear and never reach their destination.


I doubt it's malicious


I doubt it meets expectations


I think it meets profits.


I find it incredible that google can profit from Gmail. (Actually I find it incredible that Google can profit at all.)


Advertising is an incredibly profitable industry.


Gmail doesn't show ads anymore.


It provides a pretty good incentive to keep that Google cookie in your browser though (as well as your access to your interests, online purchase history, address book etc. etc. ad nauseam)


It does in the promotions tab.


I’ve gone back to using a local client recently, and being able to search/grep text files I know are in a dir has made me feel much less like I’m going insane / dependent upon capricious mystical forces.


Same here. Offline search in Thunderbird beats everything, particularly the quick filter feature.


Microsoft does this too. I have a massive mailbox going back 15 years and O365 doesn’t handle it well with full text search... you need to scope it to a person.


Ah, this is good to know. I currently run on-prem Exchange, with a view to moving to Office365; my mailbox is a super-set of all the mail I've ever had and goes back 20+ years.


My email archive going back more than 20 years is stored in a single .PST file, and I search it using Outlook. Never had a problem. Every 3 months I copy everything older than 2 months from Exchange Online into that .PST.


Outlook is totally different, and I agree it works perfectly.

I'm talking about OWA Search in O365. I use mostly VDI these days, so PSTs are out. It's a frustrating issue to me because OWA search is better in many ways for more recent stuff.

Also note this is an anecdotal interpretation based on my experience.


Just in case you are not aware, PST files can go horribly wrong when they get to 2GB in size.


Thank you for bringing this up. It would make my month if someone would chime in with a solution. I didn't know about this problem until very recently, and it caused some major headaches in my business.


Can confirm. Google, the search company, has lost the ability to accurately search even the data that it holds, never mind any other.


I've had this happen too. It's not just you. This has happened to me repeatedly and it is so, so frustrating.


I definitely have this exact issue with Google Calendar as well. I search for the exact wording of an event, and it doesn't show any results that are old. I then manually go back in the calendar and find it, do the search again and ta-da! it now shows up as a search result...


Sounds like they don't index the full corpus


Or there is a time bound or other resource bound that they are willing to expend under the current circumstances (are you a free user? Paid user? Internal user? Mobile? Web? Etc.)


Which isn't a bad thing as long as it is communicated. If Google is limiting functionality somehow, let me know if there is a pay-to-play option to enable everything.


I like how Wolfram Alpha does it. You get a certain amount of compute time for free, and if you've got a paid subscription, you get a known amount more. It works well, and I don't mind paying a little to get a more reliable service.


yup. this.


Yep I have the same issue.


Hate to side with the big guys, but it's a free service. Beggars can't be choosers. They probably dump indexes after a while for content older than x. Seems fairly reasonable actually.


Then there's not much point to using Gmail. That was a huge part of the "never delete anything again" ploy.


> Then there's not much point to using Gmail.

Almost there...


You're paying with your data, same as with Facebook and many others. These extremely successful businesses are obviously able to make plenty of money using that data.


Maybe your data isn't worth what you think it is these days. Maybe they actively index what is recently used. It's cost benefit for them.


Google drank too much of their own koolaid. They were always seen as an "AI company", but they really weren't. As late as 2008, Google's head of AI said they used very little machine learning in search. And actually tried to avoid using it as much as possible. Because they found it very unreliable and it gave weird and unpredictable edge cases. Everything was painstakingly hand engineered.

But now we are at peak AI hype cycle. No wonder it's gone downhill. I'm sure the AI does better on whatever metrics they tell it to maximize. But AIs game the hell out of metrics. And we are still nowhere near human level intelligence. It doesn't understand your query or the content of the websites. All it has to go by is simple keyword matching and meta indicators like the size of the website.

So now basically all searches return the same handful of large websites. When was the last time you got a search that went to some niche forum? Or some small little homemade website by someone passionate about that specific subject? No, it's always a wikipedia link, followed by a bunch of contentless news articles. And there's never any point of going past page 1, because every other page is like that too.

And now the web has become this: https://www.ncta.com/sites/default/files/platform-images/wp-...


The little passionate website in the 2000s used to be the best source of information if properly vetted. Now it's only big websites regurgitating the same superficial content written by paid writers who have no expertise in the subject matter. They instead just copy other superficial sources.

I think this has dumbed down society because it certainly has dumbed down me. Also, complicated topics are dumbed down to 1 sentence answers and it's very difficult to get detailed information about something. Ironically, I've started to go back to purchasing and reading books if I really want to go in depth on a subject.


On the other hand, from my experience:

I wanted to see how Raspberry Pi assembly worked - quite a niche subject (who programs? In assembly? On a Raspberry Pi?).

Yet I found quite a few blogs going through the subject - a tutorial of sorts.

I was also looking up how to write an OS (just for fun) - another fairly niche field. Yet I found tons of tutorials in C, C++, and Rust.

I remember trying the same in the 1990s.

Nada.

You couldn't find nothing. I mean maybe you'll find a basic site with the source-dump of an OS (often without building instructions), but you want to dig into the meat (here is how you get from bootloader to x86-64 in 21 days, and why it works)? Foggetaboutit.

I definitely don't want to go back to pre-google web1.0.


In my experience, it depends a lot on how much competition there is from content farms. If your niche isn't targeted by those, it's easy to find the good blogs. Otherwise, it's very difficult.

I suppose that this is a use case where ML could help a lot to recognize "content farmed + SEO optimized" content. Hopefully Google could improve the situation in the future - supposing they are trying.


One of the most entertaining "bugs" that popped up recently while I was looking around for some opinion content that I would _hope_ is recent was that one of the content farm sites plagiarized a rather good blog post, sentence for sentence, for pages and pages. Instead of "I found that ..." they'd thinly-edited it to "Now, let us find..."

The annoying thing was that the content farm appeared _above_ the original content.


Probably because they employed better SEO than the original author :(


Search engines didn't exist until the very end of the 90s. The 00's is the golden age I compare to. And yes there are still some small websites. But they are a lot less common. Especially if you aren't searching for them intentionally. They've made sure you won't accidentally stumble across them. For instance, I just searched "raspberry pi" and most of the results are companies selling them. There's also the official site, official social media accounts, and the wikipedia page.

There was actually a result for a small wiki of pi enthusiasts. But it was buried 10 links down on page 2, and mixed in with all the other noise. There's also a lot of news sites and blogspam (a lot of links have a little indicator that says "x hours ago". As if good articles expire.)


>Search engines didn't exist until the very end of the 90s.

Both Yahoo and AltaVista were started in 95.


That is not related to the search engine itself; it is due to the amount of content available now compared to the 90s.


> I remember trying the same in the 1990s.

A major issue, that affected finding "Raspberry Pi assembly" in the 1990s was that the Raspberry Pi didn't exist yet.

A lesser issue, and perhaps one that may have made it difficult to figure out how to "write an OS (just for fun)" is that few people had done this and published tutorial-style material for absolute beginners on the web- and most of those tutorials that did exist weren't on the world wide web (but on e.g. usenet).


> And there's never any point of going past page 1, because every other page is like that too.

To be honest, I kind of like that part. Most of the time, if there are no useful links on the first page, I know to refine my search terms right away, instead of wasting my time clicking through pages and pages of useless stuff.


Many people here are pining for the old days of Google search where if you knew the operators and some tricks you could develop very specific queries and get a page that matches exactly.

I remember those days too. It did feel like Google search skill was a super power. I also remember my non technical friends and family being pretty much unable to find what they were looking for online. Google did not turn their rambling approximate queries into the results they want.

Now I find that although the precision of the old style is gone, Google is incredible at guessing what you want. Back in the day I might find the page I was looking for on the twelfth 'o' of 'Goooooooogle'. Now I rarely have to venture outside the top 3 results. I find that my family no longer needs my help to craft a search query.

Isn't it possible that Google just made the choices that are better for the average user and left some of us advanced users out? If your response to that is "they could leave me an advanced mode", consider how much work it would be to maintain just to serve a tiny customer subset.

I think if you want a search engine for power users you want DDG

Disclaimer: I work at Google but not on Search or anything close.


> Google is incredible at guessing what you want.

More like guessing what most people want.

If you want something specific, sometimes you're out of luck.


I think you forgot who maintains the computers of these non-power users: it's us. Advanced users. When I decide Bing or DDG is better than Google Search, then Google can kiss my ass goodbye, along with the 20 computers I maintain occasionally for family and friends.


Why would you take your family to DDG? Not all users need to use the same toolset. If the parent comment is correct, and Google is targeting the "mass market" user, then wouldn't leaving your presumably less-tech friends and family with Google as their default search engine make your life easier? Just because Google is no longer the best fit for you doesn't mean DDG is the best fit for them.


I'm wondering if rackless Ruth, Google's bean-counter-in-chief, is behind all this. At Bing, bean counters had often calculated that if you cut the index down to half (after a certain size), you cut half of the cost but don't lose half of the revenue. So there is a sweet spot where you can maximize revenue if you are willing to let go of a few demanding customers. When quarterly results need a little push, everything is fair game. I bet Sundar Pichai doesn't want to look bad as "CEO" who can't meet analyst expectations.

I definitely miss the old days of Google. After Amit Singhal left, things haven't been the same at all. He had resisted unexplainable AI getting into search features. But after he left, RankBrain, an AI-driven feature, became the third most significant ranking signal. That feature is the reason why you often see pages even if they don't contain the keywords, or even a phrase, you had specified. The old guard knew that trading explainability and a slight decrease in customer satisfaction for a little bit of revenue wasn't worth it.


I suspect more that Google's focus on being the best general-purpose search engine for the general public, and its focus on personalization, predictive search, etc., is behind it.

It takes a different design of index and algorithms to do what Google has openly had as a goal since well before Porat came on board than it does to be the kind of highly literal search engine Google started as.

There's a market for both, and the market for a well-designed, comprehensive, highly literal engine probably pays a lot more per user than the kind Google is focussed on. But it also is much narrower if there is a Google around. (It's also not clear that the web is the highest-value corpus for such an engine; the really commercially successful ones are more specialized and have large, human-curated and annotated datasets in specialized domains, notably law.)


I bet Sundar Pichai doesn't want to look bad as "CEO" who can't meet analyst expectations

Whenever I see horrible search results, I often wonder if those people who work at Google, and surely use their own search engine as much as anyone else, have noticed the degradation and what they think of it --- especially those who are in charge of or even working on the search engine themselves. Does Sundar search, get horrible results, and think "Why is my search engine half-broken? This is my company's most prominent product, and it's not working as well as it used to." No doubt it affects all their engineers too, the ones who will tend to be looking for the most obscure things.

I think your point about metrics and revenue is very true --- they are numbers that can be easily compared, while the quality of search results is not (and also subject to a lot of different conditions); being a "data driven" company, they obviously place much emphasis on the former, ignoring the negative but not easily quantifiable effect on search quality.

As the catchy saying goes, "Not everything that counts can be counted, and not everything that can be counted, counts." Unfortunately a lot of Google's management don't seem to believe in it.


At his level (any C-Exec really) I'd say he probably doesn't notice because it'll be his assistant (or assistants) doing most of the searching...


Did you really intend to call her "rackless Ruth" or was the slur an accidental misspelling of reckless? Because pointless and crude name-calling like that really detracts from any valid points you might have brought up.


Not OP, but I'd assumed it was referring to removal of literal data centre racks.


According to Merriam-Webster [1] "rackless" is a variant of "reckless".

[1] https://www.merriam-webster.com/dictionary/rackless


Do you consider it likely that the poster I replied to routinely uses the dialectal variant rackless over the common spelling reckless?


You should also consider which areas of the world use "rack" as a slang for "breasts". (I'm not disagreeing with you - as a native English speaker I didn't know that rackless and reckless were synonyms).


It didn't detract anything for me. I'm able to read, comprehend, and digest any wisdom even if it's written with language which I wouldn't use myself. I think it's a really useful skill because you'll find most people do use different language to you.


My problem with the comment was not the dirty words (I love me some George Carlin and Louis Black), but the act of name-calling. The very first sentence of the hacker news guidelines for comments is "Be civil" [1].

[1] https://news.ycombinator.com/newsguidelines.html


That kind of "name calling" is pretty common in many areas. Watch some debates in British parliament, for example. Do you really think the Chief Financial Officer of one of the biggest companies in the world is unable to handle it? All you will do if you pursue this sterilisation of language is drive away the really intelligent, if idiosyncratic, people and help build an echo chamber.


I sincerely doubt that the Chief Financial Officer of one of the biggest companies in the world has time to read random comments on HN, so whether they 'can handle it' is moot. There is certainly a place for cursing and maybe even name-calling (though I'm surprised you'd bring up the British parliament as an example worthy of imitation) but Hacker News is not it, cf. why I quoted its guidelines. Indeed, sometimes cursing is the only way to properly convey a meaning. But this was not such a case - the slur was entirely drive-by and only served to distract from the rest of the comment (whose merit I cannot comment on since I had never heard of the person being commented on before today). So my intention is not to sterilize language or build an echo chamber, but to hopefully make one of these 'intelligent, if idiosyncratic' people aware that their way of communicating was preventing their message from reaching a wider audience.


The only one detracting from anything is you. The rest of us read the comment just fine. I'm american myself and know the "rack" slang but the meaning you suggested didn't occur to me until you pointed it out. I assumed the name had something to do with where this person (I don't know who he's talking about) used to work or some infamous project that went badly or something. Understanding the name didn't seem critical to the comment so I didn't spend time trying to analyze it.


No, not all of us. I didn't read the comment just fine either. I assumed a person who uses irrelevant misogynist slurs may have a personal grudge against the person he's insulting, so I'm not going to take anything he says about her very seriously.


Wait, misogynistic? You think "rackless" was a comment about her body? If it was, that would be strange and inappropriate. I didn't read it that way and it didn't occur to me before this comment. I'm willing to give the benefit of the doubt that it wasn't meant to be that, because it would be so unexpected.


If this wasn't OP's intention, I deeply apologize. That was my understanding and I think it was m12k's too. I'm used to seeing civil discussion here and that's what made me dismissive of the original comment.


> If it was that would be strange and innapropriate

Yes, that was my thought too, hence why I asked them if that was indeed what they meant by it. An archaic/dialectic spelling of reckless is strange too though, so I'm not really sure which is more likely. Just out of curiosity - if it only just now occurred to you that 'rackless' could refer to her body, then what did you think I was calling the comment out for?


I thought you were one of those overly sensitive types that got offended easily. You never know these days.

It's funny because it says a lot about some people who immediately thought it was a comment about breasts rather than either a typo, an alternative spelling, or "rack" referring to data centre technology (on a technology site).


> irrelevant misogynist slurs

Has the OP confirmed that this is what the phrase meant? Because otherwise you're projecting meaning onto something that may not have been intended (again, I have the background to recognize this kind of slur but I didn't see it in the original comment until it was pointed out). Why would you do that?


Interesting to note that even this post apparently is not indexed in Google: Do the search "site:https://news.ycombinator.com "rackless ruth"" and find 0.


Indexing the entire web isn't instantaneous, you know.


Hmm. I just tried to reproduce this with old posts of my own and couldn't. I picked random phrases from five early 2006 blog posts that get basically no traffic and searched for them:

    "I had been playing the accordion Davy lent to
    Rosie during winter break"

    "The language they're using is not that different
    from the one I wrote PlayGUI to use"

    "I've been playing a decent amount of music lately,
     mostly guitar and piano."

    "warm dry socks was the most important aspect of the
    festival"

    "This wouldn't be that bad, if it was not exactly what
    happened a year and a half ago."
Google found all three. For each one there were either 2 or 3 results: first my old post, then one or two from rssing.com which seems to do something with my rss feed.

Trying them with Bing, it also found all five of my posts, and ranked them first in four cases. In the fifth case ("This wouldn't be that bad, if it was not exactly what happened a year and a half ago.") it ranked a goodhousekeeping.com post higher, which had all the individual words but none of the phrases.

(Disclosure: I work for Google, though not in search.)


Actually, rssing.com has nothing (directly) to do with your RSS feed. It seems to be a content scraper and mirroring/archiving "service," if I'm feeling charitable. And it looks like a site that wants to redirect users to its copy of users' content in order to get ad revenue if I'm feeling less-than-charitable.

In fairness to Google, that's always been a problem, and even in the good-old-days in which keyword-based searches were more effective, there were content aggregators that would copy the entire contents of phpBB-style bulletin boards (and USENET newsgroups) in order to rehost them and get clicks.

On the one hand, I want to say that it's precisely the sort of SEO/spammy practice that Google should be deprioritizing in search results. On the other hand, sometimes these copies/mirrors of content are the only extant copies of content when an original blog goes away. Although the motivations of the owners of these sorts of sites may not be as pure as that of archive.org, the result for the searcher is equivalent: the desired information is found even if it's only a rehosted copy.



>Google found all three.

All five? Or 3/5?


Sorry, all five. I wrote the comment with three, then thought more data would be better and tried two more.


[Tim here] Folks might notice that the article is once again find-able. Being 2 links from the top of Hacker News will do that…


Or the dataset outage was temporary. Remember, it's a distributed data corpus.


This needs more attention. The rarest pages are probably not very redundantly stored.

Sometimes the datacenter your search query lands in might not have a copy of the necessary page. Now they have to decide if they delay the entire search query to remotely query another datacenter, or not. I would guess returning the results early is nearly always more important than returning a result which is so rare it has never been clicked in the past decade.


I would not be surprised if Google still has the data. Not sure how Google handles things internally. However, Google needs to pull up the results fast. So they might have 4 billion results with the word "water" in them, but they only make a tiny portion of that available. If I type the words "Hot water", Google looks at the subset of pages with the word "Hot" and the word "Water", and it must pull the pages that have both words quickly. So the number of pages in these subsets for "Water" and "Hot" must be small enough to quickly be merged/intersected. There are other things that could be done to speed it up, but I think you get the main idea.
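
The intersection step being described, in miniature (entirely illustrative; posting lists are kept sorted so the merge is linear, and truncating them for speed is exactly where rare pages would fall off):

    # Each keyword maps to a sorted list of document IDs (a "posting list").
    index = {
        "hot":   [2, 5, 9, 14, 30],
        "water": [1, 2, 7, 14, 22, 30],
    }

    def intersect(a, b):
        """Linear merge of two sorted posting lists."""
        out, i, j = [], 0, 0
        while i < len(a) and j < len(b):
            if a[i] == b[j]:
                out.append(a[i]); i += 1; j += 1
            elif a[i] < b[j]:
                i += 1
            else:
                j += 1
        return out

    print(intersect(index["hot"], index["water"]))  # [2, 14, 30]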

However, what I am getting at with that simple example is that for the searches to be quick, Google keeps these lists small. So there is limited space due to time constraints, and Google must decide what is relevant enough to keep in the available portion of their index.

However, that does not explain why other search engines don't have trouble with older sites/links. I suspect it's more of a business decision than a technical one.


Intersections are another thing that Google search doesn't do properly anymore. If I search for something like lkasdfjer samsung galaxy s8 it just gives me matches for samsung galaxy s8 and ignores the first word. When I do searches like this, I do it for a reason and don't want matches that lack some of the search terms.


I've found if I put the keyword in double-quotes then it makes the keyword required in the search


Not even this is sufficient any more. They now have a "verbatim" search, but I think even then some terms can be ignored -- terms which are not conventional "stopwords" like the.


Yes, verbatim is distinctly broken sometimes.

edit: It's not tedious for me on my browser, just click Verbatim on the LHS of page. (Can select that or All results)


I don't see that on the LHS. It would be nice if there's a link to it, something like https://www.google.com.au/search?verbatim=true that I could bookmark. Edit: or somehow set as the browser's default search engine.



Nice. Did you discover that (gleaned from url) or is it some documentation?


From the url, I don't think they have a documentation :)


Maybe "verbatim" is the same as putting quotes around every word. The verbatim search seems a bit tedious to activate: search for something, press "Tools", then "All results" and pick "Verbatim" in the drop-down. Although once activated, it stays activated for subsequent searches.


That helps. Even if you go to the advanced search, https://www.google.com/advanced_search, and use the box "all these words", you need to quote the words for it to take you seriously. I didn't give a good example, since there are no matches for that search (except reflections back to this discussion), but in other cases there are legitimate results that only appear after pages of invalid results.


Testing your hypothesis:

https://imgur.com/a/yvo3y


I think that's a good result, yea? If so, I'm glad it helped! Typically I use the double quotes to search for specific code/error strings and then use non-quoted words in the query to help filter the results to specific context, like the app name or topic. Others suggested the "Verbatim" option.


It's probably architected to err on the side of giving you something over giving you nothing, because the common use-case for <not-in-index>+hit+hit+hit is "not-in-index term is a typo," not "not-in-index term is an intentionally-crafted attempt to zero the results."


double quote each term/phrase and intersect with + sign. this tends to work.

"lkasdfjer" + "samsung galaxy s8" : brings up this discussion


Seems reasonable when you put it that way, but as you said, you are unsure how Google handles their indexes. I doubt we ever will know unless your signature is on an NDA.


Google does not need to pull up ALL results fast. It only needs to return 10 results quickly.

That's not relevant to the article, which says that the results are not available AT ALL. (Although as of my posting the two articles seem to be available again.)


True, but Google is not searching their entire index for those. A simple linear search takes N time, so for a word that occurs billions of times, Google is not going to go through that entire list. They might use some clever hashing to jump around, and sorting. However, when trying to intersect two keywords they either have to pre-generate the intersection or make the data set they are intersecting small enough to get those 10 results quickly.


I wouldn't be surprised if this issue is as mysterious to Google staff as it is to you.

Google Search no longer runs a clearly defined algorithm to find search results. It is a collection of AI systems that are trained continuously on a variety of data. There is probably no human alive who fully understands how Google makes decisions about which results to return and how to rank them. They just understand how to provide feedback to adjust results they don't like.

If the system gets rewarded for finding common things quickly, then it will adjust its internal algorithms to make that happen--perhaps even if that means dropping unpopular results altogether.


If that's the case then this seems like a symptom of a problem I've been thinking about for a while...is it safe to build systems where it's impossible for someone to understand how it works? What happens when it breaks down? Who will be responsible?


You just described human society in the large, and the answer appears to be "We muddle on."


I think this phenomenon has become a lot more important in the last few decades...I think it was fully possible for one person to understand how a car worked, how household appliances worked, etc, until quite recently


Probably depends on how you slice the question / define "fully understand." "How it works" in terms of being able to tell yourself a story that makes sense? Sure. Car burns gas, gas makes engine go, engine turns wheels. But what about the details you'd need to actually make a car (which include but are not limited to thermodynamics and metallurgy)? Probably not.


One difference is that there are parts of the system that, most likely, not a single person understands nor, even with weeks of effort, could understand. At least with cars, you can be reasonably sure somebody understands or can reverse engineer any specific component. With certain software that is a lot less possible.

It would be like if our society couldn’t fully function without Roman concrete, but no one knew how to make it anymore. Which is sort of what happened during medieval times: people continued to use Roman roads and Roman bridges. But if anything degraded, there was no hope of repairing it.


That's unfortunate, because their simpler system worked better.


I don't think so. A few years ago, Google worked much better when you knew how to phrase queries. I often helped family members to find something online, just by rephrasing their original query.

Today, it doesn't matter how you build your query, Google returns good results in any case. That also means that you can't search for specific info by phrasing queries differently but for the vast majority of people it makes life much easier.


You have a good point here. However, it would be great if both directions were maintained: one with "good enough" results for the common crowd, and another with a scalpel-sharp ability to dissect the index.


If you can do that at Google's scale I'm sure they'd be happy to hear your proposal. I really doubt Google's engineers wanted to give up the precision (they use it for their daily jobs as well), I just think they couldn't justify the additional resources.


Don't make it sound like a lack of ability. They could keep giving time slices to the old code if they wanted to. And it wouldn't take that many resources, "justified" or not.


Keeping old code running within Google is surprisingly hard...

Most big services (eg. Google Docs) have entire teams of people who run them (SRE's). Without those people to keep the system going, it would probably break within a few weeks.


I suspect this might be a case of “it is worse in my specific use case, but better for the overall user base”.


I think it's more of a "we could support both cases, but the ROI on supporting the power user is not worth it".


We will see this more frequently and in more areas since AI has become so hot in MBA buzzword bingo. It's fashionable, so businesses want, or even need, to say "yes we use AI."

Even Google falls into the trap that a worse solution with more fashionable tech gets deployed.


> They’ve never claimed to index every word on every page.

Not in those words, but they do claim to aspire to “Organize the world’s information and make it universally accessible and useful.”[1] which ought to include old web pages. They've gone to the effort of finding out of print books and digitizing them to make those searchable so it doesn't seem like a ten year old web page should be such a stretch.

[1] https://www.google.com/intl/en/about/our-company/


you'd think it would at least come up in the internet archive if not anywhere else.



That's unfortunate. But understandable in a way.

    # robots.txt web.archive.org 2013-10-02

    User-agent: *
    Disallow: /

    User-agent: ia_archiver
    Allow: /


Touché. I don't suppose the old non-commercial websites mentioned in the article suffer the same problem though, right? Maybe a robots.txt file was mistakenly left around?


So from a business point of view, it’s hard to make a case for Google indexing everything, no matter how old and how obscure.

I don't get that line of thought; somehow people have started defending lack of quality as something expected or reasonable.

The whole point of going to Google is to find stuff, and that includes "boring" old and obscure stuff that won't sell ads. But that is part of the deal; if Google doesn't care about that, why should I care about Google?

While we are on the subject, I still miss being able to add a + in front of a word to highlight its significance. This was removed in favor of Google+ (you can still do it with quotes), but now that Google+ has been irrelevant for the better part of a decade, maybe it is time to let us quickly emphasize words again? Sigh.


Google's job is to make money, and to grow and make ever increasing amounts of money, not to find stuff. If Google decides that it's worth it to lose some of its customers in order to reduce costs, then it's fair game for them to do so.

Maybe a competitor will come in and steal marketshare from Google by filling the hole Google leaves behind when they increasingly make changes which annoy a subset of users (duckduckgo is probably closest, and I use it as my default search engine, but I don't find it's as good as Google yet in a lot of cases). Maybe enough people will switch to be an issue for Google, in which case Google miscalculated the cost of pissing off that subset of its userbase. Maybe so few people will switch that it's offset by the cost savings of not keeping everything indexed, in which case it might've been the correct decision, from a capitalistic point of view.

This focus on constant growth is the issue with relying on companies and capitalism, but that's a bigger discussion.


What makes you believe this is a rational decision? Even if you reduce everything to "must make money"?

This doesn't have anything to do with capitalism. It is pure greed. Companies willingly do anything for a slight increase in revenue even if they willingly acknowledge that it will cost them ten times as much in the (not so) long term.

There is nothing about capitalism that says you must be a colossal idiot, that's just a consequence of a poisonous culture where employees don't give a crap about the company but only focus on their own career.


I've done essentially the opposite ... I use DuckDuckGo as my everyday search engine and only use Google when there's not an acceptable result there. Over the last couple of years, the number of times per week that I switch to Google has probably halved.


In my experience this only works well if you live in an English-speaking country where Google is also big (oh the irony). I've tried to use DuckDuckGo, but it is bad, really bad, for local searches in smaller and non-English-speaking countries.


This is basically my exact experience. Once, a long time ago, I made DDG my browser's default search engine and it stuck. I really like using “!g” to switch quickly to Google.

If my search bar DDG search doesn’t return satisfactory results, I hit my address/search bar key (cmd+l for me), press left arrow/start-of-line key, type “!g “ and hit return. That gives me a google result page quicker than the time it would take to reach the mouse. It’s become a good workflow.

I’m having to use it less and less. DuckDuckGo is getting really good.


Tip: while in your case you still have to type something to get rid of the selection, you don't have to put !g at the beginning, anywhere in the query will do. I mostly put it at the end.


Instead of clicking you can use / or h to focus on the search field, press the [right] arrow key and type your preferred bang (!g) to open in Google; no need to add a space after the last word (as you said).

https://duck.co/help/features/keyboard-shortcuts


I used to do just that, but then I found myself always appending !g to the queries.


that's been my experience too, sometimes I need the power of Google to infer the tenuous relationships my mind is able to regurgitate on a subject I can't recall precise terminology for. I cast a loose net of terms fumbling for something feeble I can hook on to dredge up from my mind to unveil the whole glorious thing I'm chasing.


I would bet this is one of those more subtle long-term effects that nobody really saw coming... when Google refocused search with an eye towards commercial results, I imagine it deprioritized a lot of the older, more innocent informational content lying around


This has been my experience. With Google I am constantly asking myself: Wait, what about all that glorious, smart, noncommercial web content that I know exists? Like the Stanford Philosophy Encyclopedia[2] or that economics professor's dataset that I remember being referenced in a podcast a year ago?

Google seems to have decided that Wikipedia is the only blessed noncommercial source of intelligence.

I guess, if I were to put it strongly, I'd say: using Google is not like using the Internet any longer.

FWIW, HNers may wish to check out Yewno[1], a knowledge search engine based in Redwood City that I've had the pleasure of being (tangentially) involved in.

[1] http://yewno.com

[2] Yes, I know this is indexed. It just frequently gets buried in my searches.


I feel this way with regard to the many, many forum posts containing the exact answers to questions I want to know. So much subject-specific, often hobbyist information seems to exist primarily on forums yet I almost never see forum posts turning up in search results these days. More typically I'll get (e.g.) five results at variations of the hp.com landing page or some other contentless nonsense.


I just noticed this also, earlier today. I have a blog entry titled "Highest airports in California" that I attempted to find using Google. Even with the double quotes, and restricting the search to the correct site, it doesn't seem to come up in search results.

The site's robots.txt seems permissive enough. What's up?

link: https://nibot.livejournal.com/1075122.html


This comment now comes up with the quoted search term, so your blog post will likely be indexed again in the next few days. If you submit a sitemap to Google Search Console[1] you'll be able to see which pages have and haven't been indexed. I'm not sure how else you can easily track indexing.

[1] https://www.google.com/webmasters/tools/


I've had the same problem finding my old livejournal posts (which I certainly used to be able to google for).


DuckDuckGo returns that link as the third result, though.


I’m surprised this doesn’t exist as a Wikipedia page.


It seems possible these particular articles fell out of Google's index for some other reason than they are not indexing "old" stuff. That's kind of a big leap to make without many more examples.


We may never get to know these reasons. Moreover, from the point of view of the author they might be irrelevant. What matters is that Google is not indexing them, whereas the competition does.

(I agree looking for one's own articles is a specific case - in most other situations you'd want to know Google's reasons very badly.)


I still agree that it could be some other reason that led Google to de-index that website and therefore not index it again.

A good review would show more proof than just websites behind the same domain.

Interesting though, let's see if there's more news on this.


There's not a single Google index; rather, it spans multiple tiers. Perhaps those pages fell through the cracks, in the figurative sense -- perhaps there is not enough capacity in the tier(s) they are in and the cutoff is too aggressive. There are internal tools to debug this, but of course nobody that has access to them will report here.

Uh, five minutes after I first tried, now the review is the first hit for [lou reed "rock n roll animal" tim bray] and a bunch of variations. Enough people searching for it might have changed the state of the system.


Q: What's the difference between amnesia and Alzheimer's?

A: I don't remember.


And by that I really mean it doesn't matter why they forgot. The fact that they forgot means there's a sickness.


I believe the sickness you're thinking of is "distributed data index."

Consistency, availability, partition tolerance: pick two (https://landing.google.com/sre/book/chapters/managing-critic...). If you think about the uptime constraints of www.google.com, you'll possibly conclude that the limits of distributed systems necessitate a solution where some data is transiently unavailable.


While we're all grumbling about Google, my main gripe is the increasing prominence of Pinterest in web search. It was bad enough a few years ago when they started clogging up image searches, but I've noticed a lot of links to Pinterest now in the first few web results of a search.


Concurred. Pinterest is really bad. You need to create an account to do anything with it. Now I have to add -site:pinterest.com to most of my image searches.


The problem with many search algorithms appears to be that you profit massively if your content is on a highly ranked domain. Pinterest is very highly ranked so all content automatically also is assumed to be of high quality.

The same content hosted on another domain would appear much lower on Google than a Pinterest post (or a post on any other large website). Not sure if that's really the best approach but I guess it's not easy coming up with a better one.


I'm afraid the author has it wrong when he claims what Google cares about is "It cares about giving you great answers to the questions that matter to you right now." It cares about making money via ad revenue. Even assuming that it gives you "great answers to questions that matter to you right now", it only cares about doing so if that results in making money via ad revenue. The result set is a means to an end, not the end itself. If more money can be made via ad revenue by providing some other result set, that is what will become the preferred method.

Google is not a search business despite opinions to the contrary. A search business would have as its main source of revenue customers paying for either search results or the search technology itself. Google makes its money via the delivery of ads, through its own advertising sales and placement platforms and through the audience it can provide from its own content.


Yes, I have noticed this recently too, although I have no hard and fast examples.

Google _seems_ to be returning more popular/mainstream sites at the expense of less popular sites that may be more relevant.

Also, Google has stopped penalising sites that don't contain all of the search terms. It has always removed "stopwords" (words that are too common to be relevant: and, it, or, etc.), which is fine, but now it seems to remove significant terms as well. This makes a big difference if you are searching for programming stuff, particularly error messages.

From a pure search point of view Google is losing ground to its competitors, and that has _never_ happened before.


I have also noticed that google search results, especially in the last few months, are incredibly weird and jumbled, as if they so desperately want to show me the $current_chosen_web_winners of news/ecommerce that they sneak in results from them no matter what the terms.


This is definitely a thing. Many sites when they update take down old content that was getting few views. Many times that content is irretrievably lost.

This shows the value of actually grabbing content that you plan to use or hope to refer to in the future, rather than merely bookmarking it. And it also underscores the value of the Internet Archive.


> This shows the value of actually grabbing content that you plan to use or hope to refer to in the future, rather than merely bookmarking it. And it also underscores the value of the Internet Archive.

Yes. Everyone should install the Wayback Machine plugin and click the "save page now" whenever they find something useful or interesting:

Chrome: https://chrome.google.com/webstore/detail/wayback-machine/fp...

Firefox: https://addons.mozilla.org/en-US/firefox/addon/wayback-machi...

I hate hitting unarchived dead ends when doing research, so I'm trying to do my part to prevent it. Many pages I've archived had never been archived before I saved them.
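
If you want to script it rather than click the button, hitting the Save Page Now endpoint directly also seems to work -- a rough sketch (the endpoint isn't formally documented and is rate limited, so treat it as best-effort):

    import requests

    def save_to_wayback(url):
        # Ask the Wayback Machine to capture the page right now.
        resp = requests.get("https://web.archive.org/save/" + url, timeout=120)
        resp.raise_for_status()
        return resp.url  # usually redirects to the fresh snapshot

    save_to_wayback("https://example.com/something/worth/keeping")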


Very much so. Recently I was able to retrieve geolocation data for some Panoramio photos thanks to IA. Another thing I appreciate a lot is Web Archive (MHTML) as "Save As" file format in FF.


This explains a number of times I've been unable to find old articles/forums/what have you, even when fairly certain I recall most or all of their titles. This may finally be enough for me to move to DuckDuckGo, as the quantity of information published longer ago increases, and the information I may wish to reference becomes increasingly difficult to locate.


The plural of anecdote isn't data, and all that, I know, but Google is able to find 15-year-old posts from rather obscure French blogs.

https://encrypted.google.com/search?hl=fr&q=%22Dans%20mon%20...

https://encrypted.google.com/search?hl=fr&q=%22Eh%20bien%2C%...

BUT, interestingly enough, it can find posts from the very first months of the blog's existence, yet stays clueless about several posts dated a few years later. For instance, several exact strings from this page are not found by Google:

http://blog.smwhr.net/2003/10/


Google search team, I wouldn't want to be in your place today. My frustration with Google Search is its degree of protection against bots. If you, Google, can spider everything webmasters post, why can't webmasters spider everything Google returns?

In my case, I would like to use Google search to bootstrap a few minor search engine indexes and collect data for NLP projects. But the free version is too limited and the paid version prohibitively expensive, so, no luck.


It’s exactly what we do: https://serpapi.com

Shoot me an email, I’ll hook you up with 1,000 credits. It’s funny, one of my first uses was for ML as well, but it ended up being used mostly for SEO and web marketing.


> why can't webmasters spider everything Google returns?

I'm sure you could start client-side caching every search you ever do to Google.

But if you're searching enough to eat up Google's bandwidth, they're paying for that data and they're under no obligation to keep serving you as a client (much as any server is under no particular obligation to serve a search spider).


> But if you're searching enough to eat up Google's bandwidth, they're paying for that data and they're under no obligation to keep serving you as a client

Do you not see the irony?


Sites allow search engines to index them (instead of telling them to go away with robots.txt) because the search traffic is worth it to them.

Search engines don't allow people to scrape them (resorting to blocking after scrapers ignore robots.txt) because they don't get anything similarly valuable in return.

(Disclosure: I work for Google, though not on search.)


No, I don't. Can you help clarify it for me?

Search engines crawling millions of sites each with---on average---a few MB of data distributes cost globally.

Extracting terabytes of index data from a single search engine's repository consolidates the cost on the back of that repository's bandwidth provision.

These are not symmetrical cost structures.


Our git repository went down when crawlers decided to index it


But probably not Google. The google crawler is very careful and stops as soon as they encounter higher error rates. Bing appears to do the same.


I hadn't used Bing because whenever I would compare its results with Google's, it was no match, especially in long-tail results. This article has made me reconsider my assumption.

Usually GOOG had the best results for technical queries -- but fresh results for those tend to be the better results (things go out of date pretty quickly because of quicker release cycles).

However, on occasion it has been difficult to find very specific results to non technical questions --I never even bothered with DDG or Bing, but now I will surely give them a try.

If they are data driven (and they are) for their average users, long-tail, old results probably don't make sense -- how many people really go down to page 20 of the SERP? I'm sure it's a very minuscule number.


how many people really go down to page 20 of the SERP? I'm sure it's a very minuscule number.

Unfortunately those people are the ones who are searching the hardest for the most difficult-to-find things, and thus need the services of a search engine the most. It's unfortunate because, for every one of those, there are probably millions of others who just want to search "facebook" and click the first link; a site that I don't even use, yet I can recite its domain name off the top of my head.


What's most disappointing is when Google tries to pull fresh content over "stale" content when you know that you exactly want the "stale" content.

This is a contrived example, but say you want to look up something someone said about drug overdoses back in the 80s. Google would insist on bringing up information about the most recent overdose studies, for example, because people are currently discussing that more; so to Google, obviously I should also be looking at the fresher content, and the end result is that I get less relevant, to virtually irrelevant, results.


I loved AltaVista. It didn't provide the cleverness of Google; you had to bring your own. I'd construct searches of the pattern:

(word OR word) AND (word NEAR word)

and get excellent results. Of course, the Web was much smaller then.


"[Up­date: Now you can, be­cause this piece went a lit­tle vi­ral. But you sure couldn’t ear­li­er in the day]"

And that's kind of the point, right? Beyond a certain threshold of popularity, some things aren't always available from every search query, because a distributed system can't have 100% uptime, consistency, and tolerance to network partitioning.


Is Google using neural nets as an integral part of search indexing yet?

It's well known that there are a bunch of metrics that go into which results to return — metrics including things like pagerank, (probably) historical value (# of clicks when the page appears in results), and social media popularity.

I wouldn't be surprised if Google has experimented with training models to predict most of those metrics, given only content from the site itself, and tried using those models as a filter for what to index in the first place. If the NN is accurate enough, they can use it as a filter at indexing stage ("should I index this?") rather than at the results ranking stage (where real data, rather than NN model output, answers the question "should I show this page close enough to the top of results that someone will see it?").
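
Pure speculation about Google, but a toy version of "filter at indexing stage" is easy to sketch: train a model on content alone to predict the signals you only get once a page has been live, then threshold it before admitting the page to the index. (The data, features and threshold below are all made up.)

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression

    # Hypothetical training set: page text plus a label derived from
    # historical signals (clicks, links, etc.) meaning "this page proved useful".
    texts = ["in-depth review of the 1974 live album ...",
             "cheap pills buy now limited offer ..."]
    labels = [1, 0]

    vectorizer = TfidfVectorizer()
    model = LogisticRegression().fit(vectorizer.fit_transform(texts), labels)

    def should_index(page_text, threshold=0.5):
        # Predict usefulness from content alone, before real-world signals exist.
        score = model.predict_proba(vectorizer.transform([page_text]))[0, 1]
        return score >= threshold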


From bits and pieces they've posted, it sounds like they use some all-encompassing glob of statistical inference that they call RankBrain, which almost certainly includes some deep-learning components. They've said that the old PageRank algorithm is now one input into RankBrain.


> My mental model of the Web is as a permanent, long-lived store of humanity’s intellectual heritage.

And that's where I don't agree with the author. I've been thinking about this, and to me it seems we need to try to foster a culture of forgetting. Just because we can store everything forever doesn't mean we should. That kind of thinking is exactly where this 'track everything everyone does' mentality comes from, which governments seem so keen to apply in the name of 'terror prevention'. It also values the regular stuff you do way too highly.

Let's face it: a huge portion of the web is garbage and most information on it is ephemeral. The really useful stuff, like encyclopedias, will be used regularly and thus stay indexed anyway. For the rest: just let it go. Forgetting can also be relieving, you know?


I wrote about this when I dropped a lot of content from my site: https://petermolnar.net/making-things-private/


Personal info, sure. There's no reason to get rid of informational articles just because they're not popular enough to get into an encyclopedia.


I have some questions about information retrieval and SLOs:

* Is there a metric of search quality which is appropriate here -- specifically, "when I search for [site:tbray.org rock roll], and receive a set of results, that set includes Tim's article"? What do we call this metric? The metric would be lower when the result set is empty (no relevant results returned) and higher when the result set contains the desired article (a relevant result was returned).

* How would you assess the quality of this particular search against a metric?

* How would you measure the overall quality of "all searches in the past hour, including the [site:tbray.org rock roll] search"? How would this one failure to find a page contribute to an overall success rate?

* Is there any possible automation that would notice whether Tim's article has started to be missing from indexes and say "hey, this represents a loss of a kind of quality"?

* Suppose the index were to (say) discard all pages created before 1999 but simultaneously improve the relevance of all queries that find more recent results. If (say) 99.99% of queries have users happy getting only post-1999 links and (say) only 0.01% are unhappy because they specifically wanted a pre-1999 result, but things get way way better for the 99.99%, was that a bad change? would any metrics show a problem?

I don't see super satisfying answers to this at e.g. https://www.quora.com/How-does-Google-measure-the-quality-of... or https://www.quora.com/How-can-search-quality-be-measured . If I'm reading right, it sounds like part of the state of the art for search quality recently involved human raters manually running sample queries… That seems kinda crazy / totally unlikely to catch certain obscure issues. But then again:

* What is the service level objective for search quality? If search is getting way better for 99.99% of users because of various optimizations, is it a problem if a particular 0.01% of queries such as Tim's old review query, which he expected to find one specific page, instead find no results at all?

And then I guess I wonder:

* According to whatever metric correctly captures Tim's review being missing as a problem, what is the current search quality of Google web searches and how has it been changing over time?


This won't answer all of your questions, but the measures you're looking for are called ‘recall’ and ‘precision’ (a minimal sketch follows the definitions):

- recall: number of relevant documents retrieved / number of relevant documents

- precision: number of relevant documents in result set / number of documents in result set
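
A minimal sketch of those two definitions for a single query (the names are just illustrative):

    def precision_recall(retrieved, relevant):
        # retrieved: set of document ids the engine returned
        # relevant:  set of document ids that actually answer the query
        hits = len(retrieved & relevant)
        precision = hits / len(retrieved) if retrieved else 0.0
        recall = hits / len(relevant) if relevant else 0.0
        return precision, recall

    # e.g. 5 results returned, 1 of which is the page we wanted:
    print(precision_recall({1, 2, 3, 4, 5}, {3}))  # (0.2, 1.0)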


Yeah you know, it's funny, the last time I worked on question-answering code, we were trying really hard to find algorithms that could improve a particular metric (F-score, a synthetic agglomeration of precision and recall) ... I don't remember hearing very many conversations at all about whether we were measuring the right thing.

Given a query like [site:tbray.org "rock n roll animal"], and knowing that the 1 relevant document we actually want is the review at https://www.tbray.org/ongoing/When/200x/2006/03/13/Rock-n-Ro... , I think we can say that

* if Google search returns 4 results for the query, not including the review: precision is 0/4, recall is 0/1 (so p=0, r=0)

* if Google search returns 5 results for that query, including the review: precision is 1/5, recall is 1/1 (so p=0.2, r=1)

But while I _kind of_ understand how we can use these measures to assess the outcome of a single query, I'm really not sure I understand what meaningful ways are available to aggregate those metrics. Suppose we're going to get 1M queries in the next hour. Do we prefer an algorithm which has the highest mean F-score per query? highest median F-score per question? or which has the highest 1st percentile F-score per question (99% of queries get the best possible outcomes?)
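
To make the aggregation question concrete, here's the bookkeeping I have in mind -- just a sketch of the candidate aggregates, not a claim about how any engine actually evaluates itself:

    from statistics import mean, median, quantiles

    def f1(p, r):
        return 0.0 if p + r == 0 else 2 * p * r / (p + r)

    print(f1(0.0, 0.0))  # review missing from the results: 0.0
    print(f1(0.2, 1.0))  # review present among 5 results: ~0.33

    # Dummy per-query (precision, recall) pairs standing in for 1M real queries.
    per_query = [(0.2, 1.0), (0.5, 0.5), (0.0, 0.0), (1.0, 1.0), (0.1, 1.0)]
    scores = [f1(p, r) for p, r in per_query]

    # Three different things an engine could optimize for:
    print(mean(scores))                 # average quality
    print(median(scores))               # the typical query
    print(quantiles(scores, n=100)[0])  # roughly the worst 1% of queries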

If there is published literature on how search quality is measured I'd love to see it. Would be especially interesting to see real-time data -- e.g. what is the impact of 1 data shard outage on overall user-experienced quality according to some metric?


"Modern Information Retrieval" by Baeza-Yates / Ribeireo Neto a few years ago used to be a good standard work.

I'm not sure though how well it's kept up in terms of aspects like real-time search and graph search, both of which are fairly recent developments.


In May of 1998, I published a review of Lou Reed's "Perfect Night" by Kevin McGowin, and when I google (in FF private mode, as if that helps) "lou reed kevin mcgowin" it comes up #1.

If I google "lou reed perfect night review", I stopped looking after page 17 of results. There's just too many results.

If I google /"lou reed" "perfect night" review/ with quotes as you see them, the review I published is on page 2, result #3.

I feel your pain, but, as someone who started publishing content in 1995, I don't see Google as having a memory problem.

My pain points are related to how Google crawls my content, my sitemaps, and how the two seem completely independent of each other.


Great tip about the intext: operator from @wahnfrieden. I'll admit I was unaware of this. Many of us could probably benefit from learning how to actually use Google more skillfully, instead of expecting the algorithm to spit out a 'perfect' answer every time. The intext: operator is covered in the Power Searching course (http://www.powersearchingwithgoogle.com), section 3.5! I share many of the same frustrations I've seen in other comments, but I'm going to work through these courses and hopefully become a better googler.
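
For example, something like [site:tbray.org intext:"rock n roll animal"] ought to force the quoted phrase to appear in the page body itself -- assuming the operator still behaves the way the course describes.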


A few years ago they put up an old index people could, well, google in.

I wonder if they have any other index snapshots stashed away somewhere, I would love to do that again. Even if only to retrieve the urls of old homestead websites I had back then.



Why?


The site is very spotty to load for me. Is it just me?

EDIT: Yeah, the TLS handshake takes 20 seconds for me. I don't know why. Everything else works fine.


This post and some comments about the lack of predictability of today's AI behaviors make me wonder: could this be the start of a new trend for startups? « 100% predictable, AI-free product »?


I think AI works best when you have the AI on one side, and something "dead" on the other, like a pattern. Games, images, etc...

Information nowadays isn't dead like it used to be; it's alive. It has desires and seeks readership, it mutates and wants to spread. This information has its source in human intelligence.

It seems possible it is outwitting Google's AI, as the smart people have shifted focus from being smarter than "bad" information to building dumber-than-human AIs to improve bean-counting metrics and bask in glory.

The core problem has mutated from "what is relevant" to "what is quality". Pagerank and the web of dead info could answer both with one number because the quality signal and the relevance signal were the same and hard to fake.

But if you can hyper-optimize for relevance by making content addictive, thus affecting downstream attitudes on "relevance", quality is no longer relevant. Current AI is smart, but still childlike and easily corrupted/manipulated. It's a black box that can't be inspected and adapts to change, but because its tempo of dynamic modeling is slower than a real human's, it can be "trained" or hypnotized.

Hell, it's probably the core sociopath skill. Being able to manipulate the value/principles system of someone/something else.


Thank goodness people are pointing these issues out. I've been considering some blog posts to collect more info and hard examples of exactly this and similar google issues that are so often not talked about, especially by those who have ties to the big G.

There are indeed many keyword searches in which Google is obviously censoring the internet in big chunks. Because they are so opaque, no one knows if it's directed by a gov agency, or shareholders, or the vision of some guy at the top who thinks they need to grow up, or some small team so hellbent on destroying some things that they don't worry about the collateral damage - or maybe it is "machine learning" figuring out that if you censor big chunks here and there, people will spend more on ads. Who knows? Very few people. And who is affected? Many people, many of whom do not even know it. Facebook of all systems is changing its overly algorithmic ways to be better for people and less for the algorithm - will things change with big G now that the last babysitter has left?

Google's current path imho is becoming the next yellow pages. Sure, they will do fine being embedded on so many mobile devices and being the go-to place for crowd-sourced directions and locations - but the yellow pages was king for a few days, and then people realized its value to the consumer and to the business was no longer what it was, and now who is using yp? More details on a non-Google-run blog soon. Who can I trust not to give up IP info of posters to tech peeps at G? Wordpress.com? Oh wait, they use Google fonts and all kinds of things, don't they. Hopefully people can help put this info together with me; it is prime time to create some new alternatives that only do slices of what G used to do well.


Haven't used Google's search engine in years. It was obvious to me back then that Google products are designed primarily around Google's incentives, with the user second. SEO seems like it would be something great... if the "optimization" were in fact user-centric, but it's not. I don't like being guided by algorithms and I don't really shop around for stuff online. When I search for something, it's usually for some very specific type of information. I mean, I'm usually pretty sure about what exactly I'm looking for. Granted, most people probably don't use the web the same way I do, so it's not like I expect them to do what they do differently. I just don't use Google, and whenever I do tell someone to "google it", I'm being sarcastic and probably not in the mood to answer questions that can be answered with a few minutes at a keyboard.

Furthermore, when someone links me to an amp.whatever page, I might be in an even worse mood if I have to talk with them about it afterwards. imo, Google and facebook algorithms are half of what's causing most of the conflict and hate on the internet these days. The machines have literally taken over, and they've started with our minds. YouTube suggestions have devolved into nothing more than a rabbit hole that gets scarier the further you follow. Either they are completely out of touch with what people want or the mentality of society/humanity is way more fucked off than I'd ever imagined it could be. Until they provide actually useful and configurable settings for people who can think for themselves, I will not return to using these products.

Besides, how fair would it be for me to freeload off an ad company's services when my hosts file hasn't allowed an ad to load in my browser as long as I can remember?


Just yesterday, hearing about Dolores from the Cranberries reminded me of Amy Macdonald (whose voice I find similar) and her hit "This Is the Life" from ten years ago. The official lyrics include the line "Talking about Robert Ragger and his 1 leg crew". Being of a curious nature, I searched for the reference ("Robert Ragger") to find out who this Robert Ragger is (and who this 1 leg crew is). Bing got it immediately. Google's first result is a self-referencing forum post from 2009 saying there are plenty of results and you just have to google it: https://forum.wordreference.com/threads/talking-about-robert...

For those who don't do the search: it's officially misspelled and actually refers to Robert Riger, an art photographer (hence the one leg crew :)).


I imagine that people who work at Google would like to have a good search engine that works well for software engineers while at work and not one optimized for selling ads. Is there such a thing internally? I remember when exact quotes of error messages from software would usually return something helpful. Now almost never. Can I pay for that please.


Theory: Because google has local data-centers all over the world (that's why it's much faster than the competition, and google suggest works so well), indexes must be maintained at all of them. Because google has so many, this is a significant expense, and to keep costs down, they reduce the indexes to what is profitable.


Good theory; maybe that's why they push the "personalization" and localization of the search engine so aggressively.

I personally don't feel that Google search quality has degraded very much, if at all. It's true that I rarely get new sites outside the echo chamber of a few thousand popular sites, but to me, 99% of the time they give me the relevant and useful information I am looking for. To be honest, Google search results on average are still head and shoulders ahead of the competition in almost all aspects.


I think they push localisation because it makes sense for the user. In Europe, queries for at least 10 countries will usually be answered by the same data center and are still localised. The main reason why I still often use Google instead of DDG is to find localised content where you can't localise by language (e.g. finding UK specific issues).


Yeah, while most of the time modern Google has an awesome capability of understanding what you are searching for, sometimes it feels like a chat bot which didn't fully get what you said to it. It seems to understand some part of what you are asking for but miserably fails to answer the complete question (in some cases) :D


De-indexing old stuff might not be a good idea, but I'm increasingly running into the problem of Google (and DDG) returning old and outdated results. I wish they would put more weight on recent articles, or at least add the option to. The time filtering options just aren't enough.


Why is the drop down in tools not enough?


Because I often don't know the range I'm looking for; it could have been yesterday or it could have been 4 years ago. If I select the last 12 months I might miss something from 13 months ago. There's a lot of ambiguity in what I'm searching for (otherwise it would be a lookup, not a search), and when I'm looking for current information then weighting by age is a lot more natural.

Another issue is that I don't know what the filter is selecting either, a 6 year old article might be better if it's been updated, but I can't tell from the interface what property is being filtered.


I often happen to search for things in multiple intervals. Often the top results are obsolete, since they are more than a year old.

What drove me nuts is that their QA didn't catch the bug with the date format order for years; it only recently got fixed. The calendar selection was regional and put in the date in the region's format (often dd/mm/yyyy), while the query form expects mm/dd/yyyy.


I'm confused, did I do something wrong? Why can I find the referenced article just fine on Google, just by searching the keywords?

https://imgur.com/a/szBcB


Because the OP wrote an article that links to these things, and caused them to be re-indexed.

It's not purely a function of 'date published'; it is also about frequency of access.


Ah true that makes sense, thanks for clarifying my confusion


You can find it now because it's getting traffic again due to TFA. It's no longer "stale."


Yeah, doesn't Google have multiple crawlers running with different missions? Ie, deep content crawlers and "fresh" content crawlers?


Doesn't Google downplay non-HTTPS sites, which older ones probably are? Also, maybe it's just part of the algorithm that people tend to search for recent pages more often than old ones.


Maybe this means there is room for someone to start a new search engine for hackers. That would be very exciting news, because that's what Google was originally: the search engine hackers used.


This reminds me a lot of another article posted here recently: http://www.sicpers.info/2017/12/computings-fundamental-princ....

This behavior makes sense for most people, those who don't know what they're doing, at the expense of rendering the tool useless in cases where advanced functionality is needed. Perhaps Google should have an advanced, AI lite version.


Slightly off topic; not about results accuracy for a minute.

Obviously, indexing the whole Web is crushingly expensive, and getting more so every day

Why? CPU, memory, and SSDs have all gotten a lot faster and bigger. Bandwidth is also a lot cheaper for Google since they essentially own the fibre. Faster algorithms may have an even bigger impact.

I would have thought information, in terms of text (not video and pictures, which are a few orders of magnitude larger), would now be cheaper than in the old days.


Developers say this type of shit all the time and people wonder why we don't have fast software.


I think this is a way that Google is kind of _forcing_ you to participate in contributing to their AI development.

By _improving_ their system, they create some difference; if no one cares or no one can justify the necessity, then Google doesn't need to care.

The constantly increasing amount of information on the web nowadays is certainly a burden to Google. And if such an AI can handle 99.99% of what matters to users, this AI is already a brilliant one.

After all, I don't think Google ever wanted to be an archive search engine.


> My mental model of the Web is as a permanent, long-lived store of humanity’s intellectual heritage.

My mental model of the Web is a human brain with all kinds of weird mechanisms - one that forgets things, makes things up, mixes memories. Google is like an "association cortex" - take it away and all the memories are still there but access to them is much harder and needs to go through more indirections (links).


I imagine this is a symptom of their parallelism. Sort of like the App Engine datastore being eventually consistent, they operate optimistically. Presumably this reduces their costs and increases responsiveness. As for Gmail, I noticed many years ago (at least 3, I'd guess), when trying to archive a bunch of stuff via search and select-all, that it did not operate on the theoretical full search set.


We don't need to rely on Google. We can already build our own knowledge graphs using open source software.

https://github.com/synchrony/smsn/wiki

We can even interconnect them -- selecting what's private, what's public, what's shared with some people but not others.


> I think Google has stopped indexing the older parts of the Web. I think I can prove it. Google’s competition is doing better.

If this about finding older parts of the web, don't forget to use the Wayback Machine of archive.org:

https://archive.org/web/


n+1: I've 100 percent noticed this too. Infuriating when it comes to looking for emails as they pertain to legal matters.


I think the solution would be for some sort of personal proxy that captures all the searches and webpages we visit. Then when we go to make new searches, we can first search locally (to re-find things we're thinking of) and then externally (to find new things). Not sure if something like this exists yet
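
A rough sketch of the local half, using SQLite's FTS5 (how you capture pages -- proxy, browser extension, whatever -- is left out, and FTS5 has to be available in your SQLite build, which it usually is):

    import sqlite3

    db = sqlite3.connect("visited.db")
    db.execute("CREATE VIRTUAL TABLE IF NOT EXISTS pages USING fts5(url, title, body)")

    def remember(url, title, body):
        # Call this for every page you visit.
        db.execute("INSERT INTO pages VALUES (?, ?, ?)", (url, title, body))
        db.commit()

    def search_local(query):
        # Re-find things you've already seen before falling back to a web search.
        return db.execute(
            "SELECT url, title FROM pages WHERE pages MATCH ? ORDER BY rank",
            (query,),
        ).fetchall()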


If there seems to be a problem with Google not indexing your site, Google's webmaster tools can help debug what's going on: https://www.google.com/webmasters/tools


That's if your own site is affected -- but what if I want to find content from another site, and Google doesn't properly index that? I can't exactly use Google's webmaster tools for that, and the owner may not care, or can't be contacted.


Actually you can:

https://www.google.com/webmasters/tools/submit-url

But you DO need a Google Account though.


DuckDuckGo seems to have no problem returning the result, even with general terms: https://duckduckgo.com/?q=tim+bray+rock+roll+animal


Google also returns the article with this query


That depends, considering search results are usually tailored to suit individuals (if logged in).


True, but it could not return this article for anybody if it was not indexed.

Although it is fair to assume that Tim's blog post and all these new links pointing to the old article have triggered its re-indexing.

It feels a bit like quantum physics, now that the article is out, the state of the web has been observed and has changed.


DDG uses Bing as the backend.


I don't believe that's true, at least not in the way you seem to imply. It is true that Yahoo and Bing are literally the same search engine on the backend, but that is not true of DuckDuckGo and Bing.

To quote the first line of the DDG wikipedia article on how it works[0]:

> DuckDuckGo's results are a compilation of "over 400" sources, including Yahoo! Search BOSS; Wikipedia; Wolfram Alpha; Bing; its own Web crawler (the DuckDuckBot); and others.

[0] - https://en.wikipedia.org/wiki/DuckDuckGo#Overview


I wish you could use both Verbatim and Sort by date at the same time. Instead, you end up having to choose between recent but irrelevant results, or relevant but outdated results.


Wait, so Google isn't surfacing your or your friend's shitty article on some band you guys like, and that makes it broken? Sounds like Google is working better than it used to.


Honestly it feels like a good time to go back to MH handling my inbox but dropping a nice indexer like elasticsearch on the front.


I started using notmuch[1] a couple of years ago and cannot imagine living without it. It can do free text search on almost a million emails (and probably much more) in a fraction of a second. I subscribe to a lot of mailing lists and add various tags on each making for some very powerful search queries.

E.g "from:torvalds and to:linux-ext4" to bring up all emails ever with those properties. Add some free text and/or "tag:foo" to narrow it down.

[1] https://notmuchmail.org


I like the idea that the Internet forgets things. Seems healthy. Eternal memory for some things is disturbing.


https://www.google.com/search?q="we+were+watching+the+Democr...

I can find the article this person is referring to just by searching for a string from it. So the article _is_ indexed. Not sure what he is referring to.



The second link is a partial duplicate of the first link; it contains nothing not contained in the first link. Maybe that's why?


It's possible that by linking to his article he triggered a re-indexing of it? I won't pretend to know how Google works, but it seems plausible to me.


I can find the article just by searching 'tim bray rock roll', so for one reason or another, the article is now indexed.


I would be surprised if it has not been reindexed thanks to the link from this article.


That result is just the blog's page for November 2008, not the article page itself. So while that page _does_ have the article text on it, it is not the article itself.


I don't see any links to https://www.tbray.org/ongoing/When/200x/2006/03/13/Rock-n-Ro... in those search results.


And rightly so, the quote is from Brent Simmons' article.

However, you're right anyways: you can't reach Tim's article by searching quotes.


I've built my own search engine that caters to 90% of my needs. It's not that difficult.


But, by posting this article you will probably have ruined your own evidence. :-D

"Nice try!" :D


Both sites could use a little SEO. Google being imperfect isn't a new thing.


This really sucks. Time for an open source search engine a la wikipedia.


I think Google should have a "Search from Archive" feature.


I often limit the search with the time / date filter.


Is Google doing something to fix that?


I recently switched off Google Search and it's totally fine. If it disappeared I wouldn't be too upset.


memory loss is good for those struggling with the right to be forgotten


I've switched completely to bing for a few reasons:

1) Bing is much faster to load

2) Bing doesn't muck up the links. I can right click->copy and get the actual link. With google you get a long incantation that doesn't tell you what site it goes to.

3) Bing handles a few things better, eg, "$500 in ₹" works in bing but not in google.

However, when I need local results, or results about something ongoing, Google is the undisputed king. Nobody else even comes close.


When trying to convert currencies, try using three-letter codes for the currency. To take something a bit more obscure than dollars and rupees as an example, "500 PLN to RSD" (Polish zloty to Serbian dinar) will definitely work in every search engine.

As far as I understand it (I never actually looked it up to confirm), currency codes = two letter country codes + first letter of the noun from the currency name, so they're relatively easy to guess right.


I know, INR works instead of ₹. But goddammit I don't wanna type those extra 2 letters!


Remember when searching used to be a taught skill? People would learn to use boolean operators to find exactly what they want. Now I'm not sure if any of that even works with Google. We went from a simple machine with many (mostly optional) inputs, to a complex but stupid one with one input that actively fights against your own intelligence.


The style used for the links borders on vulgarity.


... Why? Looks fine to me (other than that I prefer underlined links)


Dark red links do not contrast well amongst black text. I still can barely tell what is a link and what is just text. Maybe I am experiencing some vision loss or colour blindness.


Interesting perspective


I've been using Google since 1998. I recently switched off Google Search and don't miss it at all.

Google is in big trouble.


What did you move to?


It's going to be a little sad when Google Maps goes away. It will be much more sad when Google Flights goes away. But I can't think of anything else Google provides that I couldn't get from a bunch of competitors. Well... maybe Google Translate.


In many places I've found both OSM and Here maps to provide higher quality data than Google Maps, actually. So it won't be that bad when (not if, considering it's Google) they stop Google Maps. And DeepL is starting to beat Google Translate on some language pairs, too.

The hardest to replace part is search today.


Try pinging your site: https://pingomatic.com/


Nobody can index the whole web. Even a single site in the form of

    Homepage of Joe Infinity
    You are on page <?= $pageNr ?>
    <a href="<?= $pageNr + 1 ?>">Next Page</a>
cannot be completely indexed. A search engine will crawl it to some depth based on many factors. Age might be one of them. There is no way to index 'everything' on the web.
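
Put differently, every crawler has to budget something. A toy breadth-first crawl with a depth and page cap looks roughly like this (fetch_links is assumed to return a page's outgoing links):

    from collections import deque

    def crawl(start_url, fetch_links, max_depth=5, max_pages=1000):
        seen = {start_url}
        queue = deque([(start_url, 0)])
        while queue and len(seen) < max_pages:
            url, depth = queue.popleft()
            if depth >= max_depth:
                continue  # pages past the budget simply never get indexed
            for link in fetch_links(url):
                if link not in seen:
                    seen.add(link)
                    queue.append((link, depth + 1))
        return seen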


That's really not relevant to this article. The author is not talking about crawling and indexing the entire web (although he mentions the "whole web" once, that's clearly not what he means). He is wondering why old pages -- pages that used to be in Google's index -- are no longer showing up in SERPs even when using appropriately-targeted long-tail queries.


For the same reason 'Joe Infinity page 1234567' would not be found anymore: Google thinks it's not relevant enough to keep it indexed. Yes, it is debatable what is relevant enough and what isn't. But everyone who indexes 'the web' has to decide what to keep and what not. Nobody can store 'everything'.

Also, it's not as easy as just keeping everything that ever was in the index in there. Then search engines would link to nonexistent urls most of the time. Most URLs have a short lifespan. Links rot pretty fast.


I completely agree with you, but your initial argument was that "Joe Infinity page ∞" wouldn't be indexed because Google cannot index every viable page on the internet. That is true, and Google will certainly set limits on what pages it crawls and what pages it indexes. However, in this instance the articles were crawled and they were indexed and they were relevant at one point in time. But Google decided to remove them from SERPs for some reason or another (age, lack of traffic, etc).


On the contrary, it is very relevant.

I run a search engine. What I save and think matters can be expressed in a very definite dollar value.

Old pages in practical reality equal the "whole web", since the index isn't getting trimmed, and the cost grows exponentially.


> I think Google has stopped indexing the older parts of the Web. I think I can prove it. Google’s competition is doing better.

The first sentence is just common sense, and no particular proof is needed. The last sentence might or might not be true, but the anecdotes in this article say nothing about whether or not it's true. The problem is that we don't know how Tim selected these two particular pages as examples.

If he randomly selected two 10 year old pages from the universe of all such pages, it'd at least be a valid methodology, just with far too small a sample size. But obviously he didn't do that. If the methodology instead was to search for pages on Google first, then on Bing iff there was no Google match, this tells us nothing at all. You need to run all queries on both engines, not just the ones that fail on one search engine.

Another reasonable method would be to look at aggregate referer trends; is traffic from Google to old pages decreasing faster than traffic from Bing to those pages.


> The first sentence is just common sense, and no particular proof is needed.

How is this common sense?

> The problem is that we don't know how Tim selected these two particular pages as examples.

Yes we do know. He was using Google to find his own old stuff over the years. Some content he was referring to regularly disappeared from Google's results. These pages had previously been included in the results.

> Another reasonable method would be to look at aggregate referer trends; is traffic from Google to old pages decreasing faster than traffic from Bing to those pages.

Yes that would be interesting.

I've been wondering whether Google would actually purge the URL too. The Googlebot used to be very persistent in retrying "404 not found" results.


> How is this common sense?

Because it's in practice impossible to index every page. Index selection has always been a core quality feature in search engines. (Both re: which pages get included, and re: which pages get included in which layer of index in multi-layered index schemes).

> Yes we do know. He was using Google to find his own old stuff over the years. Some content he was referring regularly disappeared from Google's results. These pages had previously been included in the results.

That's just a guess, it's not actually stated anywhere in Tim's article. But yes, given he did not say otherwise, what you propose is probably what happened. He had a couple of pages which he knew were not found on Google, and checked whether they could be found on Bing.

But my whole point is that this kind of methodology is total garbage. And then he's making pretty absolute statements, like his tweet about the post "TIL that both Bing and DuckDuckGo apparently index a lot more of the Web than Google does".


People used to say that Google would always be the best search index because it had the biggest index, and nobody could match Google there. Being more selective about what you include seems like a big change from past practice, or at least past narrative.


Yes, but what jsnell is saying is, perhaps if you performed the same experiment with Bing, you'd find pages it didn't index but that were present in other search engines.

You can't say A is better than B with a few data points. You can say you think B's behavior has changed compared to the past. But that's also erroneous.

It's possible the behavior was always there and you just never tripped over it, and it's rare enough that most people don't, either because the web was smaller before, or your own content was smaller, or its recent link and access patterns changed enough to trigger the behavior.



