
Google has definitely stopped being able to find the things I need.

Pasting stack traces and error messages. Needle in a haystack phrases from an article or book. None of it works anymore.

Does this mean they are ripe for disruption or has search gotten harder?




> Pasting stack traces and error messages.

I cannot fathom the number of times I've pasted an error message enclosed in quotes and gotten garbage results, and then, an hour of troubleshooting and searching later, I come across a Github/bugtracker issue, which was nowhere in the search results, where the exact error message appears verbatim.

The garbage results are generally completely unrelated stuff (a lot of Windows forum posts) or pages where a few of the words, or similar words, appear. Despite the search query being a fixed string, not only does Google fail to find a verbatim instance of it, but instead of admitting this, they return nonsense results.

> Needle in a haystack phrases from an article or book.

I can confirm this part as well, searching for a very specific phrase will generally find anything but the article in question, despite it being in the search index.

Zero Recall Search.


It seems like pasting a very large search query should actually make it easier for the search engine to find relevant results, but the fact that this doesn't happen suggests that the search query handler is being too clever and getting in the way.


> a Github/bugtracker issue, which was nowhere in the search results, where the exact error message appears verbatim.

Did you put the error message in quotes? I've never had this problem.


I, too, have pasted error messages verbatim into Google queries only to have garbage returned. I did include the error message in quotes. I started filtering sites from the results, e.g. `-site:quora.com site:stackoverflow.com site:github.com` etc., to start to get a hint of other developers with similar issues and/or some bug reports and/or documentation and/or source code.
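
For illustration, here is a minimal sketch of that kind of query building. The domains and the example error string are just placeholders, and I join the include list with OR, since space-separated site: filters generally cancel each other out:

    # Rough sketch (not the exact workflow above): build a quoted,
    # site-filtered Google query URL from an error string.
    from urllib.parse import quote_plus

    def build_query(error, exclude=("quora.com",), include=("stackoverflow.com", "github.com")):
        terms = ['"{}"'.format(error)]                                  # exact-phrase match
        terms += ["-site:{}".format(d) for d in exclude]                # drop noisy domains
        terms.append(" OR ".join("site:{}".format(d) for d in include)) # restrict to useful ones
        return "https://www.google.com/search?q=" + quote_plus(" ".join(terms))

    print(build_query("Unhandled exception: access violation reading location"))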


> `-site:quora.com site:stackoverflow.com site:github.com`

A mock-google that excludes quora and optionally targets stackoverflow/github sounds useful.


Could it be that there just isn't a single page on the web with the exact error?


My guess is that suppressing spammy pages got too hard. So they applied some kind of big hammer that has a high false positive rate. You're getting the best of what's left.

Maybe also some quality decline in their gradual shift to fewer hand-weighted attributes and more ML.


My guess is that Google et al are all hell-bent on not telling you that your search returned zero results. They seem to go to great lengths to make sure that your results page has something on it by any means necessary, including: searching for synonyms for words I searched for instead of the specific words I chose, excluding words to increase the number of results (even though the words they exclude are usually the most important to the query), trying to figure out what it thinks I asked for instead of what I actually asked for.

I further suppose a lot of that is that The Masses(tm) don't use Google like I do. I put in key words for something I'm looking for. I suspect that The Masses(tm) type in vague questions full of typos that search engines have to try to parse into a meaningful search query. If you try to change your search engine to cater to The Masses(tm), then you're necessarily going to annoy the people who knew what they were doing, since the things that they knew how to do don't work like they used to (see also: Google removing the + and - operators).


I was going to reply with something along the same lines. Dropping the keyest keywords is a particularly big pet peeve of mine.

For those "needle in a haystack" type queries, instead of pages that include both $keyword1 and $keyword2, I often get a mix of the top results for each keyword. The problem is compounded by news sites that include links to other recent stories in their sidebars. So I might find articles about $keyword1 that just happen to have completely unrelated but recent articles about $keyword2 in the sidebar.

It also appears that Google and DDG both often ignore "advanced" options like putting exact phrase searches in quotation marks, using a - sign to exclude keywords, etc.

None of this seems to have cut down on SEO spam results either, especially once you get past the first page or two of results.

I suspect it all comes down to trying to handle the most common types of queries. Indeed, if I'm searching for something uncomplicated, like the name of the CEO of a company or something like that, the results come out just fine. Longtail searches probably aren't much of a priority, especially when there's not much competition.


Surely most engineers want the power of strict searching and fewer of the comforts of always getting filler results, right?

So... is there an internal service at Google that works correctly but they're hiding from the world?

It might be useful for Google to make different search engines for different types of people. The behaviors of people are probably multi-modal, rather than normally distributed along some continuum where you should just assume the most common behavior and preferences.

It would even be easier to target ads...

Or maybe this doesn't exist and spam is too hard.


> They seem to go to great lengths to make sure that your results page has something on it by any means necessary

You just described how YouTube's search has been working lately. When you type in a somewhat obscure keyword - or any keyword, really - the search results include not only the videos that match, but videos related to your search. And searches related to your keywords. Sometimes it even shows you a part of the "for you" section that belongs to the home page! The search results are so cluttered now.


Searching gibberish to try to get as few results as possible.

I got down to one with "qwerqnalkwea"

"AEWRLKJAFsdalkjas" returns nothing, but youtube helpfully replaces that search with the likewise nonsensical "AEWR LKJAsdf lkj as" which is just full of content.


> I put in key words for something I'm looking for. I suspect that The Masses(tm) type in vague questions full of typos that search engines have to try to parse into a meaningful search query.

Yeeaap, sometime in grade school - I think somewhere around 5th grade, age 11 or so, which would be around 1999 - we had a section on computers, where we'd learn the basics of how to use them. One of the topics I remember was "how to do web searches", where a friend was surprised at how easily I found what I was looking for - the other kids had to be trained to use keywords instead of asking it questions.


It's surprisingly easy to get zero results returned when pasting cryptic error messages. It doesn't mean there is nothing, though. Omit half the string, and there are the dozen Stack Overflow threads with the error. Maybe it doesn't match across the line break on Stack Overflow or something, but I haven't tested anything.


Tyranny of the minimum viable user.


It’s really fascinating. Two anecdotes:

1. My work got some attention at CES so I tried to find articles about it. Filtering for items from the last X days and searching for a product name found pages and pages of plagiarized content from our help center. Loading any one of the pages showed an OS-appropriate fake “your system is compromised! Install this update” box.

What’s the game here? Is someone trying to suppress our legit pages, or piggybacking on the content, or is that just what happens now?

2. I was looking for some OpenCV stuff and found a blog walking through a tutorial - except my spidey sense kept going off because the write-up simply didn’t make sense with the code. Looking a bit further, I found that some guy’s really well-written blog had been completely plagiarized and posted on some “code academy tutorial” sort of site - with no attribution. What have we come to?


The first seems big right now, on weird subdomains of clearly hacked sites. E.g. some embedded Linux tutorial on a subdomain of a small-town football club.


Yup. Entertainingly, I just saw an example of the “lying date” the original article pointed out: according to Google, the page is from 17 hours ago. However, right next to this it says June, xx 2018. Really?


Well, that “big hammer”, so to speak, is that they tend to favor sites that have a lot of trust and authority.

Someone mentioned that the sites that have the answer are typically buried in the results. That’s because they tend to favor big brands and authoritative sites. And those sites oftentimes don’t have the answer to the search query.

Google’s results have gotten worse and worse over the years.


This! I think this is the biggest piece of the puzzle.

Was it the Panda update, or that one plus the one after? It took out so much of the web and replaced it with "better netizens" who weren't doing this bad thing or that bad thing.

Several problems with that: 1. They took out a lot of good sites. Many good sites did things to get ranked and did things to be better once they got traffic.

The overbroad ban hammer took many down - and many of them had likely paid an SEO firm without knowing that SEO firms were bad in Google's eyes (at the time) - so lots of mom-and-pops and larger businesses got smacked down and put out of the internet business, just like how many blogs have shut down.

Of course, local results taking up a lot of search space and the instant answers (50% of searches never get a click because Google gives them the answer right on the results page, often stolen from a site) are compounding this.

They tried having the disavow tool to make amends - but the average small business doesn't know about these things, and getting help on the webmaster forum is a joke even if you are tech inclined; imagine what an experience it is for small business owners.

I miss the days of Matt Cutts warning people, "get your Press Releases taken down or nofollowed or it's gonna crush you soon" - the problem is most of the people who were profiting from no-longer-allowed SEO techniques were not reading Matt's words.

I also appreciated his saying 'tell your users to bookmark you, they may not find you in Google results soon' - yeah, at least we were warned about it.

The web has not been the same since those updates, and it's gotten worse since. This does help adwords sell and the big companies that can afford them though.

In these ways Google has been kind of like the Walmart of the internet: coming in, taking out small businesses, taking what works from one place and making it cheap at their place.

I'd much rather have the results of pre-Penguin and let the surfers decide by choosing to remain on a site that may be good that also had press releases and blog links... rather than losing all the sites that had links on blogs. I am betting most of the users out there would prefer the results of days past as well.


I've been using DDG as a good enough search engine for most things, but when I sometimes fall back to Google, it blows me away how many ads are on the page pretending to be results!


`if not ddg(search) { ddg("!g " + search) }` has been my go-to method for a while now, but as time has progressed, either the results from DuckDuckGo have been getting better or the Google results have been getting worse, because usually if I can't find it on DDG now, I can't find it on Google either.
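
Spelled out a little more concretely, here's a minimal sketch of that habit, assuming a desktop browser; "did DDG find it" is of course judged by the human, so the fallback is just a second call using the !g bang:

    # Minimal sketch of the DDG-then-Google fallback habit above
    # (not real automation: the human decides whether the results were any good).
    import webbrowser
    from urllib.parse import quote_plus

    def ddg(query):
        webbrowser.open("https://duckduckgo.com/?q=" + quote_plus(query))

    ddg("some obscure stack trace")
    # ...and if the results look useless, re-run the same query via DDG's !g bang:
    ddg("!g some obscure stack trace")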


I use DDG by default, but I can feel myself mentally flinching unless I basically know what I'm looking for already (i.e. I know I'll end up on StackOverflow). When I'm actually _searching_, it's useless, and I'll always !g.


Same here, I actually prefer DDG to Google now, even for regional (Germany) results.

When I switched, about a year and a half ago, I felt like I was switching to a lesser-quality search engine (it was an ethical choice, made because I can), which, however, gradually and steadily got better, whereas Google went the opposite path.

Nowadays I only really use Google to leech bandwidth off their maps services. Despite there being a very good alternative available, OpenStreetMap, they unfortunately appear to have limited (or at least, way less than Google) bandwidth at their disposal... A pity though, because their maps are so awesome; the bicycle map layer with elevation lines is any boy scout's wet dream... but yeah, to find the nearest hairdresser, Google'll do.

Speaking of bandwidth and OSM reminds me, is there an "SETI-but-for-bandwidth-not-CPU-cycles" kind of thing one could help out with? Like a torrent for map data?

EDIT: Maybe their bandwidth problems are also more the result of a different philosophy about these things. OSM is likely "Download your own offline copy, save everybody's bandwidth and resources" (highly recommended for smartphones, especially in bandwidth-poor Germany) whereas Google is "I don't care about bandwidth, your data is worth it".


> Speaking of bandwidth and OSM reminds me, is there an "SETI-but-for-bandwidth-not-CPU-cycles" kind of thing one could help out with? Like a torrent for map data?

OSM used to have tiles@home, a distributed map rendering stack, but that shut down in 2012. There is currently no OSM torrent distribution system, but I'd like to set that up.


Google Images isn't even worth using at all anymore, after that Getty lawsuit that made them remove links to images (the entire damn point of image search as far as I'm concerned...)


I think the Web just kind of stopped being full of searchable information.


Imagine if, instead of kneecapping XHTML and the semantic web properties it had baked in, Google had not entered into the web browser space. We might be able to mark articles up with `<article>`, and set their subject tags to the URN of the people, places, and things involved. We could give things a published and revised date with change logs. Mark up questions, solutions, code and language metadata. All of that is extremely computer friendly for ingestion and remixing. It would not only have turned search into a problem we could all solve, but also given us rails to start linking disparate content into a graph of meaningful relationships.

But instead Google wanted to make things less strict, less semantic, harder to search, and easier to author whatever the hell you wanted. I'm sure it has nothing to do with making it difficult for other entrants to find their way into search space or take away ad-viewing eyeballs. It was all about making HTML easy and forgiving.

It's a good thing they like other machine-friendly semantic formats like RSS and Atom...

"Human friendly authorship" was on the other end of the axis from "easy for machines to consume". I can't believe we trusted the search monopoly to choose the winner of that race.


I work for Google but not on search.

I think in this case the semantic web would not work, unless there was some way to weed out spam. There are currently multiple competing microdata formats out there that enable you to specify any kind of metadata, but they still won't help if spammers fill those too.

Maybe some sort of webring of trust where trusted people can endorse other sites and the chain breaks if somebody is found endorsing crap? (as in, you lose trust and everybody under you too)
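
As a toy illustration of that chain-breaking idea (the site names and the tree shape are made up, and a real system would presumably be a weighted graph with multiple endorsers per site rather than a strict tree):

    # Toy sketch: endorsements form a tree, and a site caught endorsing crap
    # loses trust along with everything it (transitively) endorsed.
    endorsements = {
        "root.example":   ["blog-a.example", "blog-b.example"],
        "blog-a.example": ["tutorials.example"],
        "blog-b.example": ["spamfarm.example"],
    }

    def revoke(site, graph, trusted):
        """Remove `site` and everything downstream of it from the trusted set."""
        trusted.discard(site)
        for child in graph.get(site, []):
            revoke(child, graph, trusted)

    trusted = {"root.example", "blog-a.example", "blog-b.example",
               "tutorials.example", "spamfarm.example"}
    revoke("blog-b.example", endorsements, trusted)  # caught endorsing spam
    print(sorted(trusted))  # blog-b.example and spamfarm.example are gone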


> I think in this case semantic web would not work, unless there was some way to weed out spam.

That's not so hard. It's one of the first problems Google solved.

PageRank, web of trust, pubkey signing articles... I'd much rather tackle this problem in isolation than the search problem we have now.

The trust graph is different from the core problem of extracting meaning from documents. Semantic tags make it easy to derive this from structure, which is a hard problem we're currently trying to use ML and NLP to solve.


> Semantic tags make it easy to derive this from structure

HTML has a lot of structure already (for example all levels of heading are easy to pick out, lists are easy to pick out), and Google does encourage use of semantic tags (for example for review scores, or author details, or hotel details). For most searches I don't think the problem lies with being able to read meaning - the problem is you can't trust the page author to tell you what the page is about, or link to the right pages, because spammers lie. Semantic tags don't help with that at all and it's a hard problem to differentiate spam and good content for a given reader - the reader might not even know the difference.


> PageRank, web of trust, pubkey signing articles...

What prevents spammers from signing articles? How do you implement this without driving authors to throw their hands in the air and give up?


In the interests of not causing a crisis when Top Level Trust Domain endorses the wrong site and the algorithm goes, "Uh uh," (or the endorsement is falsely labeled spam by malicious actors, or whatever), maybe the effect decreases the closer you are to that top level.

But that's hierarchical in a very un-web-y way... Hm.


The internet is still kind of a hierarchy though, "changing" "ownership" from the government DARPA to the non-profit ICANN.

And that has worked... quite fine. I have no objections (maybe they're a bit too liberal with the new TLDs).

Most of the stuff that makes the hierarchies seem bad are actually faults of for-profit organizations (or other unsuited people/entities) being at the top, and not just that someone is at the top per se. In fact, in my experience, and contrary to popular expectation, when a hierarchy works well, an outsider shouldn't actually be able to immediately recognize it as such.


> Imagine if instead of kneecapping XHTML and the semantic web properties it had baked in, Google had not entered into the web browser space. We might be able to mark articles up with `<article>`, and set their subject tags to the URN of the people, places, and things involved. We could give things a published and revised date with change logs. Mark up questions, solutions, code and language metadata.

Can you explain in technical details what you think was lost by Google launching a browser or what properties were unique to XHTML?

Everything you listed above is possible with HTML5 (see e.g. schema.org) and has been for many years so I think it would be better to look at the failure to have market incentives which support that outcome.


Good machine-readable ("semantic") information will only be provided if incentives aren't misaligned against it, as they are on much of the commercial (as opposed to academic, hobbyist, etc.) Web. Given misaligned incentives, these features will be subverted and abused, as we saw back in the 1990s with <meta description="etc."> tags and the like.


I don't think there's any reason to think google was responsible for the semantic web not taking off. People just didn't care that much. It may have been a generally useful idea, but it didn't solve anyone's problem directly enough to matter.


It wouldn’t matter. 0.0001% of content authors would employ semantic markup. Everyone else would continue to serve up puréed tag soup.


If WordPress output semantic markup, that would instantly give you a lot more than 0.0001%. The rest would follow as soon as it improved the discoverability of their content.


Wordpress can't magically infer semantic meaning from user input any better than Google can. The whole point of the semantic web is to have humans specifically mark their intention. A better UI for semantic tagging would help for that, but it would still be reliant on the user clicking the right buttons rather than just using whichever thing results in the correct visual appearance.


> 0.0001% of content authors would employ semantic markup.

You don't think we'd have rich tooling to support it and make it easy to author?

Once people are using it with success, others will follow.


The breakthrough would come if Google were to rank pages with proper semantic markup higher. Just look at AMP.

(Of course that won't ever happen, but that's what would be needed.)


Did you try putting them in quotes?

EDIT: I don't know why this is being downvoted. This is a genuine question to understand whether the problem is the size of the index or the fuzzy matching that search engines do.


Quotes don't work reliably anymore; this is a big part of the problem. Googlers have been really busy the last 10 years doing everything except:

- fixing search (it has become more and more broken since 2009, possibly before. Today it works more or less like their competitors worked before: a random mix of results containing some of my keywords.)

- fixing ads (Instagram should have way less data on me and yet manages to present me with ads that I sometimes click, instead of ads that are so insulting I go right ahead and enable the ad blocker I had forgotten about.)

- saving Reader

- etc


> I don't know why this is being downvoted.

tbh, it's one of those "Are you sure you're not an idiot?" replies.


Google blatantly disregard quotes.


I think the behavior is more complex. I do see quotes being disregarded from time to time, so I typically leave them off. However, for the query 'keyword1 keyword2', if I get a lot of keyword1 results with keyword2 struck through, and I search again with keyword2 in quotes, it works as expected.


Reference?


Will you take my word for it?

They not only disregard quotes but also their own verbatim setting.


Asking for a reference helps:

- Establish the behaviour as documented.

- Represent and demonstrate this behaviour to others.

It's not that I doubt your word, but that I'd like to see a developed and credible case made. Because, frankly, that behaviour drives me to utter frustration and distraction. It's also a large part of the reason I no longer use, nor trust, Google Web Search as my principal online search tool.


I see. I'll try to make a habit out of collecting those again.

That said, I might have something on an old blog somewhere. I'll see if I can find it before work starts...

Edit: found it here http://techinorg.blogspot.com/2013/03/what-is-going-on-with-... . It is from 2013 and had probably been going on for a while already at that point.

Edit2: For those who are still relying on Google, here's a nice hack I discovered that I haven't seen mentioned by anyone else:

Sometimes you might feel that your search experience is even worse than usual. In those cases, try reporting anything in the search results and then retrying the same search 30 minutes later.

Chances are it will now magically work.

It took quite a while for me to realize this and I think in the beginning I might not have realized how fast it worked.

It seemed totally unrealistic, however, that a fix would have been created and a new version deployed in such a short time, so my best explanation is that they are A/B-testing some really dumb changes and then pulling whoever complains out of the test group.

Thinking about it this might also be a crazy explanation for why search is so bad today compared to ten years ago:

There's no feedback whatsoever, so most sane users probably give up talking to the wall after one or two attempts. This leaves Google with the impression that everyone is happy, so they just continue on the path back to becoming the search engines they replaced.


Thanks.

I'm getting both mixed experiences and references myself looking into this. Which is if anything more frustrating than knowing unambiguously that quoting doesn't work.

I've run across numerous A/B tests across various Google properties. Or to use the technical term: "user gaslighting".


If they were ripe for disruption, and it were easy to do this disrupting just by returning better search results, and returning better search results were an easily doable thing, then I suppose all the other functioning businesses that have a stake in web search would already be doing that disrupting.

Search disrupted catalogs. What will disrupt search?


Boutique hand crafted artisanal catalogs?

Not joking: I have a feeling subject-specific topics will be further distributed based on expertise & trust.


That's exactly what GitHub's Awesome lists are: decentralized, democratized, handcrafted subject-specific catalogs.


If they became important sources of information beyond technically competent people, I suppose we would end up with a bunch of Awesome lists of Content Farms!


Return of the Yahoo! Directory and DMOZ? Heh.


> Boutique hand crafted artisanal catalogs?

I think those are called books. ;-)


Pubmed is an excellent example of a boutique search engine.


So the old Yahoo web index, basically.


Are you sure there's a page on the web that has the stack trace you search for? Maybe there just isn't anything.


Perhaps expectations have risen over time




