I am deliberately keeping away from LLMs for search. I'm old enough to remember finally ditching Altavista for the new upstart Google. I did briefly flirt with Ask Jeeves but it was not good enough.

I don't think anyone has it sorted yet. LLM search will always be flawed due to being a next token guesser - it cannot be trusted for "facts". An LLM fact is not even a considered opinion, it is simply next token guessing. LLMs certainly cannot be trusted for "current affairs" - they will always be out of date, by definition (they need training).

Modern search - Goog or Bing or whatever - seems to be somewhat confused, ad riddled and stuffed with rubbish results at the top.

I've populated a uBlacklist with some popular lists and the results of my own encounters. DDG and co are mostly useful now, for me.



I miss Altavista every day. Case-sensitive search is how you tell DOS from DoS. Putting "exact phrases" in quotes no longer seems to work. Then they added insult to injury by forcing you to +mandate a term otherwise they might just ignore it. Now that no longer works either.

I've entirely given up on Google.

I've made extensive shortcuts so I can directly search various sites straight from my location bar: wikipedia, wiktionary, urbandictionary, genius, imdb, onelook, knowyourmeme, and about two dozen suppliers/distributors/retailers where I regularly shop.

If I need something that's not on that list, I'll try some search engines but I start with the assumption that I'm not going to find it, because the battle for search is lost.


> I've entirely given up on Google.

I have used Google very little for about 3 years now. Sometimes when DDG fails to find what I'm looking for I'll try Google. It rarely works better.


> Putting "exact phrases" in quotes no longer seems to work. Then they added insult to injury by forcing you to +mandate a term otherwise they might just ignore it. Now that no longer works either.

I don't understand why they got rid of these escape hatches. Sometimes I want the "top" pages containing precisely the text I enter -- no stemming, synonyms, etc. Maybe it shouldn't be the default, but why make it impossible?

In my ideal search world, there would also be an option to eliminate any page with a display ad or affiliate link. Sometimes I only want the pages that aren't trying to make money off of me.


I have a solution: a search engine that uses machine learning to score the "commercialness" of a page. By commercialness, I mean: is it a table of products with prices; does it have buy buttons; does it use a lot of tracking and analytics; does it have a cart; is there a lot of product talk (and is it biased positively); how do all the pages within a couple of link-degrees score; ... (and more). Then give users a slider whose right side means no filtering and whose left side means basically only return university, Wikipedia, and PBS tier results.

This has to track the number of ads and trackers in a page and not just be about product pages. This measure should also fight SEO spam, as the tracking and advertising elements would cause SEO spammers to lose rank on the engine (disincentivising an arms race).
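A rough sketch of how the score and slider could fit together, assuming hand-picked weights standing in for a trained model. The `PageSignals` fields, the weights, and `filter_results` are all made up for illustration; nothing here is an existing engine's API.

```python
from dataclasses import dataclass


@dataclass
class PageSignals:
    """Hypothetical per-page features; a real engine would extract these from crawled HTML."""
    url: str
    relevance: float      # 0..1, how well the page matches the query
    has_prices: bool
    has_buy_button: bool
    has_cart: bool
    tracker_count: int
    ad_count: int


def commercialness(page: PageSignals) -> float:
    """Hand-weighted stand-in for the ML score: 0.0 = not commercial, 1.0 = very commercial."""
    points = 0                                  # integer points out of 100
    points += 20 if page.has_prices else 0
    points += 20 if page.has_buy_button else 0
    points += 20 if page.has_cart else 0
    points += 2 * min(page.tracker_count, 10)   # trackers count too, capped at 10
    points += 2 * min(page.ad_count, 10)        # so do ads, capped at 10
    return points / 100


def filter_results(pages, slider: float):
    """slider = 1.0 (far right) keeps everything; slider = 0.0 (far left) keeps only pages scoring 0."""
    kept = [p for p in pages if commercialness(p) <= slider]
    return sorted(kept, key=lambda p: p.relevance, reverse=True)
```

Because ads and trackers feed the score directly, a page stuffed with them drops out as the slider moves left even when it has no cart or prices.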

Add in the patently obvious need for the poweruser's 2nd search bar, which takes set notation statements and at least one of a few popular powerful regex languages, and finally add cookie stored, user-suppliable domain blacklists and whitelists (which can be downloaded as a .txt and reuploaded later on a new browser profile if needed). I never ever want to see Experts Exchange for any reason in my results, as an immediately grasped example. Give the users more control, quit automagicking everything behind a conversationally universal idiot-bar!
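The domain list part is the easy bit. A minimal sketch, assuming a plain-text format of one domain per line with `#` comments (the format and the function names are assumptions, not any browser's or extension's actual format):

```python
from urllib.parse import urlsplit


def parse_domain_list(text: str) -> set[str]:
    """One domain per line; blank lines and '#' comments are ignored (an assumed format)."""
    domains = set()
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip().lower()
        if line:
            domains.add(line)
    return domains


def host_matches(host: str, domains) -> bool:
    """True if host is a listed domain or a subdomain of one."""
    parts = host.lower().split(".")
    return any(".".join(parts[i:]) in domains for i in range(len(parts)))


def apply_lists(urls, blacklist, whitelist=frozenset()):
    """Drop blacklisted hosts; a whitelist entry overrides the blacklist."""
    kept = []
    for url in urls:
        host = urlsplit(url).hostname or ""
        if host_matches(host, whitelist) or not host_matches(host, blacklist):
            kept.append(url)
    return kept
```

Since the list is just text, exporting it as a .txt and uploading it into a new browser profile is a plain file round trip.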


I use uBlacklist to get rid of Expert Sexchange and similar low-quality sites in search results, and it seems to work well enough.

An "advanced mode" supporting literal keywords (with and without stemming) and boolean operators wouldn't cost the search companies anything. I think supporting regexp search would be hard: do you search your index for fixed substrings and expand around them? I'm not a search person...
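For what it's worth, "fixed substrings first, regex second" is roughly how trigram-indexed code search tools work. A toy sketch, assuming the caller supplies a literal that any match must contain (real systems derive that from the regex automatically; `TrigramIndex` and its methods are hypothetical names):

```python
import re
from collections import defaultdict


def trigrams(text: str) -> set[str]:
    """Every 3-character substring of text."""
    return {text[i:i + 3] for i in range(len(text) - 2)}


class TrigramIndex:
    def __init__(self):
        self.docs = {}                      # doc id -> text
        self.postings = defaultdict(set)    # trigram -> doc ids

    def add(self, doc_id, text: str):
        self.docs[doc_id] = text
        for gram in trigrams(text):
            self.postings[gram].add(doc_id)

    def candidates(self, literal: str):
        """Docs containing every trigram of literal (a superset of the true matches)."""
        grams = trigrams(literal)
        if not grams:                       # literal too short to narrow anything
            return set(self.docs)
        sets = [self.postings.get(g, set()) for g in grams]
        return set.intersection(*sets)

    def regex_search(self, pattern: str, required_literal: str):
        """Run the regex only on documents that could contain required_literal."""
        rx = re.compile(pattern)
        return sorted(d for d in self.candidates(required_literal)
                      if rx.search(self.docs[d]))
```

The index lookup only narrows the candidate set; the regex still runs over each surviving document, so a pattern with no usable literal falls back to scanning everything. Keeping the trigrams case-sensitive is also what lets a query tell DOS from DoS.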

I don't think you'd need much in the way of machine learning to filter out the spam. There are relatively few third-party display ad servers and affiliate networks, and those are the main lazy ways to make money. There's no need to filter out all commercial content; just getting rid of the "passive income" bros would be enough.
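Something like this would probably get you most of the way, assuming you maintain a list of ad and affiliate hosts. The hosts below are placeholders, not real networks, and `is_monetized` is a made-up name:

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit

# Placeholder hosts for illustration only; a real list would name actual networks.
AD_HOSTS = {"ads.example", "affiliate.example"}


class HostCollector(HTMLParser):
    """Collects the hostname of every src/href URL in a page."""

    def __init__(self):
        super().__init__()
        self.hosts = set()

    def handle_starttag(self, tag, attrs):
        for name, value in attrs:
            if name in ("src", "href") and value:
                host = urlsplit(value).hostname   # None for relative URLs
                if host:
                    self.hosts.add(host.lower())


def is_monetized(html: str, ad_hosts=AD_HOSTS) -> bool:
    """True if the page loads from or links to a listed host or one of its subdomains."""
    collector = HostCollector()
    collector.feed(html)
    return any(h == a or h.endswith("." + a)
               for h in collector.hosts for a in ad_hosts)
```

This only sees what is in the static HTML; ads injected later by scripts would need a rendering crawler to catch.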


It's really strange, while I agree Google's results aren't as good as they used to be, they're still miles ahead of DDG for me. Is it because I still use keyword search like it's the early 2000s?

I tried to switch to DDG because Google was blocking Hurricane Electric IPv6 tunnels. DDG is still my homepage but I usually end up clicking the bookmark I made for ipv4.google.com. I wish I knew why DDG works for all you people but it's horrible for me.


Does your Google actually respect the keywords? For me, most of the time it replaces words with "synonyms" (mostly wrong context or not really replaceable). And results are pretty crap as a result - not what I was looking for, but just much more common/generic stuff.


If I put them in quotes, yeah.


Isn’t DDG basically Bing with a privacy layer?


That's what I've been told. I haven't tried Bing directly because... eww... but I assume the results would be similar to DDG.


Altavista was the OG. I remember it being cantankerous and requiring you to be specific in how you searched, but if you knew how to use it, it was unmatched. Until Google.


When Google came out it was way better than Altavista, people switched instantly. Specifically Altavista looked at how often a search term was in the result, which wasn't always a helpful thing. Google also noticed if search terms were near each other in a page which was really helpful, otherwise you would get forums with one search term in one message, and the other far away in an unrelated message. Google fixed that.
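To make the difference concrete, here is a toy comparison of the two signals. This is only an illustration of counting versus proximity, not either engine's actual ranking, and both function names are made up:

```python
def term_count_score(text: str, terms) -> int:
    """Counts how often the query terms occur, wherever they are."""
    words = text.lower().split()
    return sum(words.count(t.lower()) for t in terms)


def min_span(text: str, terms):
    """Smallest window (in words) containing every term, or None if one is missing."""
    wanted = {t.lower() for t in terms}
    words = text.lower().split()
    best = None
    last_seen = {}                          # term -> most recent position
    for i, w in enumerate(words):
        if w in wanted:
            last_seen[w] = i
            if len(last_seen) == len(wanted):
                # The tightest window ending here starts at the oldest "latest" position.
                span = i - min(last_seen.values()) + 1
                if best is None or span < best:
                    best = span
    return best
```

A long forum page can win on raw counts while its terms sit dozens of words apart; ranking by the smaller span favours the page where they actually appear together.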

The web has changed these days, it's an adversarial system now, where web results are aggressively bad and constantly trying to trick you. Google is much harder to implement now.


When Google came out, I started using it for some things, because yes it was better at some things, but I didn't stop using Altavista. Stayed with it until the very end, for cases where I could be certain that I knew the exact words that'd be in the page, and Google just sucked at that.

These days I can't even -exclude terms that I know would only appear in the wrong results, Google will show me those results anyway. Nothing about adversarial SEO requires them to ignore my input, that's a different choice.


It was fast, which almost nothing else was at the time.

And if people on dialup connections think you’re slow, it’s because you are.


Correct. Google became unusable around 2020. I search Wikipedia directly and rely on Duck for other needs. As rudimentary as it is for uncommon languages such as Ukrainian, DDG is still better than Google. Shame on them.


If you ask ChatGPT 4o about a current event it will google things (do some sort of web search) and summarise the result.


and often I have to tell it not to search, because it will just pull SEO polluted answers from Google and launder them slightly.



