After 20 years of taking it for granted that search engines as we know them are equipped to solve the typical problems we throw at them, I wonder if the whole concept of an unsupervised web crawl feeding a single, general-purpose search engine will just die out.
When I think about my typical web queries across the past year or two, it seems more and more likely that I'd be better off replacing Google with several purpose-built systems, none of which search the "entire web" (whatever that even means anymore). Technical queries? Just search StackOverflow and GitHub directly. Searching for a local venue of any kind? Search against a dedicated places database where new entries have to pass at least cursory scrutiny. (Arguably Google Maps or Yelp already serve this purpose, but I'm not sure they do enough vetting today.) Medical question? Search across a few sites known to be trustworthy.
We have become accustomed to going to Google because it's more convenient to type a movie title, "chinese restaurant philadelphia", "flights to miami 4/12/24" or "Error code 127 python" into the same single place, but something tells me we'd be better off if that one place made an LLM-assisted guess about what kind of search it is, and then went to a specialized, curated search. If we went back toward the DMOZ/Yahoo model of human-curated directories, I wonder if we could even reverse the trend toward spam and clickbait that has been so lamented in recent years.
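To make the routing idea concrete, here's a minimal sketch. The keyword-based classify_query() is just a stand-in for the LLM-assisted guess, and the vertical endpoints are illustrative choices, not a claim about which curated sources are right:

```python
# Minimal sketch of routing a query to a curated vertical instead of one big index.
# classify_query() is a crude stand-in for the LLM-assisted guess; the endpoints
# below are illustrative examples of "specialized, curated" search targets.
from urllib.parse import quote_plus

VERTICALS = {
    "technical": "https://api.stackexchange.com/2.3/search?site=stackoverflow&intitle={q}",
    "places":    "https://www.openstreetmap.org/search?query={q}",
    "general":   "https://duckduckgo.com/?q={q}",
}

def classify_query(query: str) -> str:
    """Stand-in classifier; a real system would ask an LLM (or a cheap model)."""
    q = query.lower()
    if any(w in q for w in ("error", "exception", "python", "stack trace")):
        return "technical"
    if any(w in q for w in ("restaurant", "cafe", "near me", "philadelphia")):
        return "places"
    return "general"

def route(query: str) -> str:
    """Pick a vertical and build the search URL for it."""
    return VERTICALS[classify_query(query)].format(q=quote_plus(query))

print(route("Error code 127 python"))            # -> Stack Overflow API search
print(route("chinese restaurant philadelphia"))  # -> places search
```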
For me, search would be greatly improved if I could selectively exclude entire domains when I come across them. I want to be able to, with one click, remove GeeksForGeeks from all my search results, forever. And then I want to keep adding to that list (what we used to call a "blacklist").
Never, ever show me Pinterest when I do an image search.
I imagine my search results would improve in short order.
Better still, aggregate those lists from all users and you could improve search even for users who haven't built up a blacklist of their own.
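A rough sketch of what that could look like; the result shape, the vote threshold, and the example domains are made up purely for illustration:

```python
# Sketch of a personal domain blocklist plus the "aggregate everyone's lists" idea.
# The result dicts and the min_users threshold are invented for illustration.
from collections import Counter
from urllib.parse import urlparse

def domain_of(url: str) -> str:
    """Crudely normalize a result URL down to its domain."""
    return urlparse(url).netloc.removeprefix("www.")  # Python 3.9+

def filter_results(results, blocklist):
    """Drop any result whose domain the user has blocked."""
    return [r for r in results if domain_of(r["url"]) not in blocklist]

def aggregate_blocklists(blocklists, min_users=100):
    """Domains blocked by enough users become a default blocklist for everyone else."""
    votes = Counter(domain for bl in blocklists for domain in bl)
    return {domain for domain, n in votes.items() if n >= min_users}

my_blocklist = {"geeksforgeeks.org", "pinterest.com"}
results = [
    {"url": "https://www.geeksforgeeks.org/python-exit-codes/"},
    {"url": "https://stackoverflow.com/questions/1535032/"},
]
print(filter_results(results, my_blocklist))  # only the Stack Overflow hit survives
```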
On the surface this is a good idea, but it could turn wildly anticompetitive. Whether your site has any business on the web would be entirely dictated by whether it was correctly classified (or classified at all) in this engine. If you wanted to start your own Stack Overflow competitor for whatever reason, you would have a very hard time getting any traction. This is also true of current general-purpose engines, but there you still stand a chance of being indexed well and ranking high enough to get some traffic.
The Yahoo model collapsed for this very reason. Back when you went to more than 5 websites to look at screenshots of the other 4, the directories would not necessarily show you the latest thing, because it wasn't yet on the list of sites manually added to each directory.
I think the current problem with Google isn't really about spam. I think Google has become complacent because their ads are on all the sites anyway, so the "maximize revenue per search" function doesn't actually care whether you find what you're looking for: you'll be shown Google ads either way, and you'll come back to Google either way. In fact, they probably get to show more ads by feeding you bad results, because then you're loading more pages. This wasn't the case back when Google search stayed on top of spam sites, but it doesn't feel like they're doing algorithm updates to curb the current trend anymore, and spam sites have caught on to what ranks higher in the results.
> If you wanted to start your own Stack Overflow competitor for whatever reason, you would have a very hard time getting any traction. This is also true of current general-purpose engines, but there you still stand a chance of being indexed well and ranking high enough to get some traffic.
Hmm... You started to backpedal but then persisted. In today's world, your SO competitor would have that (slim) chance to rank if you started getting links from sites like HN or from people on Twitter who matter and know about tech. That would give you some PageRank, and then you might start ranking in Google (in theory; in reality, you probably wouldn't rank for anything, since you're competing with 1,000,000 spam sites, including whole verbatim clones of every page on SO that Google can't even get under control).
If a directory were going to be worth using, it would have to be run by humans who actually look at each submission. They could also look at who's linking to it and evaluate: is this a backlink from, like, a gibberish page on `prawns-01-blork.info`, or from Joel Spolsky's Twitter account? Yes, it would take a lot of work, but it would be creating a genuinely useful product that people might pay for. And we have examples of other professions where "just rubber-stamp everyone who pays" is frowned upon, like building inspectors and journalists. It's a hard problem, but it's far from hopeless.
By limiting web search results to "a few known sites", you'd be expediting the death of parts of the web.
The beauty of search engines (in theory) is that you can find something NEW. Keeping the "open web" out would just entrench and ossify the current players.
A directory wouldn't be there to exclude anyone actually producing content of worth. It would serve as a gatekeeper to keep out plagiarism, spam, and utter trash. And people could create networks of sites based on their own real-world webs of trust, which vouch for one another.
Personally, I'd rather see a standard that let you add as many directories as you wanted to the set your search engine metasearches across. This also avoids the political problem of "who decides what's trash" -- if you want to add a directory whose main deal is that they'll list literally any site, you could. If you only want directories that don't allow any <insert hated party> leaning content, you could do that too.
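A rough sketch of what a client for that kind of standard might look like; the directory clients and the result shape are entirely hypothetical, since no such standard exists:

```python
# Sketch of "bring your own directories" metasearch. The directory clients and
# the result shape are hypothetical; the point is that the user, not the engine,
# decides which curated lists get merged.
from urllib.parse import urlparse

def metasearch(query, directory_clients):
    """Query each user-chosen directory and merge results, deduping by domain."""
    seen, merged = set(), []
    for search in directory_clients:
        for result in search(query):
            domain = urlparse(result["url"]).netloc
            if domain not in seen:
                seen.add(domain)
                merged.append(result)
    return merged

# Stand-in "directory clients"; real ones would hit whatever API the standard defines.
strict_directory = lambda q: [{"url": "https://stackoverflow.com/search?q=" + q}]
anything_goes    = lambda q: [{"url": "https://example.com/" + q},
                              {"url": "https://stackoverflow.com/questions/tagged/" + q}]

print(metasearch("python", [strict_directory, anything_goes]))
# stackoverflow.com appears only once in the merged list.
```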