I'm not sure that's "broken" so much as inappropriate. I read an article with a list of bad default results (as in, with no filter bubble) that are way worse. For example, searching for "tween" on Bing shows auto-suggestions like "tween swimsuits inappropriate" and "tween budding images". Same for all search engines that use Bing under the hood, like Brave and DDG (at least at the time I read about it). It seems pretty obvious what those searches are looking for.
Replying to myself here, but I reported the issue to Bing. If they don't fix this then not only are they serving inappropriate search suggestions, they also run the risk of a huge shitstorm. I'm amazed it hasn't been fixed yet.
I don't know what you mean by Sears? Do you mean a search result from Bing? Because I'm not talking about results but about Microsoft taking what I'm guessing are frequently-used searches from people looking for underage girls in inappropriate clothing, or pictures of underage girls' breasts ("budding", "pokies", etc.), and serving them as default auto-suggestions to all bing.com users (the suggestions that show up in the search bar when you type).
Are you saying you think Microsoft suggesting to people that they might want to search for images of 9 to 12 year old girls (tweens) in inappropriate clothing, or pictures of their breasts, is a good choice of suggestions?
The article I read about it in was from the US, 5000+ km from where I'm at, and the results were exactly the same as what I get (and I don't use Bing). It has nothing to do with me or where I'm at, but nice try implying otherwise.
I just did a regular text search for it and the top results were "how many raccoons can fit up your butt"... no voice assistant involved, and the destination URLs were not Google's.
Yes, and it is; however, my (intended) point was that it is what almost no one is looking for when they type "Cicero", and if you simply search for Cicero by itself, you get nothing telling you that it's a typographic unit.
@rijnard (the blog post author) is awesome, and all of the code changes he talks about in the blog post are public. You can see all of his recent changes to search code in https://sourcegraph.com/search?q=context:global+repo:%5Egith... in case you want to follow along by just reading the code (that query shows all of his diffs that touch paths containing `search`).
I'm sure there's tonnes of research and prior art on this subject, but it's an interesting line of inquiry.
Off the top of my head, there are two meanings: performance benchmarking (how fast the search results come back), and accuracy/fit-for-purposeness benchmarking (how good it is at finding what the user intends).
Performance is easy. It's the accuracy/fit-for-purposeness that would be an interesting benchmark.
I wonder if you have to use an empirical measurement for accuracy - that is, give a random sample of people a target piece of code (or file) to find, and see how long or how many queries it takes to find it.
For quality, you really do need human qualitative measures to get the full picture, with all of the fun that involves.
However, you can do things like generate search terms from your top N documents through some method, and then do the queries and confirm the document you generated the term from shows up in the top M results.
This can be circular though if you're not careful; the top N documents may not include important documents that nobody could find.
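The generate-terms-then-check approach above can be sketched mechanically. This is a toy illustration, not anyone's actual benchmark: the corpus, the term-selection heuristic (pick the rarest token in each document), and the substring-count ranker are all made up for the example; a real system would use its production index and ranker, with the generated queries as the held-out test set.

```python
import re

# Hypothetical mini-corpus standing in for the "top N documents".
docs = {
    "auth.py": "def verify_token(token): return token in ACTIVE_TOKENS",
    "search.py": "def rank_results(query, index): return sorted(index, key=len)",
    "cache.py": "def evict_lru(cache): cache.popitem(last=False)",
}

def distinctive_term(doc_id, corpus):
    """Generate a query for a document: its token with the lowest
    document frequency (appears in the fewest other documents)."""
    tokens = set(re.findall(r"\w+", corpus[doc_id].lower()))
    def doc_freq(t):
        return sum(t in re.findall(r"\w+", text.lower())
                   for text in corpus.values())
    return min(tokens, key=doc_freq)

def search(query, corpus):
    """Stand-in ranker: score each document by raw occurrences of the
    query string, return matching doc ids best-first."""
    q = query.lower()
    scored = sorted(((text.lower().count(q), doc_id)
                     for doc_id, text in corpus.items()), reverse=True)
    return [doc_id for score, doc_id in scored if score > 0]

# The benchmark: for each document, generate its query and check that
# the document comes back in the top M results.
M = 3
hits = sum(doc_id in search(distinctive_term(doc_id, docs), docs)[:M]
           for doc_id in docs)
print(f"recall@{M}: {hits}/{len(docs)}")
```

The circularity caveat applies directly: since the queries are derived from documents already in the corpus, a perfect score here says nothing about documents nobody can find, so the sampled set needs to include those too.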