
I have some questions about information retrieval and SLOs:

* Is there a metric of search quality which is appropriate here -- specifically, "when I search for [site:tbray.org rock roll], and receive a set of results, that set includes Tim's article"? What do we call this metric? The metric would be lower when the result set is empty (no relevant results returned) and higher when the result set contains the desired article (a relevant result was returned).

* How would you assess the quality of this particular search against a metric?

* How would you measure the overall quality of "all searches in the past hour, including the [site:tbray.org rock roll] search"? How would this one failure to find a page contribute to an overall success rate?

* Is there any possible automation that would notice whether Tim's article has started to be missing from indexes and say "hey, this represents a loss of a kind of quality"?

* Suppose the index were to (say) discard all pages created before 1999 but simultaneously improve the relevance of all queries that find more recent results. If (say) 99.99% of queries have users who are happy getting only post-1999 links and (say) only 0.01% are unhappy because they specifically wanted a pre-1999 result, but things get way way better for the 99.99%, was that a bad change? Would any metric show a problem?

I don't see super satisfying answers to this at e.g. https://www.quora.com/How-does-Google-measure-the-quality-of... or https://www.quora.com/How-can-search-quality-be-measured . If I'm reading right, it sounds like part of the state of the art for search quality recently involved human raters manually running sample queries… That seems kinda crazy / totally unlikely to catch certain obscure issues. But then again:

* What is the service level objective for search quality? If search is getting way better for 99.99% of users because of various optimizations, is it a problem if a particular 0.01% of queries, such as Tim's old review query (which he expected to find one specific page), instead find no results at all?

And then I guess I wonder:

* According to whatever metric correctly captures Tim's review being missing as a problem, what is the current search quality of Google web searches and how has it been changing over time?




This won't answer all of your questions, but the measures you're looking for are called 'recall' and 'precision':

- recall: number of relevant documents retrieved / number of relevant documents

- precision: number of relevant documents in result set / number of documents in result set
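For concreteness, here's a minimal Python sketch of both formulas for a single query; the document IDs and relevance judgments are made up for illustration, not taken from any real index:

    # Set-based precision/recall for one query.
    def precision_recall(retrieved, relevant):
        retrieved_set = set(retrieved)
        hits = retrieved_set & relevant
        precision = len(hits) / len(retrieved_set) if retrieved_set else 0.0
        recall = len(hits) / len(relevant) if relevant else 0.0
        return precision, recall

    relevant = {"doc_7", "doc_42"}            # hypothetical ground-truth judgments
    retrieved = ["doc_3", "doc_42", "doc_9"]  # hypothetical result set
    print(precision_recall(retrieved, relevant))  # -> (0.333..., 0.5)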


Yeah, you know, it's funny: the last time I worked on question-answering code, we were trying really hard to find algorithms that could improve a particular metric (F-score, a synthetic agglomeration of precision and recall; F1 is their harmonic mean) ... I don't remember hearing very many conversations at all about whether we were measuring the right thing.

Given a query like [site:tbray.org "rock n roll animal"], and knowing that the 1 relevant document we actually want is the review at https://www.tbray.org/ongoing/When/200x/2006/03/13/Rock-n-Ro... , I think we can say that

* if Google search returns 4 results for the query, not including the review: precision is 0/4, recall is 0/1 (so p=0, r=0)

* if Google search returns 5 results for that query, including the review: precision is 1/5, recall is 1/1 (so p=0.2, r=1)
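Plugging those numbers into the F-score mentioned upthread (F1 is the harmonic mean of precision and recall), the two cases come out to 0 and roughly 0.33. A quick sketch, nothing Google-specific:

    # F1 = harmonic mean of precision and recall.
    def f1(p, r):
        return 0.0 if p + r == 0 else 2 * p * r / (p + r)

    print(f1(0.0, 0.0))  # review missing:  0.0
    print(f1(0.2, 1.0))  # review included: 0.333...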

But while I _kind of_ understand how we can use these measures to assess the outcome of a single query, I'm really not sure what meaningful ways are available to aggregate them. Suppose we're going to get 1M queries in the next hour. Do we prefer the algorithm with the highest mean F-score per query? The highest median F-score per query? Or the highest 1st-percentile F-score per query (i.e. the one whose worst 1% of queries still score as well as possible)?
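One way to see how much the choice of aggregate matters is to compute all three summaries over the same batch of per-query scores. This is a toy sketch with invented numbers (98% of queries score well, 2% fail outright the way Tim's did), not a model of real traffic:

    import random
    import statistics

    random.seed(0)

    # Hypothetical per-query F-scores for one hour of traffic.
    scores = [random.uniform(0.7, 1.0) for _ in range(9800)] + [0.0] * 200
    scores.sort()

    print("mean  :", round(statistics.mean(scores), 3))
    print("median:", round(statistics.median(scores), 3))
    print("p01   :", round(scores[len(scores) // 100], 3))  # ~1st percentile

    # The mean and median barely register the 200 hard failures,
    # while the 1st percentile drops to zero.

And with a failure rate as small as the 0.01% in the earlier hypothetical, even the 1st percentile wouldn't move, which is part of why a single aggregate number has trouble capturing this kind of regression.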

If there is published literature on how search quality is measured I'd love to see it. Would be especially interesting to see real-time data -- e.g. what is the impact of 1 data shard outage on overall user-experienced quality according to some metric?
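As a crude back-of-the-envelope for the shard question (assuming documents are spread uniformly across shards, which is an assumption about a generic architecture, not a claim about how Google actually shards), a one-shard outage turns roughly 1/N of single-answer queries like Tim's into total misses:

    # Toy model: each document lives on exactly one of n_shards, chosen uniformly.
    def p_single_answer_lost(n_shards, shards_down=1):
        # Probability that a query's only relevant document sits on a downed shard.
        return shards_down / n_shards

    print(p_single_answer_lost(1000))      # 0.001 -> 0.1% of such queries lose their answer
    print(p_single_answer_lost(1000, 10))  # 0.01  -> 1% of them do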


"Modern Information Retrieval" by Baeza-Yates / Ribeireo Neto a few years ago used to be a good standard work.

I'm not sure, though, how well it has kept up with aspects like real-time search and graph search, both of which are fairly recent developments.



