
It’s the mean. At least in Lucene. Using median would be an interesting experiment.

Do you know of a search dataset with very large differences in document length? MS MARCO, for example, is pretty consistent in length.




Was just thinking about some of the docs we have at work: most are relatively short (probably < 10 pages), but some are 200+ page government documents.
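
To make the mean vs. median question concrete, here is a rough sketch of how the choice of "average" length changes BM25-style length normalization on a skewed corpus like that. Everything in it is an assumption for illustration: the corpus (95 short docs, 5 very long ones), the token counts, and the textbook form of the formula with the commonly used defaults k1=1.2 and b=0.75. It is not Lucene's actual code.

```python
from statistics import mean, median


def length_norm(doc_len, avg_len, b=0.75):
    """Textbook BM25 length normalization: 1 - b + b * (doc_len / avg_len)."""
    return 1.0 - b + b * doc_len / avg_len


def bm25_tf(tf, doc_len, avg_len, k1=1.2, b=0.75):
    """Term-frequency part of textbook BM25 (IDF left out for simplicity)."""
    return tf * (k1 + 1) / (tf + k1 * length_norm(doc_len, avg_len, b))


# Hypothetical corpus: 95 short docs (~3,000 tokens) and 5 huge ones (~100,000 tokens).
doc_lengths = [3000] * 95 + [100000] * 5

avg_mean = mean(doc_lengths)      # pulled upward by the few huge docs
avg_median = median(doc_lengths)  # stays at the typical doc length

for name, avg in (("mean", avg_mean), ("median", avg_median)):
    short = bm25_tf(tf=2, doc_len=3000, avg_len=avg)
    huge = bm25_tf(tf=2, doc_len=100000, avg_len=avg)
    print(f"{name:>6}: avg={avg:>8.0f}  short-doc tf score={short:.3f}  huge-doc tf score={huge:.3f}")
```

With the mean, a typical short doc counts as "shorter than average" and gets a boost. With the median it gets a normalization factor of exactly 1, and the huge docs are penalized more heavily. Whether that actually helps ranking would still need testing on real data.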



