Show HN: Generative Benchmarking for RAG

jeffchuber · 2025-04-07T21:14:59 1744060499

I’m Jeff, co-founder of Chroma. We build the most popular open-source AI vector database. When people use Chroma, the first question they ask is which embedding model to use. This choice affects how your RAG application will perform in production.

We noticed that most people make this decision based on popular benchmark scores. However, widely used benchmarks like MTEB are often overly clean and generic, and in many cases embedding models have memorized them during training. To address this, we introduce representative generative benchmarking: custom evaluation sets built from your own data that reflect the queries users actually make in production.
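
To make the idea concrete, here is a minimal sketch of scoring an embedding model against a custom evaluation set with recall@k. It is not the code from our report or the Chroma CLI. The `toy_embed` function is a hypothetical bag-of-words stand-in for a real embedding model, and the documents and queries are hand-written placeholders for queries that would be generated from your own data.

```python
import math
from collections import Counter


def toy_embed(text):
    """Hypothetical stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def recall_at_k(embed, docs, queries, k=1):
    """Fraction of queries whose relevant document appears in the top k.

    docs: {doc_id: text}; queries: list of (query_text, relevant_doc_id).
    """
    doc_vecs = {doc_id: embed(text) for doc_id, text in docs.items()}
    hits = 0
    for query, gold in queries:
        qv = embed(query)
        ranked = sorted(doc_vecs, key=lambda d: cosine(qv, doc_vecs[d]), reverse=True)
        if gold in ranked[:k]:
            hits += 1
    return hits / len(queries)


# Placeholder corpus and queries; in practice both come from your own data.
docs = {
    "d1": "reset your password from the account settings page",
    "d2": "export invoices as csv from the billing dashboard",
    "d3": "rotate api keys under the developer tab",
}
queries = [
    ("how do i reset my password", "d1"),
    ("download invoices csv", "d2"),
    ("where can i rotate api keys", "d3"),
]

score = recall_at_k(toy_embed, docs, queries, k=1)
print(f"recall@1 = {score:.2f}")
```

Swapping `toy_embed` for each candidate embedding model and comparing the scores on the same query set is the basic comparison a custom benchmark gives you.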

We just published our in-depth technical report on this, and you can run a custom benchmark locally with the Chroma CLI.