Build a search engine, not a vector DB (elicit.com)
241 points by stuhlmueller on Dec 20, 2023 | 82 comments



I agree too. My impression is that almost all RAG tutorials _only_ talk about vector DBs, when these are not strictly required for Retrieval Augmented Generation. I'm guessing vector DBs are useful when you have massive amounts of documents on diverse topics.

Some gotchas I experienced (but I might be using the wrong embedding/vector DB: spaCy/FAISS):

- Short user questions might result in a low-signal query vector, e.g. user: "Who is Keanu Reeves?" -> false positives on Wikipedia articles which only contain "Who is"

- Typos and formatting affect the vectorization; a small difference might lead to a miss, e.g. "Who is Keanu Reeves?" -> match, "Who is keanu Reeves?" -> no match, and no match with any other capitalization.

If there's only a single document, a simple keyword search might lead to better results.
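
As a concrete illustration of that fallback (and of blunting the case/typo sensitivity above), here is a minimal sketch that normalizes text before matching and scores with plain BM25. It assumes the third-party rank_bm25 package and a toy two-document corpus; it is not from the original comment.

    import re
    from rank_bm25 import BM25Okapi  # pip install rank-bm25

    def normalize(text: str) -> str:
        # Lowercase and collapse whitespace so "keanu" and "Keanu" match identically.
        return re.sub(r"\s+", " ", text.lower()).strip()

    docs = [
        "Keanu Reeves is a Canadian actor known for The Matrix.",
        "Who is who in this list of 1995 films...",
    ]
    bm25 = BM25Okapi([normalize(d).split() for d in docs])

    query = "Who is keanu Reeves?"
    scores = bm25.get_scores(normalize(query).split())
    print(docs[max(range(len(docs)), key=lambda i: scores[i])])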

In my experience, false positives (retrieving an irrelevant text and generating a completely wrong answer) are a bigger problem than false negatives (not retrieving the text, so the question possibly can't be answered).

Does anybody have experience with Apache Lucene / Solr or Elasticsearch?


> "Has somebody experience with Apache Lucene / Solr or Elasticsearch?"

I've been working on a RAG system with Solr and quickly hit some of the issues you describe when dealing with real-world messy data and user input. E.g. using all-MiniLM-L6-v2 and cosine similarity, "Can you summarize Immanuel Kant's biography?" matched a chunk containing just the word "Biography" rather than one which started "Immanuel Kant, born in 1724...", and "How high is Ben Nevis?" matched a chunk of text about someone called Benjamin rather than a chunk about mountains containing the words "Ben Nevis" and its height[0]. Switching embedding model has helped, but I'm still not convinced that vector search alone is the silver bullet some claim it is. Still lots more to try though, e.g. hybrid search[1], query expansion[2], knowledge graphs etc.

[0] https://www.michael-lewis.com/posts/vector-search-and-retrie...

[1] https://sease.io/2023/12/hybrid-search-with-apache-solr.html

[2] https://news.ycombinator.com/item?id=38706913
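
For what it's worth, a common way to combine a lexical (Solr/BM25) ranking with a vector ranking in a hybrid setup is reciprocal rank fusion. A minimal sketch follows; the document IDs are made up, and the two input lists stand in for whatever your lexical and vector backends return.

    def reciprocal_rank_fusion(lexical_hits, vector_hits, k=60):
        # Each argument is a list of document IDs, best first.
        scores = {}
        for hits in (lexical_hits, vector_hits):
            for rank, doc_id in enumerate(hits):
                # 1 / (k + rank) dampens the influence of lower-ranked results.
                scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
        # Documents ranked high in either list end up near the top of the fused list.
        return sorted(scores, key=scores.get, reverse=True)

    print(reciprocal_rank_fusion(["ben_nevis", "benjamin_bio"],
                                 ["benjamin_bio", "biography", "ben_nevis"]))
    # -> ['benjamin_bio', 'ben_nevis', 'biography']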


Exactly in the same place as you with Elasticsearch (8.11). Went down the vector path to get better matches for adjectives, verbs and negations ("room with no skylight" vs. "room with skylights" & "room with a large skylight"). Different dataset obviously, but I think I get slightly better results than your examples, so it might be worth looking for a different sentence transformer (I tried a few and settled on roberta-base-nli-stsb-mean-tokens).


I was reading through open llama; it looks like the way to get pertinent results is via a different ranking algorithm and scoring based on convergence, then shoving that back into the LLM.


If you know that your search queries will be actual questions (like in the example you listed), you can possibly use HyDE[0] to create a hypothetical answer, which will usually have an embedding that's closer to the RAG chunks you are looking for.

It has the downside that an LLM (rather than just an embedding model) is used in the query path, but it has helped me multiple times in the past to strongly reduce RAG problems like the ones you outlined, where the search likes to latch onto individual words.

[0]: https://arxiv.org/abs/2212.10496
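
To make the idea concrete, a rough HyDE sketch, assuming an OpenAI-style client; the model names are placeholders and the commented-out vector_search() stands in for whatever index you already have.

    from openai import OpenAI

    client = OpenAI()  # placeholder client; any LLM + embedding model works

    def hyde_query_vector(question: str) -> list[float]:
        # 1. Let the LLM draft a hypothetical answer passage.
        draft = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user",
                       "content": f"Write a short passage that answers: {question}"}],
        ).choices[0].message.content
        # 2. Embed the draft; it tends to sit closer to real answer chunks
        #    than a terse question like "How high is Ben Nevis?".
        return client.embeddings.create(
            model="text-embedding-3-small", input=draft
        ).data[0].embedding

    # hits = vector_search(hyde_query_vector("How high is Ben Nevis?"), k=5)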


Thanks, sounds interesting, not dissimilar from some of the query expansion techniques. But in my case (open source, zero budget) I'm doing (slow) CPU inference, so an LLM in the query chain isn't really viable. As it is, there is a near-instant "Source: [url]" returned by the vector search, followed by the LLM-generated "answer" (quite some time) later. So I think the next steps will be "traditional" techniques such as re-ranking and hybrid search, in line with the original "Build a search engine, not a vector DB" article.


Lucene supports decompounding and stemming (https://core.ac.uk/reader/154370300). Depending on the language, decompounding can be very important or of little import; Germanic languages should probably have decompounding.


I wonder what the advantages/disadvantages of dedicated search tools like Lucene are compared to a custom LLM.


Neo4j are mixing vector embeddings with knowledge graphs - https://neo4j.com/generativeai/


Ignoring the disclosure etiquette here, then making an irrelevant rebuttal about relevance when the point was disclosure, then getting snarky with the person who tried to helpfully point it out?

I have no opinion on your products or your post, but some % of people steer away from companies for such things.


It's generally good etiquette around here to disclose your affiliation if you post comments that advertise the products of your employer.


My views are my own and as such I do not disclose my employment or otherwise on here.

I did think twice about posting it, as I don't usually, but it's relevant and it might be helpful, so why not? If you don't like it, thanks for the downvote.


Wow. I learned some stuff about etiquette on HN today.

I'll support you, mnd999. I don't work for a graph DB company. We don't use graph DBs, but I'm considering it. Graph DBs are a legitimate source to feed data into your RAG system. Our RAG system currently uses hybrid search: lexical and semantic. We need to expand our sources, too. I would like to see us use LLMs to rephrase our content (we have a lot of code) and index on that. I think we should build a KG on content quality (we have millions of docs) and filter out the things no one likes.

I also think a KG on "learning journeys" would be valuable, but really difficult.


I feel like we're passing the peak of a vector DB hype cycle, where it's increasingly clear it's one retrieval strategy next to full-text search strategies. I constantly talk to people trying to build RAG, and they realize they need a full-text search solution and a number of strategies, VERY dependent on the task you want your chat system to accomplish.

It's important we get through the trough of disillusionment quickly. There's a lot of market education needed to know when vector DBs are truly needed.


I fell into this trap as well. Started pretty hyped about vector DBs as the "magical ctrl+F". Realized I needed some keyword matching as well. And also some transforms to get the right format for vector search. And also multiple chunking strategies for higher-fidelity search.

A month in I realize I'm trying to reinvent a search engine. Kinda wonder if I should have just used something like elasticsearch instead.


It's worth saying that ES can work as a vector store itself, so it's very easy to handle a couple different kinds of searches this way.
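
For example, recent Elasticsearch (8.x) releases let you send a BM25 query and a kNN vector query against the same index in one request. A hedged sketch; the index name, field names and the query vector are placeholders.

    from elasticsearch import Elasticsearch

    es = Elasticsearch("http://localhost:9200")

    def hybrid_search(question: str, query_vector: list[float]):
        return es.search(
            index="docs",                                  # placeholder index
            query={"match": {"body": question}},           # BM25 over a text field
            knn={
                "field": "embedding",                      # a dense_vector field
                "query_vector": query_vector,
                "k": 10,
                "num_candidates": 100,
            },
            size=10,
        )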


Full-text search is also overhyped. In the end you're querying a KB just like in the 90s. The major difference is the scale of the model and the fact that it can make assertions with a tone that would make you believe that what it is saying is a fact.


I disagree it’s “overhyped”. I feel like there’s a fairly correct understanding in the market of its uses and limitations. That hype cycle occurred decades ago


Agree fully. Vector search in embedding space is insufficient if you are working with a single document domain (i.e. they are all fish restaurant menus), and then the only thing that can save you is text search. Just make sure the underlying database supports synonym lists and normalization in the languages you plan on using.

About the "bad news" section.

You can do that today by just asking the LLM using the ReAct pattern. Give it the database schema and a few-shot prompt, and it will happily decide to build a query, read titles, do more queries if the titles aren't relevant enough, fetch the content of the titles that are relevant, and use those to form an opinion.

This may not seem fast, but there are 7B models that can do it today at 150+ tokens/second.


I think a model could do some basic eval but there are too many hidden assumptions for it to do especially well.


please elaborate, thanks.


this is an example: https://platform.openai.com/playground/p/HpFda4ZRXjbbanBwG35...

It's a ReAct loop with search and retrieve actions, where I'm simulating the tool by hand. In prod, you'd pick up the output of the Action, run the callback with the LLM input, get the result, and pass the result as 'Observation:'. For the sake of this demo, I'm doing exactly that but manually copy-pasting out of Wikipedia.

It works more or less with any backend, and the LLM is smart enough to change direction if a search doesn't produce relevant results (and you can see it in the demo). Here the loop is cut short because I was running it manually, but you can see the important bits.

Just implement a retrieve and a search function for whatever data source you have, vector or full text, and a couple of regexes to extract actions and the final answer.

Pro tip: use an expensive LLM to run the ReAct loop, and a cheaper LLM to summarize article content after retrieval and before putting it in as an observation. Ideally you'd run something like "this is a document {document} on this topic: {last_thought}, extract the information relevant to the user question: {question}" through a cheap LLM, so you feed the least amount of tokens into the ReAct loop.
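
A loose sketch of that loop, for reference. The prompt format, the call_llm()/search()/retrieve() callables and the step limit are assumptions; the shape (extract Action, run the tool, feed back Observation, stop on Final Answer) is the point.

    import re

    ACTION_RE = re.compile(r"Action:\s*(\w+)\[(.+?)\]")
    ANSWER_RE = re.compile(r"Final Answer:\s*(.+)", re.S)

    def react_loop(question, call_llm, tools, max_steps=5):
        # `call_llm` is the expensive model driving the loop; `tools` maps
        # action names (e.g. "search", "retrieve") to callables.
        transcript = f"Question: {question}\n"
        for _ in range(max_steps):
            step = call_llm(transcript)
            transcript += step + "\n"
            answer = ANSWER_RE.search(step)
            if answer:
                return answer.group(1).strip()
            action = ACTION_RE.search(step)
            if action:
                name, arg = action.groups()
                observation = tools[name](arg)
                # This is where a cheaper LLM could first compress `observation`.
                transcript += f"Observation: {observation}\n"
        return None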


Many, many big companies don't see any value in search. They simply use the defaults, and when those defaults are abysmal (like in the case of Confluence for example), well... they just suffer through it in silence.

I have so far mostly failed in trying to explain 1/ why search matters and 2/ that not all "search" functionality is equal and that building good search is an art form.


> I have so far mostly failed in trying to explain 1/ why search matters and 2/ that not all "search" functionality is equal and that building good search is an art form.

Yeah, it takes an absurd amount of tuning to make search work well. Given how poorly the average search field works in almost anything, it's fair to say this crucial step isn't happening.

I suspect a lot of organizations just don't have workflows that would tolerate someone spending a month tweaking search algorithm parameters. It doesn't look enough like work.


Doesn't look like work, yet tragically, incremental improvements to "frictionlessness" represent order-of-magnitude improvements to user experience.


Oh yeah, it's definitely an organizational problem that's pretty widespread. I think it boils down to a general lack of trust, and a willingness to turn developers into a sort of assembly-line worker.


I went through a phase where I spoke to people who develop numerous enterprise search engines (e.g. OpenText). Out of about 20 interviews, I think I found one that did actual evaluation work on their search engine. The rest of them figured it was more important to have 300+ 'integrations' to various data sources and didn't think the relevance of the results was much of a selling point.


Quality is harder to sell to enterprise customers when compared to feature lists. You have to check the right boxes and entertain the right ears to sell.

Being more useful than the others isn't as easy to quantify.


I can relate. I have had conversations about enterprise search and how it can help them, especially when done with the help of embeddings + LLMs, but many do not see it as a problem. It's a classic case where the people you would be selling to have hired analysts for the use case and no longer see it as a prominent problem. Employees would like better search, but not so much that they would go to CTOs and vouch for it.

You can use analogies like:

1. Imagine the world before Google. Web search was a pain. <<Search for your company>> would be similarly transformative.

2. Every company has an encyclopedia: the guy who knows about past efforts and is consulted whenever people are trying something new. Search makes that redundant and saves time.

3. Same with repetitive work because the employees cannot find where the work was done previously.

Search is a feature, and unless you address the central pain point that search solves (in terms of revenue), no one will go for it. When you do, you will end up solving the second problem: leaders never have the issue, but employees do.


It may still not work, but try explaining it using flashy analogies. For example, the internet without search algorithms is not the economic powerhouse we know today, and the quality of search made companies like Google the giants they are. All of this is because of the enormous economic impact good search has. Say a user needs to make just 5 searches a day, but this turns into 20 because of poor search results, forcing re-querying in an attempt to turn up the right result. Multiply that wasted time by all employees and, at face value, you're costing yourself an enormous amount of money as a company, not to mention the compounding loss due to workflow interruption. With a graph or two, you should be able to convince most people that good search = massive productivity gain.


"we have no stemming support in Confluence" goes far beyond unfortunate defaults.


I didn't understand why Confluence's search engine works so poorly before I built my own search engine, and I especially don't understand it after. It's an absolute mystery and goes far beyond misconfiguration. It feels like they're just using a binary index and completely skipping the relevance ranking.


Which is the height of bullshit, since Confluence uses Lucene internally, which obviously does support stemming (at least Confluence didn't, back when I used it; luckily, I haven't had to use Confluence for ages). Confluence search is what happens when some dev gets told "hey, add search, we need to tick a checkbox", searches for 30s for "Java search lib", and just adds Lucene without knowing anything about it.


JIRA gets a lot of bad press but it works OK. Confluence is an utter PoS with nothing going for it, nothing working the way it should or the way a random user would expect it to work.

How it survives (thrives) on the marketplace is a mystery.


Good luck! I exited the search game because I felt it was a race to the bottom. Elastic was super successful, and has basically made search a commodity, but it's a shitty quality commodity. Developers just throw the data in and call it a day. Relevance is the hard part, and always has been, otherwise we would all still be using AltaVista and Inktomi. LLMs are changing the game though, and real innovation is now happening in search. I want back in.


It seems to me that the buzzword "vector DB" leads to people not fully understanding what it's actually about and how it even relates to LLMs. Vector databases, or nearest-neighbor algorithms (as they were called before), were already in use for lots of other tasks not related to language processing. If you look at them from that perspective, you will naturally think of vector DBs as just another way of doing plain old search. I hope we get some more advancements in hybrid search. Most of the time, search is the limiting factor when doing RAG.


Good points... In many ways, before LLMs, vectors were getting so exciting; Sentence Transformers and BERT embeddings felt so instrumental, so powerful... Work by the txtai author (especially things like semantic walking) felt incredible and like the next evolution. It's a shame, in a way, that all the creative and brilliant uses of text similarity embeddings didn't really have any time to shine or make it into products before ChatGPT made so much of it, except search use cases, obsolete.


Btw - I published a paper at EMNLP with the txtai author (David) about using semantic graphs for automatic creation of debate cases!

https://aclanthology.org/2023.newsum-1.10/

Happy to see that David's excellent work is getting the love that it deserves!


Thanks for the nice words on txtai. There have been times this year I've thought about an alternate 2023 where the focus wasn't LLMs and RAG.

ChatGPT certainly set the tone for the year. Though I will say you haven't heard the last of semantic graphs, semantic paths and some of that work that did happen in late 2022 right before ChatGPT. A bit of a detour? Yes. Perhaps the combination is something that will lead to features even more interesting - time will tell.


>It's a shame in a way that all the creative and brilliant uses of text embeddings from similarity embeddings didn't really have any time to shine or go into product before ChatGPT

Yes, it did. Companies that offer competitive search or recommendation feeds were all using these text models in production.


I was running one of them, and entering Kaggle competitions throughout 2021 and 2022 using them. Many efforts and uses of Sentence-transformers (and new PhD projects) were thrown in the trash with Instruct GPT models and ChatGPT. I mean, it's like developing a much better bicycle (let's say an ebike), but then cars come out. It was like that.

The future looked incredibly creative with cross-encoders, things like semantic paths, using the latent space to classify; everything was exciting. An all-in-one LLM that eclipsed embeddings on all but speed for these things was a bit of a killjoy.

Companies that changed existing indexing to use sentence transformers aren't exactly innovating; that process has happened once or twice a decade for the last few decades. This was the parent's point, I believe, in a way. And tbh, the improvement in results has never been noticeable to me; exact match is actually 90% of the solution to retrieval (maybe not search) already; we just take it for granted because we are so used to it.

I fully believe that in a world without GPT-3, HN demos would be full of sentence transformers and other cool technology being used in creative ways, compared to how rarely you see them now.


Also, people seem to have forgotten that the whole technique behind sentence transformers (pooling embeddings) works as a form of "medium-term" memory, in between "long-term" (vector DB retrieval) and "short-term" (the prompt).

You can compress a large number N of token embeddings into a smaller number of token embeddings, with some loss of information, using pooling techniques like those in sentence transformers.
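
For reference, the pooling in question is essentially an attention-mask-weighted average over the token embeddings. A minimal sketch with the transformers library; the model name is just a common default, not a recommendation.

    import torch
    from transformers import AutoModel, AutoTokenizer

    name = "sentence-transformers/all-MiniLM-L6-v2"  # common default, swap as needed
    tok, model = AutoTokenizer.from_pretrained(name), AutoModel.from_pretrained(name)

    def mean_pooled_embedding(text: str) -> torch.Tensor:
        inputs = tok(text, return_tensors="pt", truncation=True)
        with torch.no_grad():
            token_embeddings = model(**inputs).last_hidden_state   # (1, seq, dim)
        mask = inputs["attention_mask"].unsqueeze(-1)              # ignore padding
        # Weighted average of token embeddings -> one sentence vector.
        return (token_embeddings * mask).sum(dim=1) / mask.sum(dim=1)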

But I've literally gotten into fights here on HN with people who claimed that "if this was so easy people would be doing it" and other BS. The reality is that LLMs and embedding techniques are still massively under-tooled. For another example, why can't I average-pool tokens in ChatGPT, such that I could ask "What is the definition of {apple|orange}"? This is notably easy to do in Stable Diffusion land and even works in LLMs, despite the fact that even "greats" in our field will go and fight me in the comments when I post this[1] again and again, desperately trying to get a properly good programmer to implement it for production use cases...

[1] https://gist.github.com/Hellisotherpeople/45c619ee22aac6865c...


Share use cases?


>Many efforts and uses of Sentence-transformers (and new PhD projects) were thrown in the trash with Instruct GPT models and ChatGPT.

There still exists a need for fast and cheap models where LLMs do not make sense.


Instead of embedding the user prompt, I let the LLM invert it into keywords and search the embedding of that. It very much does feel like a magic bullet.


Using the LLM to mutate the user query is the way to go. A common practice, for example, is to take the chat history and rephrase a follow-up question that might not have a lot of information density (e.g. the follow-up question is "and then what?", which is useless for search, but the LLM turns it into "after a contract cancellation, what steps have to be taken afterwards" or something similar, which provides a lot more meat to search with).

Using the LLM to mutate the input so it can be used better for search is a path that works very well (ignoring added latency and cost).


"Search the embedding"? Could you elaborate on this, it sounds interesting!


I think OP means to filter the user input through an LLM with “convert this question into a keyword list” and then calculating the embedding of the LLM’s output (instead of calculating the embedding of the user input directly). The “search the embedding” is the normal vector DB part.
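
Something like this minimal sketch, assuming an OpenAI-style client; the model names and the commented-out vector_db.search() call are placeholders for whatever stack is actually in use.

    from openai import OpenAI

    client = OpenAI()  # placeholder client

    def keyword_query_vector(question: str) -> list[float]:
        keywords = client.chat.completions.create(
            model="gpt-4o-mini",  # placeholder model name
            messages=[{"role": "user",
                       "content": "Convert this question into a short, comma-separated "
                                  f"list of search keywords:\n{question}"}],
        ).choices[0].message.content
        # Embed the keyword list instead of the raw question.
        return client.embeddings.create(
            model="text-embedding-3-small", input=keywords
        ).data[0].embedding

    # hits = vector_db.search(keyword_query_vector("Who is Keanu Reeves?"), k=5)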


"Query expansion"[0] has been an information retrieval technique for a while, but using LLMs to help with query expansion is fairly new and promising, e.g. "Query Expansion by Prompting Large Language Models"[1], and "Query2doc: Query Expansion with Large Language Models"[2].

[0] https://en.wikipedia.org/wiki/Query_expansion

[1] https://arxiv.org/abs/2305.03653

[2] https://arxiv.org/abs/2303.07678


Ask the LLM to summarize the question, then take an embedding of that.

I think you can do the same with the data you store: summarize it to the same number of tokens, then get an embedding for that to save with the original text.

Test! Different combinations of summarizing LLM and embedding-generation model can get different results. But once you decide, you are locked into the summarizer as much as the embedding generator.

Not sure if this is what the parent meant though.


I could not help but notice that the Contriever curve is so much higher in Recall (y-axis) than the other methods (figure 11 in https://arxiv.org/pdf/2307.03172.pdf).

Has anyone come across more recent experiments, results, or papers related to this? I'm acquainted with:

- the Contriever 2021 paper: https://aclanthology.org/2021.eacl-main.74.pdf

- HyDE 2022: https://arxiv.org/pdf/2212.10496.pdf

My suspicion is some pre-logic, such as: if the user's question is dense enough, then use HyDE with chat history. If anyone has more recent experience with Contrievers, I would love to learn more about it!

Feel free to contact me directly on LinkedIn. https://www.linkedin.com/in/christybergman/


BTW: I think of this like asking someone to put things into their own words, and then it’s easier for them to remember. Matching on your way of talking can be weird from the LLM’s point of view, so use their point of view!


It is two different language models. The embedding model tries to capture too many irrelevant aspects of the prompt, which ends up putting it close to seemingly random documents. Inverting the question into the LLM's blind guess and distilling it down to keywords causes the embedding to be very sparse and specific. A popular strategy has been to invert the documents into questions during initial embedding, but I think that is a performance hack that still suffers from sentence prompts being bad vector indexes.


You can use llama2 to do embedding and summaries and chat.

Turning the docs into questions is something I will test on stuff (just learning and getting a feel).

I am intrigued... what makes a good vector index??


My heuristic is how much noise is in the closest vectors. Even if the top k matches seem good, if the following noise has practically identical distance scores, it is going to fail a lot in practice. Ideally you could calculate some constant threshold so that everything closer is relevant and everything further is irrelevant.
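
One way to operationalize that heuristic, as a rough sketch: look at the score gap right below the top-k cut-off. The threshold and margin here are arbitrary, and the similarities are assumed to be cosine scores.

    import numpy as np

    def looks_noisy(similarities, k=5, margin=0.02):
        # `similarities` are cosine similarities for candidate chunks.
        sims = np.sort(np.asarray(similarities))[::-1]
        if len(sims) <= k:
            return False
        # If the best "non-hit" is practically as close as the worst "hit",
        # the top-k cut-off is arbitrary and the match is probably noise.
        return (sims[k - 1] - sims[k]) < margin

    print(looks_noisy([0.81, 0.80, 0.79, 0.79, 0.78, 0.78, 0.77], k=3))  # True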


Apologies for being naive, but how do you calculate noise?


I'm sure it would be possible to fine-tune an LLM like Mistral to search a database or a document.


> you could have a language model construct a query that includes a date filter.

But be careful, because the output is not guaranteed. This means you have to take care to provide the schema and what you're trying to do within the context window, and validate the output. There is non-trivial overhead to this.
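
A hedged sketch of that validation step: ask the LLM for JSON matching a small whitelist of fields and reject anything that doesn't parse or validate, rather than passing it straight to the search backend. The field names are made up for the example.

    import json
    from datetime import date

    ALLOWED_FIELDS = {"query", "published_after", "published_before"}  # made-up schema

    def parse_llm_filter(llm_output: str) -> dict:
        payload = json.loads(llm_output)            # raises on malformed JSON -> retry
        if not set(payload) <= ALLOWED_FIELDS:
            raise ValueError(f"unexpected fields: {set(payload) - ALLOWED_FIELDS}")
        for key in ("published_after", "published_before"):
            if key in payload:
                date.fromisoformat(payload[key])    # must be a real YYYY-MM-DD date
        return payload

    print(parse_llm_filter('{"query": "rate decision", "published_after": "2023-01-01"}'))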


OAI function calling can solve this more or less


Couldn't agree more. To give an example of going beyond a simple "generic" search:

I have a company finding buyers for commercial real estate. One of the search features is the locations of the buyers (usually family offices etc.; they're always companies, with headquarters, preferences on where to buy, and so on). You can then, for example, calculate the distance to those locations.

LLMs are extremely useful in creating these features from unstructured info on the companies. But just throwing an embedding on this and hoping it works doesn't.

However, embeddings do work super well in parts of the search.


I agree that RAG doesn't have to be paired with vector search. Other types of search can work in some cases.

Where vector search excels is that it can encode a complex question as a vector and does a good job bringing back the top n results. It's not impossible to do some of this with keyword search (term expansion, stopwords and so forth); vector search just makes it easy.

In the end, yes, this is a better search system, and thinking about that step is a good point. I would go a step further and say it's also worth thinking about the RAG framework. Lots of examples use an OpenAI/LangChain/Chroma stack, but it's worth evaluating other RAG framework options too. There might be frameworks that are easier to integrate and perform better for your use case.

Disclaimer: I am the author of txtai: https://github.com/neuml/txtai


I have a related project here: https://findsight.ai and also gave a talk about building it here: https://youtu.be/elNrRU12xRc


I'd love to have a search engine for all the conversations I've ever had with people through various messaging apps, one that combines email, my scanned documents from paperless-ngx, and any other PDFs or documents in my Nextcloud into a single search interface.


Maybe at some point the NSA will let us download them all!


If someone wants to build this locally, to fetch discussions where topic X was discussed or find a person who has shown interest in a certain thing X, how does one go about it?

One way of doing it is to embed messages with the added context of previous messages, but only up until the topic changes; otherwise a simple similarity search on the user prompt embedding would output messages on irrelevant topics, since context from the very start of the conversation would be included.

Then embed the user prompt and perform a similarity search on either the user's query, or on a hypothetical statement created from the prompt, also called the HyDE approach: you ask an LLM to generate a hypothetical response given the query and then use its vector along with the query vector to enhance search quality.

For example, if the user query is "find me who is interested in playing Minecraft on Tuesday", the LLM will generate a response like "I play Minecraft on Tuesdays", and we can search for the vector of the LLM output in the vector DB, which contains all the messages along with their context.

However, I am not sure how this will work in scenarios where the user has sent a message asking "Will you play Minecraft on Tuesday?" and person A has responded with "Yes". How can we have the model find person A? Shall we make a summary of each person based on their conversation with the user?

Also, the whole process might be computationally slow. how do we enhance the speed and performance?

(a noob here who wanted to build a similar solution)


I guess it could be a reality if GDPR came with a decent API spec so you could request your personal data algorithmically.


From the article: "The crux is that while vector search is better along some axes than traditional search, it's not magic. Just like regular search, you'll end up with irrelevant or missing documents in your results."

RAG is often helpful and easy to add, but it's fundamentally search - not magic.

I find it helpful to look at the search results before feeding them into the model. Just like the "I'm Feeling Lucky" button on Google doesn't always give the perfect answer, you may have to tweak your search query to improve the result.


I just used Postgres to build my search engine, and it also helps with the last two questions. Keeping the content context consistent helps with the first. Unscatter.com, for example, is only content shared in the last 30 days. It helps keep my operating costs under $50 a month too.

I wish I had time to mess with it more; job and life have taken over. My first goal with AI would be to use it for keyword and phrase extraction, and also for analyzing all the links I pull in hourly to see if there is a larger story I could make visible.
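
For anyone curious what the Postgres-only approach can look like, a hedged sketch using built-in full-text search. The table and column names are made up, and a GIN index on the tsvector is assumed for speed; this is not the commenter's actual schema.

    import psycopg2

    conn = psycopg2.connect("dbname=search")  # placeholder connection string

    def search(term: str, days: int = 30):
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT url,
                       ts_rank(to_tsvector('english', body),
                               plainto_tsquery('english', %s)) AS rank
                FROM articles
                WHERE to_tsvector('english', body) @@ plainto_tsquery('english', %s)
                  AND shared_at > now() - make_interval(days => %s)
                ORDER BY rank DESC
                LIMIT 20
                """,
                (term, term, days),
            )
            return cur.fetchall()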


I'm trying to alleviate the issue with tagging (https://kristiandupont.medium.com/empathy-articulated-750a66...), but it's not a panacea.

I feel that a big part of the solution will simply be in the form of increased speeds. If you can ask the model for a strategy and then let it search/process a few times in a loop, responses will improve vastly.


I joke that it is akin to applying taxonomy to a live TV interview: you need to tag and categorize, but you can only do so with precision after a point is made.

My current solution is to have an NLP pipeline that does so as tokens are returned. Not quite as precise yet, but it shows promise.

Should be open source sooner rather than later.


I like that analogy.


I have been using an Elastic index for a while now. The best way I have found is to use a hybrid search: match-all with embedding + an exact + fuzzy match combination as a way to boost results.

Reranking also provides a significant improvement to the response quality.

Another way to improve results for domain-specific RAG systems is to use some heuristics to boost results, e.g. penalize results that contain certain negative keywords or boost results with certain patterns.
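
As an illustration of such heuristics, a small re-scoring pass over the hybrid results; the keywords, patterns and weights here are invented for the example.

    import re

    NEGATIVE_KEYWORDS = {"deprecated", "archived"}          # made-up examples
    BOOST_PATTERNS = [re.compile(r"\bhow to\b", re.I)]

    def rescore(hits):
        # hits: list of (doc_text, base_score) pairs from the hybrid search
        rescored = []
        for text, score in hits:
            if any(word in text.lower() for word in NEGATIVE_KEYWORDS):
                score *= 0.5                                 # penalize negative keywords
            if any(p.search(text) for p in BOOST_PATTERNS):
                score *= 1.2                                 # boost useful patterns
            rescored.append((text, score))
        return sorted(rescored, key=lambda x: x[1], reverse=True)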

For RAG, given the limited context size and potential hallucinations, the best prompt + the best data will give you the best response.

Prompts can be improved greatly to get the LLM to produce a good response with reduced hallucinations. A lot of techniques can be seen on Twitter and explored to find a good fit.

I improve my prompts using a GPT assistant that significantly improves the response quality: https://chat.openai.com/g/g-haH111AXX-prompt-optimizer


This resonates with the approach we’ve taken in Langroid (the Multi-Agent framework from ex-CMU/UW-Madison researchers): our DocChatAgent uses a combination of lexical and semantic retrieval, reranking and relevance extraction to improve precision and recall:

https://github.com/langroid/langroid/blob/main/langroid/agen...


I think a fundamental issue with search, and the reason why many companies do not invest in tuning a good search experience, is that the main metric usually is to minimise embarrassing/irrelevant results, rather than get the best possible set of results. How can you even know what is the best answer to your query? Systematic evaluation is very hard.


If you control the browser your results are shown in, you can monitor clicks and time spent on a document to generate a pretty good signal. If someone opens a document and looks at it for fifteen minutes, you should be fairly convinced it was useful.


OpenAI's ability to search and evaluate Bing results seems to me the best of both worlds, if it can be applied to custom data. By way of example, if an AI can query macOS Spotlight and evaluate the results, I think the issue is resolved.


How do RAG implementations usually get around the context size limitations in LLMs?

Since it usually deals with PDFs and other docs that can be quite big, do they take only the first N tokens? Are abstractive summarisation techniques used?


They split the document. Here’s an example of Markdown splitting. All this is far more an art than science at this point.

https://python.langchain.com/docs/modules/data_connection/do...
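
For a framework-free picture of what splitting does, here is a plain sliding-window splitter sketch: fixed-size chunks with some overlap, so each retrieved chunk fits the context window while neighbouring chunks share a little context. The sizes are arbitrary; real splitters also respect headings and sentence boundaries.

    def split_text(text: str, chunk_size: int = 1000, overlap: int = 200):
        chunks, start = [], 0
        while start < len(text):
            end = start + chunk_size
            chunks.append(text[start:end])
            start = end - overlap      # step back so adjacent chunks share context
        return chunks

    doc = "lorem ipsum " * 1000
    print(len(split_text(doc)))        # number of chunks to embed and index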


RAG is retrieval-augmented generation. I had never heard of this before.


RAG seems to be Retrieval Augmented Generation


Partially agree.

Vector DBs are critical components in retrieval systems. What most applications need are retrieval systems, rather than building blocks of retrieval systems. That doesn't mean the building blocks are not important.

As someone working on a vector DB, I find many users struggling to build their own retrieval systems out of building blocks such as an embedding service (OpenAI, Cohere), a logic orchestration framework (LangChain/LlamaIndex) and vector databases, some even with reranker models. Putting them together is not as easy as it looks; it's a fairly challenging piece of systems work, let alone the quality tuning and devops.

The struggle is no surprise to me, as the tech companies that are experts on this (Google, Meta) all have dedicated teams working on the retrieval system alone, making tons of optimizations and developing a whole feedback loop for evaluating and improving quality. Most developers don't get access to such resources.

No one size fits all. I think there should exist a service that democratizes AI-powered retrieval: in simple words, the know-how of using embeddings + a vector DB and a bunch of tricks to achieve SOTA retrieval quality.

With this idea I built a Retrieval-as-a-service solution, and here is its demo:

https://github.com/milvus-io/bootcamp/blob/master/bootcamp/R...

Or using it in LlamaIndex:

https://github.com/run-llama/llama_index/blob/main/docs/exam...

Curious to learn your thoughts.


Here is an article that systematically discusses how vector retrieval and BM25 affect search quality; in other words, what kinds of systems are the past, the present and the future:

https://thenewstack.io/the-transformative-fusion-of-probabil...



