
That's so vague I can't tell what you're suggesting. What specifically do you think needs solving at the model level? What should work differently?



There’s probably a lack of capabilities on multiple fronts. RAG might have the right general idea, but currently the retrieval seems too separate from the model itself. I don’t know how our brains do it, but retrieval looks to be more integrated there.

Models currently also have no way to update themselves with new info besides us putting data into their context window. They don’t learn after the initial training. It seems that if they could, say, read documentation and internalize it, the need for RAG or even large context windows would decrease. Humans somehow are able to build an understanding of extensive topics with what feels like a much shorter context window.


Don't forget the importance of data privacy. Updating a model with fresh information makes that information available to ALL users of that model. This often isn't what you want - you can run RAG against a user's private email to answer just their queries, without making that email "baked in" to the model.
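
To make that concrete, here's a toy sketch (an in-memory list standing in for a real vector store, keyword overlap standing in for embedding similarity — the documents are made up): retrieval is filtered to the requesting user before anything reaches the prompt, so private mail never has to be trained into shared weights.

    from dataclasses import dataclass

    # Toy stand-in for a real vector store: each document carries a user_id,
    # and retrieval filters on it first, so one user's private mail is never
    # surfaced to another user and never gets baked into shared model weights.
    @dataclass
    class Doc:
        user_id: str
        text: str

    DOCS = [
        Doc("alice", "Flight to Berlin departs 9:40am on May 3."),
        Doc("bob", "Invoice #221 is due on April 30."),
    ]

    def retrieve(user_id: str, query: str, k: int = 3) -> list[str]:
        # A real system would rank by embedding similarity; keyword overlap
        # is enough to show the shape of the per-user filter.
        scored = [
            (sum(w in d.text.lower() for w in query.lower().split()), d.text)
            for d in DOCS
            if d.user_id == user_id
        ]
        return [text for score, text in sorted(scored, reverse=True)[:k] if score > 0]

    print(retrieve("alice", "when is my flight to berlin"))  # only Alice's docs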


You don't need to update the whole model for everyone. Fine-tuning exists and is even available as a service from OpenAI. The updates are only visible in the specific fine-tuned models you create.


Maintaining a fine-tuned model for every one of your users - even with techniques like LoRA - sounds complicated and expensive to me!


It is, but it's also not that bad. A copy of the weights is X GB of cloud storage (which can be stored as a diff if that helps), plus added compute time for loading a custom model and unloading it for the next customer. It's not free, but it's an approachable cost for a premium service.
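
Rough sketch of the serving side, using Hugging Face transformers + peft (the base model name and the ./adapters/<customer> layout are assumptions, not anything standard): you keep one shared base model in memory and attach a small per-customer LoRA adapter for the duration of a request — that adapter is the "diff" being stored per customer.

    # Rough sketch: one shared base model, one small LoRA adapter per customer.
    # Base model name and adapter directory layout are made up for illustration.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    BASE = "mistralai/Mistral-7B-v0.1"              # assumed base model
    base_model = AutoModelForCausalLM.from_pretrained(BASE)
    tokenizer = AutoTokenizer.from_pretrained(BASE)

    def generate(customer_id: str, prompt: str) -> str:
        adapter_dir = f"./adapters/{customer_id}"   # hypothetical per-customer storage
        model = PeftModel.from_pretrained(base_model, adapter_dir)
        try:
            inputs = tokenizer(prompt, return_tensors="pt")
            output = model.generate(**inputs, max_new_tokens=128)
            return tokenizer.decode(output[0], skip_special_tokens=True)
        finally:
            model.unload()  # detach this customer's adapter before the next request

Since the adapter itself is typically tens of megabytes, the per-customer storage is small compared to a full copy of the weights.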


I guess it's because people are not using tools enough yet. In my tests, giving the LLM access to retrieval tools works much better than trying to guess up front what the RAG pipeline will need to answer, i.e. the LLM decides whether it has all of the necessary information to answer the question. If not, let it search for it. If it still fails, then let it search more :D


Agreed. Retrieval performance is very dependent on the quality of the search queries. Letting the LLM generate the search queries is much more reliable than just embedding the user input. Also, no retrieval system is going to return everything needed on the first try, so using a multi-step agent approach to retrieving information is the only way I've found to get extremely high accuracy.
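
Here's roughly the loop I mean, using the OpenAI tool-calling API (the model name is arbitrary, and search_index() is a stand-in for whatever retriever you actually have): the model writes its own search queries and keeps calling the tool until it decides it can answer.

    # Sketch of model-driven, multi-step retrieval: the LLM generates the search
    # queries and decides when it has enough context to answer.
    import json
    from openai import OpenAI

    client = OpenAI()

    def search_index(query: str) -> str:
        # Placeholder: call your vector store / search engine here.
        return "top documents for: " + query

    TOOLS = [{
        "type": "function",
        "function": {
            "name": "search_index",
            "description": "Search the document index. Call again with a refined query if results are insufficient.",
            "parameters": {
                "type": "object",
                "properties": {"query": {"type": "string"}},
                "required": ["query"],
            },
        },
    }]

    def answer(question: str, max_steps: int = 5) -> str:
        messages = [{"role": "user", "content": question}]
        for _ in range(max_steps):
            msg = client.chat.completions.create(
                model="gpt-4o-mini", messages=messages, tools=TOOLS
            ).choices[0].message
            if not msg.tool_calls:                  # model decided it can answer
                return msg.content
            messages.append(msg)
            for call in msg.tool_calls:             # model-written search queries
                query = json.loads(call.function.arguments)["query"]
                messages.append({
                    "role": "tool",
                    "tool_call_id": call.id,
                    "content": search_index(query),
                })
        return "Gave up after too many retrieval steps."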


The queries you see and the resulting user interactions should be trained into the embedding model.

This is a foundational problem that requires your data. The way you search Etsy is different than the way you search Amazon. The queries these systems see are different and so are the desired results.

Trying to solve the problem with pretrained models is not currently realistic.
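
For what it's worth, the training loop itself isn't exotic. A minimal sketch with sentence-transformers, where the (query, clicked result) pairs are made-up placeholders for whatever your own search and interaction logs contain:

    # Minimal sketch: fine-tune an off-the-shelf embedding model on your own
    # (query, result the user actually engaged with) pairs. The two example
    # pairs below are placeholders; real data comes from your query/click logs.
    from torch.utils.data import DataLoader
    from sentence_transformers import SentenceTransformer, InputExample, losses

    model = SentenceTransformer("all-MiniLM-L6-v2")   # assumed base embedding model

    pairs = [
        InputExample(texts=["waterproof hiking boots", "Men's Gore-Tex hiking boot, size 10"]),
        InputExample(texts=["handmade ceramic mug", "Stoneware coffee mug, hand-thrown, 12 oz"]),
    ]

    loader = DataLoader(pairs, shuffle=True, batch_size=32)
    # Clicked result is the positive; other results in the batch act as negatives.
    loss = losses.MultipleNegativesRankingLoss(model)

    model.fit(train_objectives=[(loader, loss)], epochs=1, warmup_steps=100)
    model.save("embeddings-tuned-on-our-queries")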


Our brains don't really do this either. We can't memorise everything in the world. For us, a library or Google Search is what RAG is for an LLM.


I can answer questions off the cuff based on the weights of the neural network in my head. If I really wanted to get the right answers I would do "RAG" in the sense of looking up answers on the web or at the library and summarizing them.

For instance, I have a policy that I try hard not to say anything like "most people think that..." without providing links, because I work at an archive of public opinion data, and if it got out that one of our people was spouting false information about our domain, even if we weren't advertising the affiliation, that would look bad.


I think he is saying we should be making fine-tuning, or similar model-altering methods, easier rather than messing with bolt-on solutions like RAG.

Those are being worked on, and RAG is the duct-tape solution until they become available.



