> We build a description of the codebase including the file tree and parsed function names and class names
This sounds like RAG, and it also sounds like you're building an index? Did you just mean that you're not using vector search over embeddings for the retrieval part, or have I missed something fundamental here?
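Just to check I'm reading that right, something like this is roughly what I picture — the ast-based walk below is my guess at what "parsed function names and class names" means, not anything you've actually described:

```python
# Rough sketch of what I'm imagining: walk a repo, record the file tree,
# and pull out function/class names with Python's ast module.
# Paths and structure here are made up for illustration.
import ast
from pathlib import Path

def describe_codebase(root: str) -> str:
    lines = []
    for path in sorted(Path(root).rglob("*.py")):
        lines.append(str(path.relative_to(root)))
        try:
            tree = ast.parse(path.read_text(encoding="utf-8"))
        except SyntaxError:
            continue  # skip files that don't parse
        for node in ast.walk(tree):
            if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
                lines.append(f"    def {node.name}")
            elif isinstance(node, ast.ClassDef):
                lines.append(f"    class {node.name}")
    return "\n".join(lines)

print(describe_codebase("."))
```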
I'm currently working on a demonstration/POC system that uses Elasticsearch as my content source, generates embeddings from that content for retrieval, and passes the retrieved content to my local LLM.
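Roughly this shape, if it helps to compare notes — the index name, field names and embedding model below are placeholders rather than my actual config:

```python
# Minimal sketch of the retrieval half of the POC. Index name, field names
# and the embedding model are placeholders, not the real setup.
from elasticsearch import Elasticsearch
from sentence_transformers import SentenceTransformer

es = Elasticsearch("http://localhost:9200")
model = SentenceTransformer("all-MiniLM-L6-v2")

def retrieve(question: str, k: int = 5) -> list[str]:
    # Embed the question and run a kNN search against a dense_vector field.
    query_vector = model.encode(question).tolist()
    resp = es.search(
        index="content",
        knn={
            "field": "embedding",
            "query_vector": query_vector,
            "k": k,
            "num_candidates": 50,
        },
    )
    return [hit["_source"]["text"] for hit in resp["hits"]["hits"]]

# The retrieved passages then get stuffed into the prompt for the local LLM.
question = "How do I rotate the API keys?"
context = "\n\n".join(retrieve(question))
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```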
It would be cool to be talking to other people about the RAG systems they're building. I'm working in a silo at the moment, and I'm pretty sure I'm reinventing a lot of techniques.
I didn't mean to be down on it, and I'm really glad it's working well! If you start to reach the limits of what you can achieve with your current approach, there are lots of cute tricks you can steal from RAG. For example, on larger codebases there's nothing stopping you doing a fuzzy keyword search for interesting-looking identifiers rather than giving the LLM the whole thing in-prompt.
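Something along these lines is all I mean — this is a napkin sketch with made-up names, not a real implementation:

```python
# Back-of-the-envelope version of the "fuzzy keyword search" trick:
# pull identifiers out of the question, then grep the repo for near matches
# and only put the hit lines into the prompt, instead of the whole codebase.
import difflib
import re
from pathlib import Path

IDENT = re.compile(r"[A-Za-z_][A-Za-z0-9_]{2,}")

def fuzzy_find(question: str, root: str, cutoff: float = 0.7) -> dict[str, list[str]]:
    wanted = set(IDENT.findall(question))
    hits: dict[str, list[str]] = {}
    for path in Path(root).rglob("*.py"):
        for lineno, line in enumerate(path.read_text(errors="ignore").splitlines(), 1):
            idents = IDENT.findall(line)
            for w in wanted:
                if difflib.get_close_matches(w, idents, n=1, cutoff=cutoff):
                    hits.setdefault(str(path), []).append(f"{lineno}: {line.strip()}")
                    break
    return hits

# e.g. fuzzy_find("where does refresh_token get rotated?", "path/to/repo")
```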