Right now the solution is vector databases, but you could envision a different state representation inside the transformer decoder (the main component of a GPT). For example: summarize your architecture, tests, and implementation with compressed, smaller vectors for each piece, organize those summaries in a tree structure, and concatenate the tree to the context alongside the user query. It'd require rewriting the multi-head attention function or wrapping it, and it'd add an ETL step to build the tree, but then that whole compressed representation of your codebase would be available whenever you ask a question. It would necessarily be an abstraction rather than a verbatim copy of the code; otherwise you'd run out of room. Funny how everything runs into Kolmogorov complexity eventually.
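
A minimal sketch of what that wrapper might look like, assuming PyTorch. `CodeNode`, `flatten_tree`, and `TreePrefixAttention` are all hypothetical names, and the random vectors stand in for whatever compressed summaries the ETL step would actually produce:

```python
# Hedged sketch: a tree of compressed summary vectors, plus a wrapper
# around standard multi-head attention that exposes the tree to every
# query token. Names and structure are illustrative, not a real API.
import torch
import torch.nn as nn

class CodeNode:
    """One piece of the codebase (module, class, test file) summarized
    as a single compressed vector. Hypothetical structure."""
    def __init__(self, name, summary_vec, children=None):
        self.name = name
        self.summary_vec = summary_vec  # shape: (d_model,)
        self.children = children or []

def flatten_tree(node):
    """Depth-first flatten: parent summaries precede child summaries,
    so coarse context comes before fine detail."""
    vecs = [node.summary_vec]
    for child in node.children:
        vecs.extend(flatten_tree(child))
    return vecs

class TreePrefixAttention(nn.Module):
    """Wraps nn.MultiheadAttention, prepending the flattened tree of
    summary vectors to the key/value sequence so the query tokens can
    attend over the compressed codebase representation."""
    def __init__(self, d_model, n_heads):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)

    def forward(self, x, tree_root):
        # x: (batch, seq_len, d_model) -- the user query / context tokens
        prefix = torch.stack(flatten_tree(tree_root))           # (n_nodes, d_model)
        prefix = prefix.unsqueeze(0).expand(x.size(0), -1, -1)  # (batch, n_nodes, d_model)
        kv = torch.cat([prefix, x], dim=1)  # summaries + tokens as keys/values
        out, _ = self.attn(query=x, key=kv, value=kv)
        return out

# Toy usage: a three-node tree with random placeholder summaries.
d = 64
root = CodeNode("repo", torch.randn(d), [
    CodeNode("src/", torch.randn(d)),
    CodeNode("tests/", torch.randn(d)),
])
layer = TreePrefixAttention(d_model=d, n_heads=4)
query_tokens = torch.randn(2, 10, d)  # batch of 2, 10 tokens each
out = layer(query_tokens, root)       # shape: (2, 10, 64)
```

One design choice worth noting in this sketch: the summaries are prepended only as keys/values, not as queries, so the output sequence length stays unchanged. You pay attention cost for the tree nodes but don't burn token positions in the decoder's autoregressive stream.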