I think that people suggest RAG, also because the models develop so fast that very probably the base model you finetune on will be obsolete in a year or so.
If we are approaching diminishing returns it makes more sense to finetune. As the recent advances seem to happen by throwing more compute to CoT etc maybe the time is close or has already come.
There are so many chain types it is easier to do the abbreviations. Basically extend a RAG to have a graph to influence how to either critisize itself or perform different actions. It has gotten to the point where there are libraries for define them. https://langchain-ai.github.io/langgraph/tutorials/introduct...
If we are approaching diminishing returns it makes more sense to finetune. As the recent advances seem to happen by throwing more compute to CoT etc maybe the time is close or has already come.