Where in the loss function of LLM training is there any term relating the model's representation of reality to its predicted tokens? Any internal model an LLM has is an emergent property of the underlying training.
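To make that concrete, here is a minimal sketch of the pretraining objective as I understand it (model_prob(context, target) is a hypothetical placeholder standing in for the network's softmax output): the loss only rewards assigning high probability to whatever token actually came next in the training text, and nothing in it references truth or an explicit world model.

    import math

    def next_token_loss(model_prob, tokens):
        # model_prob(context, target) is a placeholder for the network's
        # softmax probability of `target` given `context`.
        loss = 0.0
        for t in range(1, len(tokens)):
            context, target = tokens[:t], tokens[t]
            # Standard cross-entropy term: scores only "did you predict the
            # next token of the corpus", nothing about the world.
            loss += -math.log(model_prob(context, target))
        return loss / (len(tokens) - 1)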
(And, given the way instruct/chat models are finetuned, I would say convincing/persuasive is very much the direction in which they are biased.)