> It’s clear they do seem to construct models from which to derive responses. The problem is once you stray away from purely textual content, those models often get completely batshit
I think you mean that it can only intelligently converse in domains for which it's seen training data. Obviously the corpus of natural language it was trained on does not give it enough information to infer the spatial relationships of latitude and longitude.
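To make concrete what "spatial relationships of latitude and longitude" means here: the geometry itself is trivial to state in code, yet a model trained only on prose rarely sees enough paired coordinates and distances to internalize it. A minimal sketch (standard haversine great-circle distance; the function name and example cities are my illustration, not from the thread):

```python
import math

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two (lat, lon) points, in kilometers."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# New York (40.71, -74.01) to London (51.51, -0.13) is roughly 5570 km
print(haversine_km(40.71, -74.01, 51.51, -0.13))
```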
I think this is important to clarify, because people might take your statement to mean that LLMs cannot process non-textual content, which is incorrect. In fact, adding multimodal training can substantially improve LLMs, because the richer structure enables them to infer better relationships even in textual data:
Zhang et al., "Multimodal Chain-of-Thought Reasoning in Language Models" (2023), https://arxiv.org/abs/2302.00923