I agree that the problem is hard. However, the biological brain is able to handle it quite "easily" (it is not really easy: billions of iterations were needed). Today's brains solve this 3D physical world _only_ via perception.
So this is the place where we must look. It starts with sensing and the integration of that sensing. I have been working on this problem for more than 10 years and I have come to some results. I am not a real scientist but a true engineer, and I am looking at it quite intensely from that perspective. The question one must ask is: how do you define the outside physical world from the perspective of a biological sensing "device"? What exactly are we "seeing" or "hearing"? So yes, working on that has taken me further toward defining the physical world.
I do agree with you. We have a natural eye automaton (what you call a 'biological brain') that unconsciously 'feels' the geometric structure of the places we enter.
Once this "natural eye automaton" layer is programmed behind a camera, it will spit out this crude geometry: the Spatial-data-bulk (SDB). This SDB is small data.
From then on, our programs will reason not on data from the camera(s) but only on this small SDB.
==> And now LLMs, to get a feel for spatial knowledge, will have a very reduced dataset. This will make spatial reasoning far less intensive than we can imagine.
Maybe a brute force solution would work just like it did for text. I would not be surprised if the scale of that brute force was not within reach yet though.