You say they are distinguishable. How would you experimentally distinguish two systems, one of which "goes through personal experience" and therefore is doing "real reasoning", vs one which is "sifting through tokens and associating statistical parameters"? Can you define a way to discriminate between these two situations?