So? You're just supporting my point. That's a factor of 100-1000 relative to the model's parameter count, and that assumes the training set has no redundancy whatsoever. In practice it's more likely a factor of 10-100.
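Back-of-envelope, with round illustrative numbers (these are assumptions for the sake of the arithmetic, not figures from any particular model card):

    # Rough ratio of training tokens to model parameters.
    # Both numbers below are illustrative assumptions.
    params = 70e9          # a ~70B-parameter model
    train_tokens = 15e12   # trained on on the order of 10T+ tokens
    print(train_tokens / params)  # ~214, i.e. in the 100-1000 range before discounting redundancy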
People don't want to acknowledge that an LLM's structure rather closely reflects what it was trained on, but the incredibly large number of parameters suggests it is closer to a photographic fit than a true abstraction, with larger models being more likely to memorize training data (Carlini et al., 2021, 2022).
The fact that the information gets mangled and somewhat compressed doesn't change this close relationship.
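To be concrete about what "memorize" means here, this is roughly how verbatim recall gets probed (a minimal sketch in the spirit of, but not identical to, the Carlini et al. extraction setup; the model name, prefix, and suffix are just placeholders):

    # Sketch of a verbatim-memorization probe: prompt with a prefix that occurs in the
    # training data and check whether greedy decoding reproduces the known continuation.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "gpt2"  # placeholder; any causal LM works the same way
    tok = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    prefix = "We the People of the United States, in Order to form a more perfect"
    known_suffix = " Union"  # what actually follows in the source text

    inputs = tok(prefix, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=5, do_sample=False)
    continuation = tok.decode(out[0][inputs["input_ids"].shape[1]:])
    print(continuation.startswith(known_suffix))  # True suggests verbatim recall

Reproducing long, specific strings like this well above chance is the kind of memorization those papers report scaling up with model size.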