
It doesn't memorize anything. It just needs a gazillion parameters, approaching the size of the training set, to finesse its conversational accent.



Llama 2 has a 5 TB training set.


So? You're just supporting my point. That's a factor of 100-1000 versus the model's parameter count, assuming the training set has no redundancy whatsoever. With redundancy, it's more likely a factor of 10-100.
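
To make the ratio concrete, here's a rough back-of-envelope sketch in Python, assuming the 5 TB figure cited above and Llama 2's published 7B/13B/70B parameter counts; the redundancy adjustment is left out, as in the comment.

    # Back-of-envelope: bytes of training data per model parameter,
    # using the ~5 TB figure mentioned above (an assumption, not an
    # official number) and Llama 2's published model sizes.

    TRAINING_SET_BYTES = 5e12  # ~5 TB

    param_counts = {
        "Llama-2-7B": 7e9,
        "Llama-2-13B": 13e9,
        "Llama-2-70B": 70e9,
    }

    for name, params in param_counts.items():
        ratio = TRAINING_SET_BYTES / params
        print(f"{name}: ~{ratio:.0f} bytes of training data per parameter")

    # Approximate output:
    # Llama-2-7B: ~714 bytes of training data per parameter
    # Llama-2-13B: ~385 bytes of training data per parameter
    # Llama-2-70B: ~71 bytes of training data per parameter

So the raw data-to-parameter factor falls roughly in the 100-1000 range across the model family, and shrinks further once corpus redundancy is taken into account.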

People don't want to acknowledge that an LLM's structure reflects rather closely what it is trained on, but the incredibly large number of parameters suggests it is closer to a photographic fit than a true abstraction, with larger models being more likely to memorize training data (Carlini et al., 2021, 2022).

The fact that the information gets mangled and somewhat compressed doesn't change this close relationship.



