
The LLM embeddings for a token cover much more than semantics. There is a reason a single token's embedding dimension is so large.

You are conflating the embedding layer in an LLM and an embedding model for semantic search.
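To make the distinction concrete, here is a minimal toy sketch (random matrices, hypothetical vocabulary, no real model): an LLM's embedding layer is a per-token lookup table that produces one vector per token before any context mixing, while a semantic-search embedding model produces one vector per whole text, crudely approximated here by mean-pooling.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy vocabulary and an LLM-style token embedding table.
# In a real LLM this matrix is learned; here it is random for illustration.
vocab = {"the": 0, "cat": 1, "sat": 2, "dog": 3}
d_model = 8
token_embeddings = rng.normal(size=(len(vocab), d_model))

def embed_tokens(tokens):
    """LLM embedding layer: a per-token lookup, one row per token."""
    return token_embeddings[[vocab[t] for t in tokens]]

def text_embedding(tokens):
    """Semantic-search style: one vector for the whole text.
    Real embedding models pool *contextual* states; mean-pooling
    raw token vectors is only a stand-in for the shape of the output."""
    return embed_tokens(tokens).mean(axis=0)

def cosine(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

per_token = embed_tokens(["the", "cat", "sat"])   # shape (3, 8)
whole_text = text_embedding(["the", "cat", "sat"])  # shape (8,)
print(per_token.shape, whole_text.shape)
print(cosine(text_embedding(["the", "cat"]), text_embedding(["the", "dog"])))
```

The output shapes are the point of the sketch: the embedding layer hands back a matrix of per-token vectors for the rest of the network to contextualize, whereas a retrieval embedding model's contract is a single comparable vector per text.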




I don't think we're using the term semantic in the same way. I mean "relating to meaning in language."


The embedding layer in an LLM deals with much more than meaning. It has to capture syntax, grammar, morphology, style and sentiment cues, phonetic and orthographic relationships, and 500 other things that humans can't even reason about but that exist in word combinations.


I'll give you that. I was including those in "semantic space," but the distinction is fair.

My original point still stands: the space you've described cannot capture a full image of human cognition.



