By far the most interesting aspect of this, for me, is that we're now seeing tools for building software infrastructure with layers of APIs that operate on -- gasp! -- natural language, which is notoriously prone to imprecision and ambiguity. And yet it works remarkably well. It's hard not to look at all this, mouth agape, in awe.
Part of me wonders, though:
Wouldn't it be better if we could compose LLMs by passing sequences of embeddings (e.g., in a standardized high-dimensional space), which are much richer representations of LLM input, internal, and output states?
I think embeddings are actually low-'resolution' representations. For example, GPT's ability to parse and work out the structure of the sentence "Hello, how are you?" is not represented in the embedding for that sentence. The embedding is a single flat vector, while inside the model the text interacts with other text across 10,000+ dimensions.
I wonder if there'd be any use for an "ontological" representation, somewhere in between a natural-language string and its embedding in a particular LLM. Maybe something that balances human-readability, LLM-composability, lack of brittleness, insight into the local structure of the embedding, etc.
I wonder too. I imagine the best we could do with present technology is to get back the generated text in the form of text tokens accompanied by their corresponding deep embeddings (last hidden states): `[(text_token, deep_emb), (text_token, deep_emb), ...]`. Those deep embeddings incorporate "everything the model knows" about each generated token of text.
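For concreteness, here's a rough sketch of how you could get roughly that shape of output today with an open model (GPT-2 via Hugging Face transformers as a stand-in for a GPT-type LLM; the model name, prompt, and generation settings are just placeholders):

```python
# Sketch: pair each generated token with its last-layer hidden state
# ("deep embedding"), using GPT-2 as a stand-in for a GPT-type LLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Hello, how are you?"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    # Generate a continuation, then re-run a forward pass over the full
    # sequence to get hidden states for every position, including the
    # generated tokens.
    gen = model.generate(**inputs, max_new_tokens=10, do_sample=False)
    out = model(gen, output_hidden_states=True)

last_layer = out.hidden_states[-1]            # (batch, seq_len, hidden_dim)
prompt_len = inputs["input_ids"].shape[1]

pairs = [
    (tok.decode(token_id.item()), last_layer[0, prompt_len + i])
    for i, token_id in enumerate(gen[0, prompt_len:])
]
# pairs ~ [(text_token, deep_emb), (text_token, deep_emb), ...]
```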
Maybe a mapping/representation for a "medium embedding" could be learned that strikes a balance between shallow and deep. I have no idea what a good objective function would be, though.
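For concreteness, one hypothetical objective (purely a guess on my part, not something I know to work) would be a reconstruction loss: compress the deep embeddings through a smaller bottleneck and train it to reproduce them. Dimensions below are illustrative:

```python
# Hypothetical "medium embedding": an autoencoder that compresses deep
# per-token embeddings (768-d, matching GPT-2 small) into a smaller space
# and is trained to reconstruct them. The objective is just one guess.
import torch
import torch.nn as nn

class MediumEmbedder(nn.Module):
    def __init__(self, deep_dim=768, medium_dim=128):
        super().__init__()
        self.encode = nn.Linear(deep_dim, medium_dim)
        self.decode = nn.Linear(medium_dim, deep_dim)

    def forward(self, deep_emb):
        medium = torch.tanh(self.encode(deep_emb))
        return medium, self.decode(medium)

model = MediumEmbedder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

deep_batch = torch.randn(32, 768)   # stand-in for a batch of real deep embeddings
opt.zero_grad()
medium, reconstructed = model(deep_batch)
loss = loss_fn(reconstructed, deep_batch)
loss.backward()
opt.step()
```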
I mean deep embeddings (i.e., sequences of hidden states, the ones computed by all those interactions), not the shallow embeddings of token IDs in the first layer of the model! Those deep embeddings are much richer representations.
Imagine if you and others building apps had access to "GPT3 deep sequence embeddings v1.0" via an API.
Not quite. My understanding is that OpenAI's various embeddings APIs return only a single vector per document, instead of the sequence of hidden states corresponding to each predicted next token in the response generated by a GPT-type LLM.
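For reference, a minimal call to the embeddings endpoint with the current openai Python SDK, which illustrates the one-vector-per-document behavior (the model name and dimensionality are just examples):

```python
# Minimal sketch of OpenAI's embeddings API: one vector per input document,
# not a per-token sequence of hidden states.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello, how are you?",
)

vector = resp.data[0].embedding   # a single list of floats for the whole text
print(len(vector))                # e.g., 1536 dimensions for this model
```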
Imagine getting generated text from a GPT LLM that comes with a deep embedding of each generated token's "contextual meaning": `[(text_token, deep_emb), (text_token, deep_emb), ...]`
By which measure are you making this claim? Even 95% reliability means you get 5% wrong. On top of that, you have prompt injection attacks. This stuff is much less suitable the more you move away from demos to predictable business applications.
Whoa, I didn't say this is suitable for predictable business applications yet!
What I did say is that I'm in awe at the fact that this stuff works as well as it does, given that natural language is so notoriously prone to imprecision and ambiguity. I mean, if you had told me six months ago that this would be working even "95%" of the time in demos, I would have said, no way.
Basically, I agree with you that at present this becomes "less suitable the more you move away from demos to predictable business applications" :-)
Perhaps, but language is the common denominator in a multi-model world. E.g., I pass the GPT output into other models which are fine-tuned for that subdomain. You can do embedding-to-embedding conversion, but I'm not sure it's worth the effort.
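If someone did want to try it, a minimal version of that embedding-to-embedding conversion could be a learned linear map, fit on paired embeddings of the same texts from the two models (dimensions and training data below are placeholders):

```python
# Hypothetical embedding-to-embedding conversion: learn a linear map from
# model A's embedding space to model B's, using paired embeddings of the
# same texts. Dimensions and data here are illustrative stand-ins.
import torch
import torch.nn as nn

dim_a, dim_b = 1536, 768
mapper = nn.Linear(dim_a, dim_b)
opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Stand-ins for embeddings of the same N texts from two different models.
emb_a = torch.randn(1000, dim_a)
emb_b = torch.randn(1000, dim_b)

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(mapper(emb_a), emb_b)
    loss.backward()
    opt.step()
```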
Imagine if OpenAI made GPT3's final hidden states available via an API ("GPT3 deep sequence embeddings v1.0"), next to each generated text token: `[(text_token, deep_emb), (text_token, deep_emb), ...]`. You and anyone else could build apps on top. Those hidden states would incorporate much more, and much richer, information than the text. Higher-level models could be trained to act on such information!
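As a toy sketch of what such a higher-level model might look like (everything here is an assumption about how the imagined API would be consumed), here's a classifier that mean-pools the deep embeddings and puts a linear head on top:

```python
# Toy "higher-level model" over a hypothetical [(text_token, deep_emb), ...]
# API response: mean-pool the deep embeddings, then classify with a linear head.
import torch
import torch.nn as nn

class DeepEmbClassifier(nn.Module):
    def __init__(self, deep_dim=768, num_classes=3):
        super().__init__()
        self.head = nn.Linear(deep_dim, num_classes)

    def forward(self, deep_embs):           # (seq_len, deep_dim)
        pooled = deep_embs.mean(dim=0)      # one vector per generated sequence
        return self.head(pooled)

# Pretend this came back from the imagined "deep sequence embeddings" API.
response = [("Hello", torch.randn(768)),
            (",", torch.randn(768)),
            (" world", torch.randn(768))]

deep_embs = torch.stack([emb for _, emb in response])
logits = DeepEmbClassifier()(deep_embs)
print(logits.shape)   # torch.Size([3])
```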