By far the most interesting aspect of this, for me, is that we're now seeing tools for building software infrastructure with layers of APIs that operate on -- gasp! -- natural language, which is notoriously prone to imprecision and ambiguity. And yet it works remarkably well. It's hard not to look at all this, mouth agape, in awe.
Part of me wonders, though:
Wouldn't it be better if we could compose LLMs by passing sequences of embeddings (e.g., in a standardized high-dimensional space), which are much richer representations of LLM input, internal, and output states?
I think embeddings are actually low-'resolution' representations. For example, GPT's ability to parse and work out the structure of the sentence "Hello, how are you?" is not represented in the embedding for that sentence. The embedding is a single flat vector, while inside the model the text interacts with other text across 10,000+ dimensions.
I wonder if there'd be any use for an "ontological" representation, somewhere in between a natural-language string and its embedding in a particular LLM. Maybe something that balances human-readability, LLM-composability, lack of brittleness, insight into the local structure of the embedding, etc.
I wonder too. I imagine the best we could do with present technology is to get back the generated text in the form of text tokens accompanied by their corresponding deep embeddings (last hidden states): `[(text_token, deep_emb), (text_token, deep_emb), ...]`. Those deep embeddings incorporate "everything the model knows" about each generated token of text.
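For concreteness, here's a rough sketch of how you could get roughly that shape of output today with an open model (GPT-2 via Hugging Face transformers as a stand-in for a GPT-type LLM; the model name, prompt, and generation settings are just placeholders):

```python
# Sketch: pair each generated token with its last-layer hidden state
# ("deep embedding"), using GPT-2 as a stand-in for a GPT-type LLM.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2")
model.eval()

prompt = "Hello, how are you?"
inputs = tok(prompt, return_tensors="pt")

with torch.no_grad():
    # Generate a continuation, then re-run a forward pass over the full
    # sequence to get hidden states for every position, including the
    # generated tokens.
    gen = model.generate(**inputs, max_new_tokens=10, do_sample=False)
    out = model(gen, output_hidden_states=True)

last_layer = out.hidden_states[-1]            # (batch, seq_len, hidden_dim)
prompt_len = inputs["input_ids"].shape[1]

pairs = [
    (tok.decode(token_id.item()), last_layer[0, prompt_len + i])
    for i, token_id in enumerate(gen[0, prompt_len:])
]
# pairs ~ [(text_token, deep_emb), (text_token, deep_emb), ...]
```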
Maybe a mapping/representation for a "medium embedding" could be learned that strikes a balance between shallow and deep. I have no idea what a good objective function would be, though.
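For concreteness, one hypothetical objective (purely a guess on my part, not something I know to work) would be a reconstruction loss: compress the deep embeddings through a smaller bottleneck and train it to reproduce them. Dimensions below are illustrative:

```python
# Hypothetical "medium embedding": an autoencoder that compresses deep
# per-token embeddings (768-d, matching GPT-2 small) into a smaller space
# and is trained to reconstruct them. The objective is just one guess.
import torch
import torch.nn as nn

class MediumEmbedder(nn.Module):
    def __init__(self, deep_dim=768, medium_dim=128):
        super().__init__()
        self.encode = nn.Linear(deep_dim, medium_dim)
        self.decode = nn.Linear(medium_dim, deep_dim)

    def forward(self, deep_emb):
        medium = torch.tanh(self.encode(deep_emb))
        return medium, self.decode(medium)

model = MediumEmbedder()
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

deep_batch = torch.randn(32, 768)   # stand-in for a batch of real deep embeddings
opt.zero_grad()
medium, reconstructed = model(deep_batch)
loss = loss_fn(reconstructed, deep_batch)
loss.backward()
opt.step()
```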
I mean deep embeddings (i.e., sequences of hidden states, the ones computed by all those interactions), not the shallow embeddings of token IDs in the first layer of the model! Those deep embeddings are much richer representations.
Imagine if you and others building apps had access to "GPT3 deep sequence embeddings v1.0" via an API.
Not quite. My understanding is that OpenAI's various embeddings APIs return only a single vector per document, instead of the sequence of hidden states corresponding to each predicted next token in the response generated by a GPT-type LLM.
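For reference, a minimal call to the embeddings endpoint with the current openai Python SDK, which illustrates the one-vector-per-document behavior (the model name and dimensionality are just examples):

```python
# Minimal sketch of OpenAI's embeddings API: one vector per input document,
# not a per-token sequence of hidden states.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="Hello, how are you?",
)

vector = resp.data[0].embedding   # a single list of floats for the whole text
print(len(vector))                # e.g., 1536 dimensions for this model
```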
Imagine getting generated text from a GPT LLM that comes with a deep embedding of each generated token's "contextual meaning": `[(text_token, deep_emb), (text_token, deep_emb), ...]`
By which measure are you making this claim? Even 95% reliability means you get 5% wrong. On top of that, you have prompt injection attacks. This stuff is much less suitable the more you move away from demos to predictable business applications.
Whoa, I didn't say this is suitable for predictable business applications yet!
What I did say is that I'm in awe at the fact that this stuff works as well as it does, given that natural language is so notoriously prone to imprecision and ambiguity. I mean, if you had told me six months ago that this would be working even "95%" of the time in demos, I would have said, no way.
Basically, I agree with you that at present this becomes "less suitable the more you move away from demos to predictable business applications" :-)
Perhaps, but language is the common denominator in a multi-model world. E.g., I pass the GPT output into other models which are fine-tuned for that subdomain. You can do embedding-to-embedding conversion, but I'm not sure it's worth the effort.
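If someone did want to try it, a minimal version of that embedding-to-embedding conversion could be a learned linear map, fit on paired embeddings of the same texts from the two models (dimensions and training data below are placeholders):

```python
# Hypothetical embedding-to-embedding conversion: learn a linear map from
# model A's embedding space to model B's, using paired embeddings of the
# same texts. Dimensions and data here are illustrative stand-ins.
import torch
import torch.nn as nn

dim_a, dim_b = 1536, 768
mapper = nn.Linear(dim_a, dim_b)
opt = torch.optim.Adam(mapper.parameters(), lr=1e-3)
loss_fn = nn.MSELoss()

# Stand-ins for embeddings of the same N texts from two different models.
emb_a = torch.randn(1000, dim_a)
emb_b = torch.randn(1000, dim_b)

for _ in range(100):
    opt.zero_grad()
    loss = loss_fn(mapper(emb_a), emb_b)
    loss.backward()
    opt.step()
```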
Imagine if OpenAI made GPT3's final hidden states available via an API ("GPT3 deep sequence embeddings v1.0"), next to each generated text token: `[(text_token, deep_emb), (text_token, deep_emb), ...]`. You and anyone else could build apps on top. Those hidden states would incorporate much more, and much richer, information than the text. Higher-level models could be trained to act on such information!
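As a toy sketch of what such a higher-level model might look like (everything here is an assumption about how the imagined API would be consumed), here's a classifier that mean-pools the deep embeddings and puts a linear head on top:

```python
# Toy "higher-level model" over a hypothetical [(text_token, deep_emb), ...]
# API response: mean-pool the deep embeddings, then classify with a linear head.
import torch
import torch.nn as nn

class DeepEmbClassifier(nn.Module):
    def __init__(self, deep_dim=768, num_classes=3):
        super().__init__()
        self.head = nn.Linear(deep_dim, num_classes)

    def forward(self, deep_embs):           # (seq_len, deep_dim)
        pooled = deep_embs.mean(dim=0)      # one vector per generated sequence
        return self.head(pooled)

# Pretend this came back from the imagined "deep sequence embeddings" API.
response = [("Hello", torch.randn(768)),
            (",", torch.randn(768)),
            (" world", torch.randn(768))]

deep_embs = torch.stack([emb for _, emb in response])
logits = DeepEmbClassifier()(deep_embs)
print(logits.shape)   # torch.Size([3])
```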