
Possibly, but it's not my job to research the evidence for your claims.

Can you elaborate on what sort of experience you're talking about? You'd have to be training a new model from scratch in order to know what was in the model's training data, so I'm actually quite curious what you were working on.



An LLM is just a model of P(A|B), i.e., a frequency distribution of co-occurrences.

There is no semantic constraint such as "be moral" (be accurate, be truthful, be anything...). Immoral phrases, of course, have a non-zero probability.

From the sentence, "I love my teacher, they're really helping me out. But my girlfriend is being annoying though, she's too young for me."

one can derive, say, "My teacher loves me, but I'm too young..." which has non-zero probability on almost any substantive corpus.
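A toy sketch of that point in Python, with a made-up "safe" corpus paraphrasing the example above: a bigram model built purely from co-occurrence counts, with no semantic constraint, assigns non-zero probability to a recombination that never appears in the data.

    from collections import defaultdict, Counter

    # Hypothetical "safe" corpus (invented for illustration).
    corpus = ("my teacher loves me . my teacher is helping me . "
              "my girlfriend is too young for me .").split()

    # Count adjacent co-occurrences: the frequency distribution behind P(A|B).
    counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        counts[prev][nxt] += 1

    def p(nxt, prev):
        total = sum(counts[prev].values())
        return counts[prev][nxt] / total if total else 0.0

    def seq_prob(tokens):
        prob = 1.0
        for prev, nxt in zip(tokens, tokens[1:]):
            prob *= p(nxt, prev)
        return prob

    # Never present as a sentence in the corpus, but every bigram in it was
    # observed, so the model gives it non-zero probability (~0.17).
    print(seq_prob("my teacher is too young for me".split()))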


Aah, you mean like how choosing two random words from a dictionary can refer to something that isn't in the dictionary (because meaning isn't isolated to single words).

Yeah, that seems unavoidable. Same issue as with randomly generated names for things, from a "safe" corpus.

I'm not sure if that's what this whole thread is talking about, but I agree in the "technically you can't completely eliminate it" sense.


The original claim was that they can produce those robustly, though. Yes, the chances will be non-zero, but that doesn't mean it will be common or high-fidelity.


Ah, then let me rephrase: it's actually this model:

> P(A | B, C, D, E, F, ...)

And with clever choices of B, C, D, ... you can make A arbitrarily probable.

E.g., suppose 'Lolita' were rare; well then choose: B=Library, C=Author, D=1955, E=...

Where, note, each of those is innocent.

And since LLMs, like all ML, are a statistical trick -- strange choices here will reveal the illusion. E.g., suppose there was a magazine in 1973 which was digitized in the training data, and suppose it had a review of the book Lolita. Then maybe, via strange phrases in that magazine, we "condition our way to it".

A prompt is, roughly, just a subsetting operation on the historical corpus -- with cleverly crafted prompts you can find the page of the book you're looking for.
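Here is a hedged sketch of that "prompt as subsetting" idea, with an invented corpus: each conditioning token is individually innocent, yet jointly they push the target's conditional probability to 1.

    # Toy corpus: each "document" is a set of tokens (all data invented).
    corpus = [
        {"library", "author", "1955", "lolita", "review"},
        {"library", "author", "novel", "catalogue"},
        {"author", "1955", "prize", "speech"},
        {"library", "catalogue", "loans"},
    ]

    def p(target, given=frozenset()):
        # Estimate P(target | given) by subsetting the corpus to the
        # documents containing every conditioning token.
        subset = [doc for doc in corpus if given <= doc]
        if not subset:
            return 0.0
        return sum(target in doc for doc in subset) / len(subset)

    print(p("lolita"))                                   # 0.25
    print(p("lolita", {"library"}))                      # ~0.33
    print(p("lolita", {"library", "author", "1955"}))    # 1.0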



