Fascinating. The article repeatedly makes the claim that “LLMs work by predicting likely next words in a string of text”. Yet there’s the seemingly contradictory implication that we don’t know how LLMs work (i.e. we don’t know their secret sauce). How does one reconcile this? They’re either fancy autocompletes or magic autocompletes (in which case the magic qualifier seems more important in understanding what they are than the autocomplete part).
This comes from ambiguous language that conflates the LLM algorithm with the training data and the weights derived from it.
The mysterious part is whatever patterns naturally exist within bazillions of human documents, and what partial/compressed versions of those patterns end up in the weights produced during training and then used later at inference.
Analogy: We built a probe that travels to an alien planet, mines out crystal deposits, and projects light through those fragments to show unexpected pictures of the planet's past. We know exactly how our part of the machine works, and we know the chemical composition of the crystals, but...
These systems work by taking a list of tokens (basically words) plus the "model", and passing them to a function that returns one new token.
You add that new token to the list of tokens.
Repeat with the new list of tokens.
That's how these systems work. We don't know exactly how the model turns the input tokens into the next one, and even that framing is a simplification. It's not magic, just maths that's too complex to understand trivially.
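For concreteness, the loop above can be sketched in a few lines of Python. Everything here is hypothetical glue: `toy_model` is a stand-in for the real network, i.e. anything that maps the token list so far to a probability distribution over the next token.

```python
import random

def toy_model(tokens):
    # Placeholder "model": always returns the same fixed distribution
    # over three possible next tokens. A real LLM computes this from
    # the whole token list using its learned weights.
    return {"cat": 0.4, "sat": 0.4, "<end>": 0.2}

def generate(model, tokens, max_new_tokens=20, end_token="<end>"):
    """Autoregressive loop: ask for one new token, append it, repeat."""
    tokens = list(tokens)
    for _ in range(max_new_tokens):
        probs = model(tokens)                      # distribution over the next token
        choices, weights = zip(*probs.items())
        next_token = random.choices(choices, weights=weights, k=1)[0]
        tokens.append(next_token)                  # the new token joins the list
        if next_token == end_token:
            break
    return tokens

print(generate(toy_model, ["the"]))
```

The outer loop is completely understood; all the mystery lives inside the `model` call.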
For inference, we have a hand crank that rotates a lot of gears, with a final gear making one token (word) appear in a slot. For learning, we even know how to feed a bunch of text into a complicated procedure that tells us which gears to connect to each other and how. What we have no idea about is why the resulting gear ratios and placements are what they are.
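The "crank" itself is just a mechanical update rule. As a toy illustration (not how a real LLM is built), here is a hypothetical bigram model trained by gradient descent on next-token prediction; every single weight update is fully specified, yet the final numbers are simply whatever the procedure leaves behind.

```python
import numpy as np

vocab = ["the", "cat", "sat", "mat"]
text = ["the", "cat", "sat", "the", "mat"]      # tiny stand-in "corpus"
ids = [vocab.index(w) for w in text]

V = len(vocab)
W = np.random.default_rng(0).normal(scale=0.1, size=(V, V))  # the "gears": learned weights

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

lr = 0.5
for step in range(200):                          # turn the crank many times
    for cur, nxt in zip(ids[:-1], ids[1:]):
        probs = softmax(W[cur])                  # predicted next-token distribution
        grad = probs.copy()
        grad[nxt] -= 1.0                         # gradient of the cross-entropy loss
        W[cur] -= lr * grad                      # mechanical weight update

# The trained matrix W was never "designed"; it is just what training left behind.
print(np.round(softmax(W[vocab.index("the")]), 2))  # what tends to follow "the"?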
We know how they work in the sense that we built the framework; we don't know how they work in the sense that we cannot decode what gets "grown" on that framework during training.
If we completely knew how they worked, we could go inside and explain exactly why every token was generated. Right now that isn't possible: the paths tokens take through the layers tend to look outright nonsensical when observed.
We know how they're trained. We know the architecture in broad strokes (amounting to a few bits out of billions, albeit important bits). Some researchers are trying to understand the inner workings, and they have a very long way to go.