One thing I was wondering about regarding transformers, which perhaps someone more knowledgeable can explain: as far as I understand, the attention heads are essentially two-dimensional structures in which values derived from the tokens are compared pairwise in a matrix. Has anyone tried to generalize this and make the dimension of the attention heads higher than two?
I'm not really in this space, but I like to read the papers, and as I understand it, the typical dimensionality is far higher than 2. For example, in the original "Attention Is All You Need" paper, the example configuration uses 64 dimensions per head. These are vectors, so even though they might be drawn as a matrix, each value is a coordinate along a different dimension.
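A shape-level sketch of what those 64 dimensions refer to (the sizes are taken from the paper, everything else is just illustration): 64 is the length of each per-token query/key/value vector inside one head, not the order of any tensor, and the projection that produces it is an ordinary 2-D matrix.

```python
import numpy as np

# Sizes from "Attention Is All You Need": d_model = 512, and each of the
# 8 heads projects down to d_k = d_v = 64 dimensions per token.
d_model, d_k = 512, 64
n_tokens = 10                              # example sequence length

X = np.random.randn(n_tokens, d_model)     # one embedding vector per token
W_Q = np.random.randn(d_model, d_k)        # the projection is an ordinary 2-D matrix

Q = X @ W_Q
print(Q.shape)                             # (10, 64): one 64-dim query vector per token
```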
I'm talking about the matrices W_Q, W_K and W_V. My question is why these are matrices (i.e. order-2 tensors) and not tensors of order higher than 2.
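For concreteness, here's a minimal single-head sketch in NumPy (shapes and names are illustrative, not from any particular implementation). Each of W_Q, W_K, W_V is an ordinary 2-D matrix, and the two-dimensional structure from the original question is the n × n score matrix Q Kᵀ, which holds one score per pair of tokens:

```python
import numpy as np

def single_head_attention(X, W_Q, W_K, W_V):
    """Scaled dot-product attention for one head (illustrative sketch)."""
    Q, K, V = X @ W_Q, X @ W_K, X @ W_V              # each W_* is a 2-D matrix
    scores = Q @ K.T / np.sqrt(K.shape[-1])          # (n, n): one score per PAIR of tokens
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ V                               # weighted mix of value vectors per token

n, d_model, d_k = 10, 512, 64
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d_model))
W_Q, W_K, W_V = (rng.standard_normal((d_model, d_k)) for _ in range(3))
print(single_head_attention(X, W_Q, W_K, W_V).shape)   # (10, 64)
```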
My thinking goes like this: a matrix can represent a graph (each entry may correspond to an edge between two nodes), but an order-3 tensor may correspond to a hypergraph where each entry is a 3-hyperedge, so you could talk not just about the relation between two tokens but also about the relation among three tokens (in language this could be e.g. subject, object and indirect object/dative).
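To make the hypergraph idea concrete, here is a purely hypothetical sketch (not something from the literature, just an illustration of the shapes involved): replace the order-2 weight matrices with a single order-3 weight tensor and contract it with three copies of the token embeddings, so that each entry of the result scores a triple of tokens rather than a pair.

```python
import numpy as np

def hyperedge_scores(X, W3):
    """Hypothetical 3-way attention scores: one score per TRIPLE of tokens.

    W3 is an order-3 weight tensor of shape (d, d, d); contracting it with
    three copies of the token embeddings gives an (n, n, n) score tensor,
    the analogue of 3-hyperedges, instead of the usual (n, n) pairwise
    matrix produced by Q @ K.T with order-2 weights.
    """
    # scores[i, j, k] = sum_{a,b,c} X[i, a] * X[j, b] * X[k, c] * W3[a, b, c]
    return np.einsum('ia,jb,kc,abc->ijk', X, X, X, W3, optimize=True)

n, d = 8, 32
rng = np.random.default_rng(0)
X = rng.standard_normal((n, d))          # one d-dim embedding per token
W3 = rng.standard_normal((d, d, d))      # order-3 weight tensor (d**3 parameters)
print(hyperedge_scores(X, W3).shape)     # (8, 8, 8): one entry per triple of tokens
```

Note that the score tensor alone has n³ entries, so compute and memory grow cubically with sequence length instead of quadratically, which is presumably a large part of why the pairwise form is the standard one.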