The problem is that the rate of progress over the past 5/10/15 years has not been linear at all, and it's been pretty easy to point out specific inflection points that have allowed that progress to occur.
I.e. the real breakthrough that allowed such rapid progress was transformers in 2017. Since then, the vast majority of the progress has simply been to throw more data at the problem and to make the models bigger (and to emphasize, transformers are what made that scale possible in the first place). I don't mean to denigrate this approach - if anything, OpenAI deserves tons of praise for making the bet that spending hundreds of millions on model training would give discontinuous results.
However, there are loads of reasons to believe that "more scale" is going to give diminishing returns, and a lot of very smart people in the field have been making this argument (at least quietly). More specifically, there are good reasons to believe that more scale is not going to come anywhere close to solving the kinds of problems that have become evident in LLMs now that they operate at massive scale.
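To make the diminishing-returns point concrete: the standard empirical picture (Kaplan/Chinchilla-style scaling laws) is that loss falls off as a power law in compute, which gives shrinking absolute gains by construction. A toy sketch in Python - the constants here are made up purely for illustration, only the shape of the curve matters:

```python
# Power-law scaling, loss ~ a * compute**(-alpha), with made-up constants.
# The point: each extra 10x of compute buys a smaller absolute improvement,
# and the curve says nothing about capabilities it doesn't measure.
a, alpha = 10.0, 0.05
prev = None
for c in [1e21, 1e22, 1e23, 1e24]:   # each step is 10x more compute
    loss = a * c ** (-alpha)
    gain = "" if prev is None else f"  (improvement: {prev - loss:.3f})"
    print(f"compute {c:.0e}: loss {loss:.3f}{gain}")
    prev = loss
```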
So the big thing I'm questioning is the belief I see in a sizable subset of AI researchers (and more importantly VC types) that, essentially, more scale will lead to AGI. I think the smart money believes that there is something fundamentally different about how humans approach intelligence (and that this difference leads to important capabilities that aren't possible with LLMs).
Could it be argued that transformers are only possible because of Moore's law and the amount of processing power available to do these computations in a reasonable time? How complex is the transformer architecture, really? Every lay explanation I've seen basically says it amounts to a kind of parallelized access to the input sequence. That sounds like a hardware problem, because the algorithmic advances still need to run on reasonable hardware.
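For what it's worth, the core operation really is that simple: self-attention is a handful of dense matrix multiplies over the whole sequence at once, which is exactly the workload GPUs are built for. A minimal single-head sketch, assuming no masking, no multi-head split, and no layer norm (the function and variable names are mine, not from any library):

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model) token vectors; Wq/Wk/Wv: learned projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv              # project every token
    scores = Q @ K.T / np.sqrt(K.shape[-1])       # every position scores every position
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)            # softmax over positions
    return w @ V                                  # weighted mix of the whole sequence

# toy usage: 4 tokens, model width 8
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 8))
Wq, Wk, Wv = (rng.standard_normal((8, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)               # shape (4, 8)
```

The "breakthrough" is less about algorithmic depth and more that this formulation is embarrassingly parallel, so throwing GPU hardware at it pays off directly.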
Transformers in 2017 were the basis, but it was the quantization-emergence link (essentially a grad student project using spare time on ridiculously large A100 clusters in 2021/2022) that finally brought about this present moment.
I feel it is fair to say that neither of these was a natural extrapolation directly from prior successful models. There is no indication we are anywhere near another nonlinearity, if we even knew how to look for one.
Blind faith in extrapolation is a finance regime, not an engineering regime. Engineers encounter nonlinearities regularly. Financiers are used to compound interest.