A cartoon: To form a coherent idea you need to coordinate a lot of tokens. In ot...

spot5010 · 2025-02-26T22:38:42 1740609522

Right. This makes sense. But why Fourier space in particular? Why not, for example, a wavelet transform?
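For context on the question, here is a rough illustration of why Fourier space fits "coordinating a lot of tokens": a single DFT along the sequence axis makes every output position depend on every input token, with no learned parameters and O(n log n) cost via the FFT. This is a minimal NumPy sketch, assuming tokens are the rows of a `(seq_len, hidden)` array. `fourier_mix` is a hypothetical helper name, not code from any particular model.

```python
import numpy as np

def fourier_mix(x: np.ndarray) -> np.ndarray:
    """Mix tokens with a DFT along the sequence axis (axis 0).

    x: array of shape (seq_len, hidden). Output is complex, same shape.
    """
    return np.fft.fft(x, axis=0)

rng = np.random.default_rng(0)
x = rng.standard_normal((8, 4))  # 8 tokens, hidden size 4
y = fourier_mix(x)

# Nudge a single input token and check which output positions move.
x_nudged = x.copy()
x_nudged[3, 0] += 1.0
changed_rows = ~np.all(np.isclose(fourier_mix(x_nudged), y), axis=1)
print(changed_rows.all())  # True: every output position depends on token 3
```

The nudge to token 3 shows up in output row k multiplied by exp(-2πi·3k/8), which never vanishes, so every position is affected.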

1024core · 2025-02-27T00:22:21 1740615741

> Why not, for example, a wavelet transform.

That is a great idea for a paper. Work on it, write it up and please be sure to put my name down as a co-author ;-)

monkfish328 · 2025-02-27T02:26:06 1740623166

Or for that matter, a transform that's learned from the data :) A neural net for the transform itself!
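One way to see what "learning the transform" would mean: a fixed transform like the DFT is just a fixed matrix, so a learned transform replaces that matrix with trainable parameters. The sketch below is a toy under stated assumptions. `dft_matrix` is a hypothetical helper, and the target (shifting every token by one position) is an arbitrary stand-in for a real training signal.

```python
import numpy as np

def dft_matrix(n: int) -> np.ndarray:
    """Return the n x n DFT matrix with W[k, t] = exp(-2j * pi * k * t / n)."""
    k = np.arange(n).reshape(-1, 1)
    t = np.arange(n).reshape(1, -1)
    return np.exp(-2j * np.pi * k * t / n)

n = 8
x = np.random.default_rng(1).standard_normal(n)

# A fixed transform is just a fixed matrix: W @ x reproduces numpy's FFT.
W_fixed = dft_matrix(n)
print(np.allclose(W_fixed @ x, np.fft.fft(x)))  # True

# Hypothetical learned transform: a real n x n matrix updated by gradient descent,
# initialised from the real part of the DFT matrix.
W_learned = W_fixed.real.copy()
target = np.roll(x, 1)   # toy target: shift every token one position
lr = 0.5 / (x @ x)       # with this step size the residual halves each iteration
for _ in range(100):
    residual = W_learned @ x - target
    W_learned -= lr * np.outer(residual, x)  # grad of 0.5*||W x - target||^2 w.r.t. W
print(np.allclose(W_learned @ x, target))  # True
```

In a real model the matrix would be trained over many sequences rather than fit to a single one, which is the trade-off: you gain flexibility but lose the FFT's parameter-free O(n log n) structure.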

spot5010 · 2025-02-27T13:24:33 1740662673

That would be super cool if it works! I’ve also wondered the same thing about activation functions. Why not let the algorithm learn the activation function?

porridgeraisin · 2025-02-27T16:12:21 1740672741

This idea exists (the broad field is called neural architecture search), although you have to parameterize the activation somehow so that gradient descent can optimize it.

Here are examples:

https://arxiv.org/abs/2009.04759

https://arxiv.org/abs/1906.09529
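As an illustration of "parameterize it somehow", one simple choice is an activation with a trainable scalar, f(x) = x · sigmoid(βx), where gradient descent updates β like any other weight. This is an assumption for illustration only, not necessarily what the linked papers do. The target (the same family with β = 2) is a toy.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def swish(x, beta):
    """Parametric activation f(x) = x * sigmoid(beta * x); beta is learnable."""
    return x * sigmoid(beta * x)

def dswish_dbeta(x, beta):
    """df/dbeta = x^2 * s * (1 - s), with s = sigmoid(beta * x)."""
    s = sigmoid(beta * x)
    return x * x * s * (1.0 - s)

xs = np.linspace(-3.0, 3.0, 61)
ys = swish(xs, 2.0)  # toy target: the same family with beta = 2

def loss(beta):
    return np.mean((swish(xs, beta) - ys) ** 2)

def grad(beta):
    # Chain rule: dL/dbeta = mean(2 * (f - y) * df/dbeta)
    return np.mean(2.0 * (swish(xs, beta) - ys) * dswish_dbeta(xs, beta))

beta, lr = 0.5, 0.5
initial_loss = loss(beta)
for _ in range(200):
    beta -= lr * grad(beta)  # plain gradient descent on the activation's parameter
print(beta, loss(beta), initial_loss)
```

The key point is that `dswish_dbeta` exists in closed form. Any differentiable parameterization works the same way, while a discrete choice among activations would need a different search method.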

FuckButtons · 2025-02-27T16:38:04 1740674284

Mostly because of computational efficiency, iirc. The choice of nonlinearity doesn’t seem to have much impact, so picking one that’s fast is a more efficient use of limited computational resources.

evanb · 2025-02-27T00:30:56 1740616256

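Now you’re talking efficiency; certainly a wavelet transform may also work. But wavelets tend to be more localized than FTs: each Fourier basis function spans the whole sequence, whereas a wavelet covers only a window of it.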
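A small demonstration of the localization point, using the Haar wavelet because it is the simplest one (the choice of wavelet is an assumption here). A single spike spreads across every Fourier coefficient but touches only one Haar detail coefficient per scale. `haar_transform` is a hypothetical helper written for this sketch.

```python
import numpy as np

def haar_transform(x):
    """Full orthonormal Haar wavelet transform of a length-2^k signal.

    Returns the detail coefficients from every level followed by the final
    approximation coefficient, concatenated into one array.
    """
    approx = np.asarray(x, dtype=float)
    pieces = []
    while len(approx) > 1:
        even, odd = approx[0::2], approx[1::2]
        pieces.append((even - odd) / np.sqrt(2.0))  # detail at this scale
        approx = (even + odd) / np.sqrt(2.0)        # coarser approximation
    pieces.append(approx)
    return np.concatenate(pieces)

n = 16
spike = np.zeros(n)
spike[5] = 1.0  # a single, sharply localized event

fourier = np.fft.fft(spike)
haar = haar_transform(spike)

def count_nonzero(c, tol=1e-12):
    return int(np.sum(np.abs(c) > tol))

print(count_nonzero(fourier))  # 16: the spike spreads over every frequency
print(count_nonzero(haar))     # 5: one detail per level plus the approximation
```

For a length-n signal the spike touches log2(n) + 1 Haar coefficients versus all n Fourier coefficients. For token mixing, this means a wavelet basis keeps nearby-token structure separate from long-range structure, while the Fourier basis mixes everything globally.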

thesz · 2025-02-27T07:18:27 1740640707

This way you end up with time-dilated convolutional networks [1].

[1] https://openreview.net/pdf?id=rk8wKk-R-
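For readers unfamiliar with dilation, here is a minimal sketch of the mechanism: causal convolutions whose dilation doubles at each layer, so the receptive field grows exponentially with depth. The linked paper's architecture may differ. This only shows the dilation idea, and `dilated_causal_conv` is a hypothetical helper.

```python
import numpy as np

def dilated_causal_conv(x, w, dilation):
    """Causal 1D convolution: y[t] = sum_k w[k] * x[t - k * dilation], zero-padded on the left."""
    pad = (len(w) - 1) * dilation
    xp = np.concatenate([np.zeros(pad), np.asarray(x, dtype=float)])
    y = np.zeros(len(x))
    for k, wk in enumerate(w):
        start = pad - k * dilation
        y += wk * xp[start : start + len(x)]
    return y

impulse = np.zeros(16)
impulse[0] = 1.0

h = impulse
for d in (1, 2, 4):  # dilation doubles at each layer
    h = dilated_causal_conv(h, w=[1.0, 1.0], dilation=d)

print(h)  # ones at t = 0..7, zeros after: the impulse reaches 8 time steps
```

With kernel size 2 the receptive field is 1 + (2 - 1)(1 + 2 + 4) = 8 steps, whereas three undilated layers would see only 4. Like a wavelet pyramid, each layer looks at a coarser time scale.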

kridsdale1 · 2025-02-26T21:29:13 1740605353

I like this. Anything that connects new synapses in my skull via analogy is a good post.

3abiton · 2025-02-27T01:03:15 1740618195

This is really a very interesting way of visualizing it.