A cartoon:

To form a coherent idea you need to coordinate a lot of tokens. In other words, ideas are long-distance correlations between tokens. Ideas are the long-wavelength features of streams of tokens.

Is it exactly right? No. But as a cartoon it can motivate exploring an idea like this.
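A minimal sketch of the cartoon in code (the shapes and the FNet-style mixing are illustrative assumptions on my part, not necessarily the article's exact method): an FFT along the token axis exposes exactly those long-wavelength features, in the spirit of FNet (https://arxiv.org/abs/2105.03824).

    import torch

    def fourier_mix(x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden). 2-D FFT over the sequence and
        # hidden axes, keeping only the real part: a cheap,
        # parameter-free token mixer. The low-frequency components are
        # the "long-wavelength" structure in the cartoon above.
        return torch.fft.fft2(x, dim=(-2, -1)).real

    x = torch.randn(2, 128, 64)   # (batch, tokens, hidden)
    print(fourier_mix(x).shape)   # torch.Size([2, 128, 64])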



Right, this makes sense. But why Fourier space in particular? Why not, for example, a wavelet transform?


> Why not, for example, a wavelet transform?

That is a great idea for a paper. Work on it, write it up and please be sure to put my name down as a co-author ;-)
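In the meantime, a purely speculative sketch of what such a layer might look like: a one-level Haar wavelet transform along the sequence axis instead of an FFT (the layer shape here is an assumption for illustration, not from any published result).

    import torch

    def haar_mix(x: torch.Tensor) -> torch.Tensor:
        # x: (batch, seq_len, hidden), seq_len even. One Haar step:
        # averages capture the smooth, "idea"-scale structure;
        # differences capture the local, token-scale detail.
        even, odd = x[:, 0::2, :], x[:, 1::2, :]
        approx = (even + odd) / 2 ** 0.5
        detail = (even - odd) / 2 ** 0.5
        return torch.cat([approx, detail], dim=1)

    x = torch.randn(2, 128, 64)
    print(haar_mix(x).shape)      # torch.Size([2, 128, 64])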


Or for that matter, a transform that's learned from the data :) A neural net for the transform itself!
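One way to read "a transform learned from the data" (my own minimal sketch, not an established method from this thread): replace the fixed DFT matrix with a trainable mixing matrix over the token axis, much like the token-mixing step in MLP-Mixer (https://arxiv.org/abs/2105.01601).

    import torch
    import torch.nn as nn

    class LearnedMix(nn.Module):
        # A trainable stand-in for the DFT: a dense seq_len x seq_len
        # matrix applied along the token axis.
        def __init__(self, seq_len: int):
            super().__init__()
            self.mix = nn.Linear(seq_len, seq_len, bias=False)

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            # x: (batch, seq_len, hidden); mix along the sequence axis
            return self.mix(x.transpose(1, 2)).transpose(1, 2)

One natural experiment would be to initialize the mixing matrix with the DFT matrix and let gradient descent drift from there.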


That would be super cool if it works! I’ve also wondered the same thing about activation functions. Why not let the algorithm learn the activation function?


This idea exists (the broad field is called neural architecture search), although you have to parameterize it somehow to allow gradient descent to happen; a toy parameterization is sketched after the links below.

Here are examples:

https://arxiv.org/abs/2009.04759

https://arxiv.org/abs/1906.09529
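For intuition, a toy example of what "parameterize it somehow" can mean (a minimal sketch of my own, not the method of either paper): give a known activation a trainable knob, e.g. Swish with a learnable beta.

    import torch
    import torch.nn as nn

    class LearnableSwish(nn.Module):
        # swish(x) = x * sigmoid(beta * x); beta is learned by
        # gradient descent along with the rest of the network.
        def __init__(self):
            super().__init__()
            self.beta = nn.Parameter(torch.ones(1))

        def forward(self, x: torch.Tensor) -> torch.Tensor:
            return x * torch.sigmoid(self.beta * x)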


Mostly because of computational efficiency, IIRC. The nonlinearity doesn't seem to have much impact, so picking one that's fast is a more efficient use of limited computational resources.


Now you're talking efficiency: certainly a wavelet transform may also work, but wavelets tend to be more localized than Fourier transforms.


This way you end up with time-dilated convolutional networks [1].

[1] https://openreview.net/pdf?id=rk8wKk-R-
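For reference, a generic dilated temporal convolution stack (a sketch, not the exact architecture of [1]): doubling the dilation at each layer makes the receptive field grow exponentially, which is how such networks reach the long-range correlations discussed at the top of the thread.

    import torch
    import torch.nn as nn

    layers = []
    channels = 64
    for i in range(4):
        d = 2 ** i                      # dilation: 1, 2, 4, 8
        layers += [nn.Conv1d(channels, channels, kernel_size=3,
                             padding=d, dilation=d),
                   nn.ReLU()]
    tcn = nn.Sequential(*layers)

    x = torch.randn(2, channels, 128)   # (batch, channels, time)
    print(tcn(x).shape)                 # torch.Size([2, 64, 128])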


I like this. Anything that connects new synapses in my skull via analogy is a good post.


This is a really interesting way of visualizing it.



