Hacker News | alexamadoriml's comments

That is definitely the next thing I would try :) The main reason I started with a BiLSTM is that it's much easier to implement and debug. Also, afaik the time complexity of RNNs with respect to sequence length is O(N), while it's O(N^2) for attention-based models like the Transformer, since every token attends to every other token. That said, it probably doesn't matter much at the scale of the SST-2 dataset. A rough sketch of the kind of BiLSTM classifier I mean is below (assuming PyTorch; the sizes and names are illustrative, not the actual model):
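
    import torch
    import torch.nn as nn

    class BiLSTMClassifier(nn.Module):
        # Hypothetical hyperparameters, chosen only for illustration.
        def __init__(self, vocab_size=30000, emb_dim=128, hidden=256, num_classes=2):
            super().__init__()
            self.emb = nn.Embedding(vocab_size, emb_dim)
            # A single recurrent pass over the tokens: O(N) in sequence
            # length, vs O(N^2) for full self-attention.
            self.lstm = nn.LSTM(emb_dim, hidden, batch_first=True, bidirectional=True)
            self.head = nn.Linear(2 * hidden, num_classes)

        def forward(self, token_ids):
            x = self.emb(token_ids)       # (batch, seq, emb_dim)
            out, _ = self.lstm(x)         # (batch, seq, 2*hidden)
            pooled = out.mean(dim=1)      # mean-pool over time steps
            return self.head(pooled)      # (batch, num_classes)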


Ah, you're the author. I missed that. Cool work, by the way.


Thanks!


Afaik the more you tighten a bottleneck, the more accuracy you lose, and you lose it much faster than you gain interpretability. My guess is that such abstractions would require very powerful "priors" (as in, knowledge stored in the network's weights rather than in the representation) that humans gained through evolution and that today's models don't possess. Roughly what I mean by a bottleneck, as an illustrative sketch (again assuming PyTorch; the layer widths are made up, and the middle width is the knob being tightened):
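
    import torch.nn as nn

    def bottleneck_classifier(bottleneck_dim):
        # Illustrative only: shrinking bottleneck_dim forces a smaller
        # intermediate representation (easier to inspect), but in my
        # experience accuracy drops faster than interpretability improves.
        return nn.Sequential(
            nn.Linear(512, 128), nn.ReLU(),
            nn.Linear(128, bottleneck_dim), nn.ReLU(),  # the bottleneck
            nn.Linear(bottleneck_dim, 2),               # e.g. SST-2 labels
        )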

