That is definitely the next thing I would try :) The main reason I started with a BiLSTM is that it's much easier to implement and debug. Also, afaik the time complexity of RNNs with respect to sequence length is O(N), whereas it's O(N^2) for attention-based models like the Transformer, although that probably doesn't matter much at the scale of the SST-2 dataset.
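For context, here's roughly the kind of model I mean, as a minimal PyTorch sketch (not my actual code; the vocab size and layer dimensions are placeholders):

```python
import torch
import torch.nn as nn

class BiLSTMClassifier(nn.Module):
    """Minimal bidirectional LSTM sentence classifier (e.g. for SST-2)."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        # A single BiLSTM layer: runtime scales linearly with sequence length,
        # unlike full self-attention, which is quadratic in it.
        self.lstm = nn.LSTM(embed_dim, hidden_dim, batch_first=True, bidirectional=True)
        self.classifier = nn.Linear(2 * hidden_dim, num_classes)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer tensor
        embedded = self.embedding(token_ids)
        _, (h_n, _) = self.lstm(embedded)
        # Concatenate the final forward and backward hidden states
        # to get a fixed-size sentence representation.
        sentence_repr = torch.cat([h_n[0], h_n[1]], dim=-1)
        return self.classifier(sentence_repr)

# Toy usage: a batch of 4 sentences, 20 tokens each, vocab of 10k.
model = BiLSTMClassifier(vocab_size=10_000)
logits = model(torch.randint(1, 10_000, (4, 20)))
print(logits.shape)  # torch.Size([4, 2])
```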
Afaik, the more you tighten a bottleneck, the more accuracy you lose, and you lose it much faster than you gain interpretability. My guess is that such abstractions would require very powerful "priors" (as in, knowledge stored in the network as opposed to being stored in the representation) that humans gained through evolution and that today's models don't possess.