kyle_grove on Nov 10, 2019 | on: Deep learning has a size problem

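In fact, one of the chief advantages of the BERT/Transformer architecture over ELMo/LSTM is that it parallelizes: self-attention processes every token in a sequence at once, whereas an LSTM has to step through the tokens one at a time.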
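
To make the dependency structure concrete, here is a toy sketch in plain Python. It is not BERT or ELMo code: the scalar "embeddings", the weights `w_x` and `w_h`, and the function names are all made up for illustration. The recurrent state at step t needs the state from step t-1, while each attention output reads only the inputs, so the positions can be computed in any order (or all at once on parallel hardware).

```python
import math

def rnn_states(xs, w_x=0.5, w_h=0.9):
    """Toy scalar recurrence: h[t] needs h[t-1], so the steps must run in order."""
    h = 0.0
    states = []
    for x in xs:  # inherently sequential loop over time steps
        h = math.tanh(w_x * x + w_h * h)
        states.append(h)
    return states

def attention_at(xs, i):
    """Toy scalar self-attention output for position i; it reads only the inputs xs."""
    scores = [xs[i] * x for x in xs]             # dot-product scores (scalars here)
    m = max(scores)                              # subtract the max for a stable softmax
    weights = [math.exp(s - m) for s in scores]
    total = sum(weights)
    return sum(w * x for w, x in zip(weights, xs)) / total

def attention_all(xs, order=None):
    """Compute every position independently, in whatever order is given."""
    order = range(len(xs)) if order is None else order
    out = [None] * len(xs)
    for i in order:
        out[i] = attention_at(xs, i)
    return out

xs = [0.1, -0.4, 0.7, 0.2]
forward = attention_all(xs)
backward = attention_all(xs, order=reversed(range(len(xs))))
recurrent = rnn_states(xs)
```

The attention outputs come out the same whether the positions are visited front to back or back to front, which is exactly what lets a GPU compute them as one batched matrix product. The recurrent loop has no such freedom, because each iteration consumes the previous iteration's `h`.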