
BERT is already outdated, but still useful, since you need only one Titan RTX to fine-tune BERT_large via transfer learning.



What methods make BERT outdated? Do you have pointers to other options?



XLNet is BERT with a bunch of additional training tricks.


BERT is a Transformer with a bunch of additional training tricks. Transformer is self-attention with a bunch of additional training tricks...
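The chain above bottoms out at self-attention. As a reference point, here is a minimal single-head scaled dot-product self-attention sketch in NumPy; the shapes, weight names, and dimensions are illustrative, not taken from any particular model:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X:  (seq_len, d_model) token embeddings
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices
    Returns: (seq_len, d_k) attended representations.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (seq_len, seq_len) similarity
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V                         # weighted mix of value vectors

# Toy example: 5 tokens, d_model=16, d_k=8 (arbitrary sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # prints (5, 8)
```

A Transformer layer wraps this core in multiple heads, residual connections, layer norm, and a feed-forward block; BERT then stacks such layers and adds its pretraining objectives on top.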



