
BERT is already outdated, but still useful, since you need only one Titan RTX to fine-tune BERT_large via transfer learning.



What methods make BERT outdated? Do you have pointers to other options?



XLNet is BERT with a bunch of additional training tricks.


BERT is a Transformer with a bunch of additional training tricks. Transformer is self-attention with a bunch of additional training tricks...
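The chain above bottoms out at self-attention. As a reference point, here is a minimal single-head scaled dot-product self-attention sketch in NumPy; the shapes, weight names, and dimensions are illustrative, not taken from any particular model:

```python
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """Single-head scaled dot-product self-attention.

    X:  (seq_len, d_model) token embeddings
    Wq, Wk, Wv: (d_model, d_k) learned projection matrices
    Returns: (seq_len, d_k) attended representations.
    """
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)            # (seq_len, seq_len) similarity
    scores -= scores.max(axis=-1, keepdims=True)  # stabilize softmax
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)  # rows sum to 1
    return weights @ V                         # weighted mix of value vectors

# Toy example: 5 tokens, d_model=16, d_k=8 (arbitrary sizes)
rng = np.random.default_rng(0)
X = rng.normal(size=(5, 16))
Wq, Wk, Wv = (rng.normal(size=(16, 8)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # prints (5, 8)
```

A Transformer layer wraps this core in multiple heads, residual connections, layer norm, and a feed-forward block; BERT then stacks such layers and adds its pretraining objectives on top.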



