As I understand it, it'll generate one token because it's trained to predict a single masked-out ([MASK]) token. What is the scaffolding? Do you generate random sentences and iterate repeatedly? And how does that get you whole coherent paragraphs? (Has anyone demonstrated that this actually works with BERT?)
BERT can fairly easily be used to generate text. It's intended as a base model that you fine-tune with an additional model on top, and that added model can then be trained to generate sentences using BERT as the underlying language model.
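As a rough illustration of the iterative "scaffolding" idea (rather than fine-tuning), here is a minimal sketch using Hugging Face's transformers fill-mask pipeline: repeatedly append a [MASK] token and let BERT fill it in. The model name bert-base-uncased, the greedy top-1 choice, and the 10-step loop are my own assumptions for demonstration; the output is usually far less coherent than what a left-to-right language model produces.

    # Sketch: greedy, one-token-at-a-time generation with BERT's masked-LM head.
    # Assumptions: bert-base-uncased, greedy top-1 decoding, 10 extra tokens.
    from transformers import pipeline

    fill_mask = pipeline("fill-mask", model="bert-base-uncased")

    text = "The weather today is"
    for _ in range(10):
        # Ask BERT to predict the single masked token appended to the text.
        predictions = fill_mask(text + " [MASK].")
        best = predictions[0]["token_str"]  # highest-scoring candidate
        text = f"{text} {best}"

    print(text + ".")

In practice you'd want sampling instead of greedy top-1 and some handling of subword pieces and repeated punctuation, which is part of why this kind of scaffolding rarely yields coherent paragraphs on its own.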