Bias in the statistical sense is usually $E[\hat{\beta} - \beta]$. By which I mean there's a specific quantity the estimator is supposed to recover, and the bias is how far off you are in expectation. The whole field of causal inference exists because if you estimate things naively, you mix your signals: linear regression gives you biased or unbiased coefficients depending on the setting. Sometimes you need something like instrumental variables (IV), because just plugging in your data will tell you that ambulances are bad, since taking one predicts the patient is more likely to die even after conditioning on everything else catalogued.
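A minimal simulation sketch of that ambulance story (every variable and number here is invented for illustration, including the instrument): an unobserved severity variable confounds the naive regression, so OLS says ambulances kill, while two-stage least squares using the instrument recovers the true, negative effect.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

severity = rng.normal(size=n)                      # unobserved confounder
z = rng.binomial(1, 0.5, size=n)                   # instrument: random dispatch availability
ambulance = ((0.8 * z + severity + rng.normal(size=n)) > 0.5).astype(float)

# True causal effect is NEGATIVE: ambulances help (-0.5 on the mortality score).
death = -0.5 * ambulance + 2.0 * severity + rng.normal(size=n)

def slope(y, x):
    """OLS coefficient on x, with an intercept."""
    X = np.column_stack([np.ones(len(y)), x])
    return np.linalg.lstsq(X, y, rcond=None)[0][1]

print("naive OLS :", slope(death, ambulance))      # positive: "ambulances are bad"

# 2SLS: regress treatment on the instrument, then the outcome on the fit.
Z = np.column_stack([np.ones(n), z])
ambulance_hat = Z @ np.linalg.lstsq(Z, ambulance, rcond=None)[0]
print("IV (2SLS) :", slope(death, ambulance_hat))  # close to the true -0.5
```

The instrument does the work because random dispatch availability moves ambulance use without touching severity, so it isolates the part of the variation that isn't confounded.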
Bias in prediction, rather than in parameter estimation, is a perfectly well-established sense of the term. In particular, people doing language modeling are practically never concerned with identifiability, because you can't pick one weight out of a trillion-parameter model and say what it ought to be in the limit of infinite data.
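To make the two senses concrete, here's a toy sketch (numbers invented): with an omitted variable, the OLS coefficient is badly biased as an estimate of the true effect, yet the fitted model's predictions are unbiased on the same distribution.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 100_000
u = rng.normal(size=n)                         # omitted variable
x = u + rng.normal(size=n)
y = 1.0 * x + 2.0 * u + rng.normal(size=n)     # true coefficient on x is 1.0

X = np.column_stack([np.ones(n), x])
coef = np.linalg.lstsq(X, y, rcond=None)[0]
print("coefficient on x :", coef[1])                     # ~2.0, badly biased
print("mean prediction error :", (X @ coef - y).mean())  # ~0, predictions unbiased
```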
But when people use the term "bias" in NLP, that's the sense they're reaching for: they don't want some aspect of the model to do something it ought not to do. It's a case of omitted-variable bias causing things like the word-analogy issues you hear about, not an issue of bias in predicting the masked word.
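Here's a toy sketch of how that leakage surfaces in analogy arithmetic. The vectors below are invented for illustration, not real embeddings; the real-world version is the well-known "man : programmer :: woman : homemaker" result. Because the training text confounds occupation with gender, the gender direction bleeds into the occupation vectors, and $a - b + c$ retrieves the stereotyped word.

```python
import numpy as np

# Hypothetical 2-d embeddings: axis 0 = "gender", axis 1 = "occupation-ness".
# The gender component of the occupation words stands in for what skewed
# training co-occurrences would teach a real model.
emb = {
    "man":        np.array([ 1.0,  0.0]),
    "woman":      np.array([-1.0,  0.0]),
    "programmer": np.array([ 0.7,  1.0]),
    "engineer":   np.array([ 0.6,  1.0]),
    "homemaker":  np.array([-0.7,  1.0]),
    "violin":     np.array([ 0.0, -1.0]),
}

def analogy(a, b, c):
    """Nearest word (cosine) to emb[a] - emb[b] + emb[c], excluding the inputs."""
    target = emb[a] - emb[b] + emb[c]
    cos = lambda u, v: u @ v / (np.linalg.norm(u) * np.linalg.norm(v))
    return max((w for w in emb if w not in (a, b, c)),
               key=lambda w: cos(emb[w], target))

print(analogy("programmer", "man", "woman"))   # -> "homemaker"
```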
It’s not opinions I disagree with, it’s aspects and behavior I don’t want, which is the statistical sense.