It also highlights the main disadvantage of the Transformers codebase using the copy-paste method for models: this fix needs to be applied to every single model separately.
Unfortunately, transformers is a general library for many models, so there are tons of different architectures. For now, copy-pasting a model and changing some parts of the architecture is the only feasible way.
This feels too complex to tackle with PyCharm structural find and replace. Even a more powerful structural find and replace like https://comby.dev/ feels underpowered here.
Sourcegraph batch changes? That solves broadcasting the change but doesn’t help with capturing the change to make.
OpenRewrite? The Python implementation is in its early stages and not prod ready, as I understand it. Plus, this change is too complex for Refaster templates even if we could use OpenRewrite, so you’d be debugging a fairly involved method visitor, which in this case is probably orders of magnitude more time-consuming than just making the changes manually.
Yeah, a complete change was necessary for now - HF had to isolate the cross-entropy loss and make a separate class for it, and it had to be applied to all model architectures.
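For anyone following along, here is a rough sketch of what isolating the loss buys you. This is not HF's actual code: `shared_cross_entropy`, `ModelA` and `ModelB` are hypothetical names, and it uses plain Python lists rather than tensors, just to show the shape of the idea.

```python
import math

def shared_cross_entropy(logits, targets):
    """Mean cross-entropy over a batch (hypothetical shared helper).

    logits:  list of rows, one list of floats per example
    targets: list of int class indices, one per example
    """
    total = 0.0
    for row, target in zip(logits, targets):
        # Numerically stable log-sum-exp of the row
        row_max = max(row)
        log_z = row_max + math.log(sum(math.exp(x - row_max) for x in row))
        # Negative log-likelihood of the target class
        total += log_z - row[target]
    return total / len(targets)

# Two stand-ins for separately copy-pasted model files. Both delegate to
# the shared helper instead of carrying their own copy of the loss.
class ModelA:
    def loss(self, logits, targets):
        return shared_cross_entropy(logits, targets)

class ModelB:
    def loss(self, logits, targets):
        return shared_cross_entropy(logits, targets)
```

Once every model delegates like this, a later fix to the loss is a one-place change. The painful part is the first migration, since each model file still has to be edited once to call the shared helper.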