
Same issue described on HF: https://huggingface.co/blog/gradient_accumulation

It also highlights the main disadvantage of the Transformers codebase's copy-paste approach to models: this fix needs to be applied to every single model separately.
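
To see the bug concretely, here is a minimal sketch with toy tensors (not the actual Transformers code): averaging per-micro-batch mean losses diverges from the full-batch loss whenever micro-batches contain different numbers of valid (non-padded) tokens, and normalizing by the global token count fixes it.

    import torch
    import torch.nn.functional as F

    # Toy setup: 8 token positions, vocab of 5. Three positions are
    # padding (label -100), as happens in padded causal-LM batches.
    torch.manual_seed(0)
    logits = torch.randn(8, 5)
    labels = torch.tensor([1, 2, 3, 0, -100, -100, -100, 4])  # 5 valid tokens

    # Full-batch loss: mean over all 5 valid tokens.
    full = F.cross_entropy(logits, labels, ignore_index=-100)

    # Gradient accumulation over two micro-batches of 4 positions each:
    # the first has 4 valid tokens, the second only 1.
    mb1 = F.cross_entropy(logits[:4], labels[:4], ignore_index=-100)
    mb2 = F.cross_entropy(logits[4:], labels[4:], ignore_index=-100)
    buggy = (mb1 + mb2) / 2  # what naive accumulation computes
    print(full.item(), buggy.item())  # the two values differ

    # The fix: sum per-token losses, divide by the *global* valid-token count.
    s1 = F.cross_entropy(logits[:4], labels[:4], ignore_index=-100, reduction="sum")
    s2 = F.cross_entropy(logits[4:], labels[4:], ignore_index=-100, reduction="sum")
    fixed = (s1 + s2) / (labels != -100).sum()
    print(torch.allclose(full, fixed))  # True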

Unfortunately, transformers is a general library for many models, so there are tonnes of different architectures. Copy-pasting and changing parts of the architecture is the only feasible approach in the meantime.


>> disadvantage of the Transformers codebase's copy-paste approach to models: this fix needs to be applied to every single model separately

What are the best tools we have available for tackling this kind of large scale copy-paste change?

https://github.com/huggingface/transformers/pull/34191/commi...

This feels too complex to tackle with PyCharm's structural find and replace; even a more powerful structural find-and-replace tool like https://comby.dev/ feels underpowered here.

Sourcegraph batch changes? That solves broadcasting the change but doesn’t help with capturing the change to make.

OpenRewrite? The Python implementation is in its early stages and not production-ready, as I understand it. Plus this change is too complex for Refaster templates even if we could use OpenRewrite, so you'd be debugging a fairly involved method visitor, which in this case is probably orders of magnitude more time-consuming than just making the changes manually.

What else is there that I don’t know about?


Yeah, a complete change was necessary for now: HF had to isolate the cross-entropy loss into its own class, and the fix had to be applied to all model architectures.
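
Roughly the shape of that change, as a sketch (the helper name and exact signature here are illustrative, not the actual Transformers API): each model's inline cross-entropy computation gets replaced by a call into one shared helper that accepts the global token count, so the normalization fix lives in a single place.

    import torch.nn.functional as F

    def causal_lm_loss(logits, labels, num_items_in_batch=None, ignore_index=-100):
        # Standard causal-LM shift: predict token t+1 from position t.
        shift_logits = logits[..., :-1, :].contiguous().view(-1, logits.size(-1))
        shift_labels = labels[..., 1:].contiguous().view(-1)
        if num_items_in_batch is None:
            # No accumulation info available: fall back to a per-batch mean.
            return F.cross_entropy(shift_logits, shift_labels, ignore_index=ignore_index)
        # With accumulation: sum here and divide by the global token count,
        # so summed micro-batch losses reproduce the full-batch loss.
        loss = F.cross_entropy(shift_logits, shift_labels,
                               ignore_index=ignore_index, reduction="sum")
        return loss / num_items_in_batch

Each model then calls the shared helper instead of constructing its own CrossEntropyLoss, which is why the PR still had to touch every model file once.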
