Oh hey! :)
TLDR: naive gradient accumulation was over-weighting short sequences and under-weighting long sequences in LLM finetuning and training runs.
For eg a batch with sequence lengths of [1, 100] would scale every token by 1/(100+1) in full-batch training, but grad accum of 2 would weight the token in [1] by 1/1 * 1/2 = 1/2, whilst each token in [100] by 1/100 * 1/2 = 1/200. (The 1/2 is because grad accum divides by the number of accumulation steps.)
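Here's a tiny sketch of that arithmetic in plain Python (made-up sequence lengths, not the actual trainer code), just to show where the per-token weights diverge and how normalizing by the total token count fixes it:

```python
# Two sequences with 1 and 100 non-padded tokens (hypothetical example).
seq_lens = [1, 100]
total_tokens = sum(seq_lens)  # 101

# Full-batch training: one cross-entropy mean over all tokens,
# so every token is weighted 1/101.
full_batch_weight = 1 / total_tokens

# Naive grad accum with 2 steps: each micro-batch takes a mean over its
# own tokens, then the accumulated loss is divided by the number of steps.
accum_steps = len(seq_lens)
naive_weights = [1 / n / accum_steps for n in seq_lens]

print(f"full batch: every token weighted {full_batch_weight:.4f}")        # 0.0099
for n, w in zip(seq_lens, naive_weights):
    print(f"naive accum: tokens in length-{n} sequence weighted {w:.4f}") # 0.5000 / 0.0050

# Fix: skip the per-micro-batch mean; sum the token losses and divide by
# the total token count across all accumulation steps instead.
fixed_weights = [1 / total_tokens for _ in seq_lens]
assert all(abs(w - full_batch_weight) < 1e-12 for w in fixed_weights)
```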
Unfortunately it's not an Unsloth issue but a general issue affecting nearly all trainers that use grad accum. We worked with Huggingface, so their trainers should now be fixed in the main branch.