I'm sorry, I don't understand what you mean. I checked the original article again too. As it stands, my understanding is you are claiming:

- blowing on a GPU (which I take to mean doing roughly nothing)

- gets roughly the same perf change

- as moving from fp16 to q4

Are you referring to the finetuning part?

The bug fixes are separate from the finetuning sections. Unsloth itself makes finetuning 2x faster and uses 70% less memory, but the bug fixes are entirely detached from finetuning: you can take the fixed version we uploaded at https://huggingface.co/unsloth/phi-4 and use it in any framework or inference engine.
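For instance, a minimal sketch of loading the fixed checkpoint with plain transformers (only the model name comes from this thread; everything else is standard transformers usage):

    # Minimal sketch, assuming standard transformers usage; only the
    # model name "unsloth/phi-4" comes from this thread.
    # device_map="auto" requires the accelerate package.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "unsloth/phi-4"
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(
        model_name, torch_dtype="auto", device_map="auto"
    )

    inputs = tokenizer("Hello from the fixed Phi-4!", return_tensors="pt").to(model.device)
    out = model.generate(**inputs, max_new_tokens=32)
    print(tokenizer.decode(out[0], skip_special_tokens=True))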

Apologies, I'm a bit confused by the comment.

If you're questioning the credibility of the bug fixes: we fixed 8 bugs in Gemma (https://x.com/danielhanchen/status/1765446273661075609), multiple bugs in Llama, Mistral, and Qwen, a gradient accumulation bug (https://x.com/danielhanchen/status/1846235913443262891, sketched below), and much more.
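For context, here's a rough sketch of the gradient accumulation pitfall (my reconstruction of the general issue, not Unsloth's actual patch): averaging the cross-entropy loss per micro-batch and then dividing by the number of accumulation steps weights short sequences too heavily; normalizing by the total token count across the whole accumulation window matches full-batch training.

    # Hedged reconstruction of the gradient accumulation fix, not the
    # actual Unsloth patch. Labels use -100 for padding, the usual
    # Hugging Face convention.
    import torch.nn.functional as F

    def accumulate_step(model, optimizer, micro_batches):
        # Normalize by tokens across the WHOLE window, not per micro-batch.
        total_tokens = sum((labels != -100).sum() for _, labels in micro_batches)
        optimizer.zero_grad()
        for inputs, labels in micro_batches:
            logits = model(inputs)
            # Sum per micro-batch, divide by the window's token count.
            # The buggy form instead takes each micro-batch's mean and
            # divides by len(micro_batches).
            loss = F.cross_entropy(
                logits.view(-1, logits.size(-1)),
                labels.view(-1),
                ignore_index=-100,
                reduction="sum",
            ) / total_tokens
            loss.backward()
        optimizer.step()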


2x faster than what?

Oh, 2x faster and >70% less memory than Hugging Face + Flash Attention 2! I gave a CUDA / GPU Mode talk about it here: https://www.youtube.com/watch?v=hfb_AIhDYnA, one to the PyTorch team here: https://www.youtube.com/watch?v=MQwryfkydc0, and one at the PyTorch Conference here: https://www.youtube.com/watch?v=PdtKkc5jB4g
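For reference, a hedged sketch of typical Unsloth usage; the parameter values here are illustrative choices on my part, not numbers from this thread:

    # Sketch of typical Unsloth usage: load a model in 4-bit and attach
    # LoRA adapters; the rank and target modules below are illustrative.
    from unsloth import FastLanguageModel

    model, tokenizer = FastLanguageModel.from_pretrained(
        model_name="unsloth/phi-4",  # the fixed upload linked earlier
        max_seq_length=2048,
        load_in_4bit=True,
    )
    model = FastLanguageModel.get_peft_model(
        model,
        r=16,            # LoRA rank, illustrative
        lora_alpha=16,
        target_modules=["q_proj", "k_proj", "v_proj", "o_proj",
                        "gate_proj", "up_proj", "down_proj"],
    )
    # The model then trains with the usual Hugging Face tooling,
    # e.g. trl's SFTTrainer.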

Update - the Phi-4 team is working on adding all our fixes to the original model! https://huggingface.co/microsoft/phi-4/discussions/21


