
My understanding is that with NTK-aware RoPE scaling, the model does pay uniform attention. With older methods, not as much.
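For anyone unfamiliar with the difference, here is a rough sketch of how the two approaches change the rotary frequencies. It assumes the commonly cited formulation where NTK-aware scaling rescales the RoPE base by `scale ** (dim / (dim - 2))`, and takes plain linear position interpolation as the "older method". The function names and the `dim=128`, `scale=4` values are mine, for illustration only.

```python
def rope_inv_freq(dim, base=10000.0):
    # Standard RoPE inverse frequencies: base^(-2i/dim) for i = 0 .. dim/2 - 1.
    return [base ** (-2 * i / dim) for i in range(dim // 2)]

def linear_inv_freq(dim, scale, base=10000.0):
    # Linear position interpolation: dividing positions by `scale` is
    # equivalent to dividing every frequency by `scale`.
    return [f / scale for f in rope_inv_freq(dim, base)]

def ntk_inv_freq(dim, scale, base=10000.0):
    # NTK-aware scaling (assumed formulation): enlarge the base instead,
    # so high frequencies barely move and low frequencies are stretched.
    scaled_base = base * scale ** (dim / (dim - 2))
    return rope_inv_freq(dim, scaled_base)

if __name__ == "__main__":
    dim, scale = 128, 4.0
    orig = rope_inv_freq(dim)
    lin = linear_inv_freq(dim, scale)
    ntk = ntk_inv_freq(dim, scale)
    # Compare the highest- and lowest-frequency components.
    for name, freqs in (("original", orig), ("linear", lin), ("ntk", ntk)):
        print(name, freqs[0], freqs[-1])
```

Under this formulation the highest-frequency component is left untouched by the NTK variant, while the lowest-frequency one ends up divided by `scale`, the same as in linear interpolation. Linear interpolation compresses every component by the same factor.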


