
I think most of AVX-512's benefit comes from its gather/scatter instructions.

What is interesting here is that, in their current implementations, those instructions aren't very beneficial [1][2].

[1] https://arxiv.org/pdf/1806.05713.pdf [2] https://www.sciencedirect.com/topics/computer-science/scatte... (recommends these instructions to be used outside of main loop)

I vaguely remember that the first implementations of the scatter/gather instructions were no faster than a sequence of separate loads from the individual memory addresses.
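To make the comparison concrete, here is a rough sketch of the two approaches: a plain loop that loads one element per index, and the same operation done with the AVX-512 gather intrinsic `_mm512_i32gather_ps`. The function names `gather_scalar` and `gather_vector` are made up for illustration, and the vector path is only compiled when the compiler defines `__AVX512F__` (otherwise it falls back to the scalar loop). It shows what is being compared, not which one is faster on any given CPU.

```c
#include <stddef.h>
#include <stdint.h>
#if defined(__AVX512F__)
#include <immintrin.h>
#endif

/* Scalar baseline: one independent load per index. */
void gather_scalar(const float *base, const int32_t *idx, float *out, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = base[idx[i]];
}

/* Same result, but fetching 16 floats per gather instruction when the
 * build targets AVX-512F. Without AVX-512F this is just the scalar loop. */
void gather_vector(const float *base, const int32_t *idx, float *out, size_t n)
{
    size_t i = 0;
#if defined(__AVX512F__)
    for (; i + 16 <= n; i += 16) {
        /* Load 16 32-bit indices (unaligned load is fine here). */
        __m512i vidx = _mm512_loadu_si512((const void *)(idx + i));
        /* Gather base[idx[k]] for 16 lanes; scale 4 = sizeof(float). */
        __m512 v = _mm512_i32gather_ps(vidx, base, 4);
        _mm512_storeu_ps(out + i, v);
    }
#endif
    /* Remainder (or everything, if AVX-512F is not available). */
    for (; i < n; i++)
        out[i] = base[idx[i]];
}
```

Timing both functions over the same index array (random indices versus mostly sequential ones) would be one way to check whether the gather instruction actually beats the separate loads on a particular core.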

So it may come in handy that AMD has a much higher core count, because each thread will have less memory to access.


