Hacker News

If you look at how the chiplets are organized, you technically have 4 cores sharing a bank of L3 (two of these 4-core groups per chiplet). In the 48-core model, 1 core from each 4-core group is disabled, so you have 3 cores sharing the same quantity of L3. So you now have 25% more L3 cache per core. You also have 25% more per-core memory and PCIe bandwidth.

If your workload is cache- or memory-bandwidth-sensitive, you might recover some performance despite having 25% fewer cores. You can probably also run the remaining cores at a higher sustained clock speed. This may reduce a 25% deficit to something more modest, like 5-10%, at which point the 64-core parts are harder to justify.



Not to mention that web workloads are frequently memory-bandwidth-sensitive. I remember Google published a paper measuring CPU usage in their production environments, and at least one of their real-world applications spent roughly 30% of its time in memcpy/strcpy. (The paper examined ways to optimize those copies by carefully applying non-temporal hints when the destination buffer wasn't going to be used for a while.)

Given that, having more memory bandwidth per core seems like it could meaningfully improve CF's performance.
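A copy using non-temporal stores along the lines described above might look like this sketch. It assumes x86-64 with SSE2; `memcpy_nt` is a hypothetical name, not anything from the paper, and for simplicity it requires 16-byte-aligned buffers and a size that is a multiple of 16:

```c
#include <assert.h>
#include <emmintrin.h>  // SSE2 intrinsics
#include <stddef.h>
#include <string.h>

// Copy with non-temporal (streaming) stores: the destination lines bypass
// the cache, which helps when the buffer won't be read again soon.
// Assumes dst and src are 16-byte aligned and n is a multiple of 16.
static void memcpy_nt(void *dst, const void *src, size_t n) {
    __m128i *d = (__m128i *)dst;
    const __m128i *s = (const __m128i *)src;
    for (size_t i = 0; i < n / 16; i++) {
        _mm_stream_si128(&d[i], _mm_load_si128(&s[i]));
    }
    _mm_sfence();  // order the streamed stores before subsequent stores
}
```

For small or soon-reused buffers a plain memcpy is still the right choice; streaming stores only pay off when evicting useful cache lines would hurt more than the copy itself.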


33% more per core?


Ah right, I'm bad at math.



