Cores That Don't Count [pdf] (2021) (sigops.org)
106 points by signa11 3 months ago | 13 comments



This is about unstable cores that randomly produce incorrect results, and ways to mitigate that through better hardware testing and by duplicating the parts of the core that can fail often.

I did, however, initially think from the title that it was about 1-bit CPUs like the MC14500B Industrial Control Unit (ICU), a CMOS one-bit microprocessor designed by Motorola in 1977 for simple control applications. It completely lacks an ALU, so it essentially cannot count, but it is designed for PLCs.


Hey. It could count to 1, which is something.


Unrelated to the topic being discussed, but my mind immediately went to "per core pricing", which is common for databases. Some SQL servers are licensed by the number of CPU cores in a system, and manufacturers would often offer an SKU with fewer, faster cores to compensate for this.

Taking that thought and thinking about adding "silent" cores is interesting to me. What if your CPU core is actually backed by multiple cores instead to get the "fastest" speed possible? For example imagine if you had say 2 CPU cores that appeared as one and each core would guess the opposite branch of the other (branch prediction) so that it was "right" more of the time.

An interesting thought that had never occurred to me. It's horribly inefficient but for constrained cases where peak performance is all that matters, I wonder if this style of thought would help. ("Competitive Code Execution"?)


People have thought about it, but it’s so incredibly wasteful that it’s impractical. At 20% branching, you rapidly run out of resources pending the winning branch and spend possibly 8 cores just to predict three branches ahead, or roughly 15 instructions. That’s pretty rough!
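To spell out the arithmetic, here is a rough sketch. It assumes each speculated branch doubles the number of paths to follow, and that 20% branching means roughly one branch every 5 instructions. The function name and parameters are made up for illustration.

```python
def eager_execution_cost(instructions_per_branch: int, depth: int) -> tuple[int, int]:
    """Return (paths to run in parallel, instructions covered) when both
    sides of `depth` consecutive branches are followed."""
    paths = 2 ** depth  # every branch doubles the number of live paths
    instructions = depth * instructions_per_branch
    return paths, instructions

# 20% branching: about one branch per 5 instructions, looking 3 branches ahead.
paths, instructions = eager_execution_cost(instructions_per_branch=5, depth=3)
print(paths, instructions)  # 8 15
```

The cost grows exponentially with depth while the instruction window only grows linearly, which is why this runs out of resources so quickly.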


I wonder if you could put more logic units per core and load balance to prevent thermal throttling, or if you’d make the communication pathways slower at a rate that exceeds the gains.


Yep, you can do that, and yep, it gets slower.

That’s basically the tradeoff Apple made with its M-series chips versus AMD/Intel, which until recently chased fast, narrow designs. Apple, in contrast, has a crazy “wide” core: it can issue and retire many more instructions per clock than basically any other mainstream CPU.


In distributed computing, a few layers of abstraction up, an analogous technique of sending two identical RPCs to distinct backends can be used to reduce tail latency.
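A minimal sketch of that idea using Python's asyncio. The backends here are simulated with sleeps, and `fake_backend`, `hedged_call`, and the backend names are hypothetical, not any real RPC library's API.

```python
import asyncio

async def fake_backend(name: str, delay: float) -> str:
    """Hypothetical backend: waits `delay` seconds, then replies with its name."""
    await asyncio.sleep(delay)
    return name

async def hedged_call(delays: dict[str, float]) -> str:
    """Send the same request to every backend and return the first reply."""
    tasks = [asyncio.create_task(fake_backend(n, d)) for n, d in delays.items()]
    done, pending = await asyncio.wait(tasks, return_when=asyncio.FIRST_COMPLETED)
    for task in pending:
        task.cancel()  # the slower duplicate is no longer needed
    await asyncio.gather(*pending, return_exceptions=True)  # let cancellations settle
    return done.pop().result()

# One backend is having a slow moment (200 ms); the other answers in 10 ms.
winner = asyncio.run(hedged_call({"backend-a": 0.20, "backend-b": 0.01}))
print(winner)  # backend-b
```

The caller sees the latency of the faster replica, at the price of doing the work twice.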


> For example imagine if you had say 2 CPU cores that appeared as one and each core would guess the opposite branch of the other (branch prediction) so that it was "right" more of the time.

I believe some CPUs do speculate down both paths of a branch when the branch predictor is really uncertain which one to take.


Not exactly the same thing, but I remember talking with a co-worker about strategies for using a core and its hyperthreaded sibling on the same workload to get a speedup.

However, in practice I think it would be really difficult to prevent them from just thrashing each other's cache and contending for shared resources.


Yeah, your options are to spin on a few lines of cache (e.g. an iterated function or processing a ring buffer) or to use streaming cache ops.


[2021]



From which there's one significant prior discussion, 3 years ago, 72 comments:

<https://news.ycombinator.com/item?id=27378624>



