
There's a very widely used measure for this in the academic community: instructions per cycle (IPC). IPC boils down to how many instructions you can actually complete per clock cycle once you account for memory, caching, etc.
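To make the definition concrete, here's a minimal sketch of the arithmetic. The function names and the counter values are made up for illustration; on Linux you could get real numbers from the `instructions` and `cycles` counters that `perf stat` reports.

```python
def ipc(instructions_retired: int, cycles: int) -> float:
    """Instructions per cycle: retired instructions divided by elapsed core cycles."""
    if cycles <= 0:
        raise ValueError("cycles must be positive")
    return instructions_retired / cycles


def runtime_seconds(instructions: int, ipc_value: float, clock_hz: float) -> float:
    """Time to retire `instructions` at a given sustained IPC and clock frequency."""
    return instructions / (ipc_value * clock_hz)


# Hypothetical counter readings, not measurements from any real chip.
measured = ipc(instructions_retired=3_000_000, cycles=1_000_000)
print(f"IPC = {measured:.2f}")

# Same clock, same instruction count, different IPC: runtime scales as 1/IPC.
slow = runtime_seconds(1_000_000_000, ipc_value=0.8, clock_hz=3.0e9)
fast = runtime_seconds(1_000_000_000, ipc_value=2.4, clock_hz=3.0e9)
print(f"speedup at equal clock = {slow / fast:.1f}x")
```

The point of the second function is that at a fixed clock frequency, single-thread speedup is just the ratio of the two IPC values.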



IIRC, that's maybe a 16x improvement (32x if you count 32->64 bit). Which accounts for less than half of the (orders of magnitude of) improvement we should have got from Moore's law.

(More cores aren't a performance improvement; if you were willing to deal with non-serial execution, you could have just bought 32 Pentium Fours; putting them all on the same chip is convenient (and cheap), but as a price/performance improvement, it's all price, no performance.)


> (More cores aren't a performance improvement; if you were willing to deal with non-serial execution, you could have just bought 32 Pentium Fours; putting them all on the same chip is convenient (and cheap), but as a price/performance improvement, it's all price, no performance.)

That's only true if you only consider ALU throughput for performance, but in terms of real world performance, where the interconnect between cores and memory is hugely significant, a multicore processor has many advantages over a rack of otherwise equivalent single-core NUMA nodes.


I was thinking single thread though.

My guess is that there are now a lot of forms of hardware acceleration of specific things that make your daily experience seem faster, but I haven't seen them catalogued and put in perspective with measurements.


Single-threaded doesn’t imply IPC can’t go up or go above 1. See https://en.m.wikipedia.org/wiki/Superscalar_processor.

I would think superscalar execution and cache size are the big differences between a P4 and an i7 running at about the same clock frequency.


An i7 can regularly achieve 2-3 IPC.

I haven't read about the P4 and NetBurst in a long time, but if my memory serves me right, the P4 usually gets less than 1 IPC due to its very long pipeline (31 stages IIRC), which is very prone to pipeline stalls. Modern processors can also do many things faster. I think the P4 takes ~110 cycles for integer division, while a modern CPU can do it in ~30-40 cycles. And IIRC, the P4 cannot flush the division unit, so if a branch/jump prediction is wrong and a division was speculated, it has to wait until the division unit finishes its computation before it can resume execution.
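A rough way to see why a long pipeline hurts IPC is the textbook stall model: effective CPI = base CPI + (fraction of branches) x (mispredict rate) x (flush penalty). Here's a minimal sketch of that. Every number is a placeholder I picked for illustration, and treating the flush penalty as roughly equal to the pipeline depth is a simplifying assumption, not a measured figure for either chip.

```python
def effective_ipc(base_cpi: float,
                  branch_fraction: float,
                  mispredict_rate: float,
                  flush_penalty_cycles: float) -> float:
    """Average IPC once branch-misprediction flushes are charged per instruction."""
    stall_cpi = branch_fraction * mispredict_rate * flush_penalty_cycles
    return 1.0 / (base_cpi + stall_cpi)


# Hypothetical workload: 20% of instructions are branches, 5% of those are
# mispredicted, and the core would sustain 2 IPC (CPI 0.5) with no flushes.
deep = effective_ipc(0.5, 0.20, 0.05, flush_penalty_cycles=31)   # long pipeline
short = effective_ipc(0.5, 0.20, 0.05, flush_penalty_cycles=14)  # shorter pipeline (assumed depth)

print(f"deep pipeline:  {deep:.2f} IPC")
print(f"short pipeline: {short:.2f} IPC")
```

With identical workload parameters, only the flush penalty changes between the two calls, so the gap in the printed IPC comes entirely from pipeline depth in this model.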



