There is something off with

> ...and despite supporting it [AVX512] on consumer CPUs for several generations...

I dunno. Before Rocket Lake (11th gen), AVX512 was only available on enthusiast CPUs, Xeons, or some mobile processors (which I wouldn't really call consumer CPUs).

With the 12th gen (and that performance/efficiency core concept), they disabled it after a few months in those cores and it was never seen again.

I am pretty sure, though, that once AMD has some kind of success with AVX512, Intel will reintroduce it.

btw. I am still rocking an Intel i9-11900 cpu in my setup here. ;)



That tracks. Intel's updated AVX10 whitepaper[1] from a few months back seems to confirm this. It explicitly states 512-bit AVX will be standard for both P and E cores, moving away from 256-bit only configs. This strongly implies AVX-512 is making a proper comeback, not just on servers but future consumer CPUs with E-cores too. Probably trying to catch up with AMD's wider AVX-512 adoption.

[1] - https://cdrdv2.intel.com/v1/dl/getContent/784343 (PDF)


Nice find! I really hope so, as AVX512 is interesting to play around with, and I am pretty sure it will play a bigger role in the future, especially with AI and all that stuff around it!


> With the 12th gen (and that performance/efficiency core concept), they disabled it after a few months in those cores and it was never seen again.

The 12th gen CPUs with performance cores didn't even advertise AVX512 support or have it enabled out of the box. They didn't include AVX512 on the efficiency cores for space reasons, so the entire CPU was considered to not have AVX512.

It was only through a quirk of some BIOS options that you could disable the efficiency cores and enable AVX512 on the remaining CPU. You had to give up the E-cores as a tradeoff.
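
To see what a given machine actually advertises: on x86 Linux the kernel lists per-CPU feature flags in /proc/cpuinfo, and AVX-512 Foundation shows up there as `avx512f`. Here is a minimal sketch of reading that text. The function name `cpus_with_flag` and the sample text are made up for illustration, and it only reports what the kernel exposes, nothing about BIOS toggles.

```python
def cpus_with_flag(cpuinfo_text, flag):
    """Map each logical CPU in /proc/cpuinfo-style text to whether it lists `flag`."""
    result = {}
    current = None
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue  # blank separator lines between CPU blocks
        key = key.strip()
        if key == "processor":
            current = int(value)
        elif key == "flags" and current is not None:
            result[current] = flag in value.split()
    return result


# Made-up sample in the /proc/cpuinfo layout, not output from a real machine.
sample = (
    "processor\t: 0\n"
    "flags\t\t: fpu sse2 avx2 avx512f\n"
    "\n"
    "processor\t: 1\n"
    "flags\t\t: fpu sse2 avx2\n"
)

print(cpus_with_flag(sample, "avx512f"))  # {0: True, 1: False}
```

On a real Linux box you would pass `open("/proc/cpuinfo").read()` instead of `sample`.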


IMO this is more a sign of complete dysfunction at Intel. They definitely could have made avx512 instructions trigger a switch to p-cores, and honestly probably could have supported them completely (if slightly slowly) by splitting the same way AMD does on Zen4 and Zen5 C cores. The fact that they shipped P cores and E cores with different instruction sets is what you get when you have separate teams competing with each other rather than working together to make a good product.


> They definitely could have made avx512 instructions trigger a switch to p-cores

Not really, no. OS-level schedulers are complicated as is with only P vs E cores to worry about, let alone having to dynamically move tasks because they used a CPU feature (and then moving them back after they don't need them anymore).

> and honestly probably could have supported them completely by splitting the same way AMD does on Zen4 and Zen5 C cores.

The issue with AVX512 is not (just) that you need a very wide vector unit, but mostly that you need an incredibly large register file: you go up from 16 * 256 bit = 4096 bits (AVX2) to 32 * 512 bit = 16384 bits (AVX512), and on top of that you need to add a whole bunch of extra registers for renaming purposes.
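
The arithmetic above, spelled out as a small sketch. It only counts architectural (programmer-visible) vector registers; the extra physical registers used for renaming vary by microarchitecture and are not modeled here. The helper name is made up for illustration.

```python
def architectural_vector_bits(num_registers, width_bits):
    """Bits of programmer-visible vector register state per hardware thread."""
    return num_registers * width_bits


avx2_bits = architectural_vector_bits(16, 256)    # ymm0..ymm15
avx512_bits = architectural_vector_bits(32, 512)  # zmm0..zmm31

print(avx2_bits, avx512_bits, avx512_bits // avx2_bits)  # 4096 16384 4
```

So the visible state alone grows 4x, before any rename registers are added.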


> The issue with AVX512 is not (just) that you need a very wide vector unit, but mostly that you need an incredibly large register file

Not necessarily. You only need to behave as if you had that many registers. IMO it would have been way better if the E cores had supported AVX512, even with half of the registers not physically existing and just living in the L2 cache.


Also Zen4c has AVX512 support while being only ~35% bigger than Gracemont (although TSMC's node advantage means you should possibly add another 10% or so). This isn't really a fair comparison, because Zen4c is a very differently optimized core than Intel's E cores, but I do think it shows that AVX-512 can be implemented with a reasonable footprint.

Or if Intel really didn't want to do that, they needed to get AVX-10 ready for 2020 rather than going back and forth on it for ~8 years.


They could enable it on P cores with a separate enablement check and then leave it up to the developer to schedule their code on a P core. I imagine Linux has some API to do that scheduling (because macOS does), not sure about Windows.
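
On Linux the affinity call for this is sched_setaffinity, which Python exposes as `os.sched_setaffinity`. A minimal sketch, assuming the developer already knows which logical CPU ids are P cores (that mapping is not provided here, and the ids in the comment are hypothetical). The helper name `pin_to_cpus` is made up, and this is Linux-only.

```python
import os


def pin_to_cpus(wanted):
    """Restrict the caller to the CPUs in `wanted` that it is allowed to run on.

    Returns the resulting affinity set.
    """
    allowed = os.sched_getaffinity(0)  # 0 means "the caller"
    target = allowed & set(wanted)
    if not target:
        raise ValueError("none of the requested CPUs are available")
    os.sched_setaffinity(0, target)
    return os.sched_getaffinity(0)


# Safe demo: re-apply the current affinity, which changes nothing.
print(pin_to_cpus(os.sched_getaffinity(0)))

# With real knowledge of the topology it might look like this instead:
# pin_to_cpus({0, 1, 2, 3})  # hypothetical P-core ids
```

The intersection with the current affinity set keeps the call from failing inside containers or cpusets that hide some CPUs.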


So introduce performance and efficiency profiles for threads at the OS level. Why should CPUs have to be heterogeneous with regard to the ISA and other details?


You don't need to switch the entire cores. You could have E cores borrow just the AVX512 circuitry from the p cores.


> They definitely could have made avx512 instructions trigger a switch to p-cores,

That'd be an OS thing.

This is a problem that has been solved in the mainframe / supercomputing world and which was discussed in the BSD world a quarter of a century ago. It's simple, really.

Each CPU offers a list of supported features (cpuctl identify), and the scheduler keeps track of whether a program advertises use of certain features. If it does want features that some CPUs don't support, that process can't be scheduled on those CPUs.
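
A toy sketch of that rule, not how any real scheduler implements it: each CPU has a feature set, each process declares what it needs, and a process is only eligible for CPUs whose set covers its needs. The CPU layout and the name `eligible_cpus` are hypothetical.

```python
def eligible_cpus(cpu_features, required):
    """Return ids of CPUs whose feature set covers everything the process needs."""
    required = frozenset(required)
    return sorted(cpu for cpu, feats in cpu_features.items() if required <= feats)


# Hypothetical hybrid part: CPUs 0-1 are P cores with AVX-512, 2-3 are E cores without.
cpu_features = {
    0: frozenset({"sse2", "avx2", "avx512f"}),
    1: frozenset({"sse2", "avx2", "avx512f"}),
    2: frozenset({"sse2", "avx2"}),
    3: frozenset({"sse2", "avx2"}),
}

print(eligible_cpus(cpu_features, {"avx2"}))     # [0, 1, 2, 3]
print(eligible_cpus(cpu_features, {"avx512f"}))  # [0, 1]
```

The hard part in practice is the "program advertises use of certain features" half, since the sketch simply assumes the requirement set is known up front.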

I remember thinking about this way back when dual Nintendo cartridge Pentium motherboards came out. To experiment, I ran a Pentium III and a Celery on an adapter card, which, like talking about self hosting email, infuriated people who told me it can't be done. Different clock speeds, different CPU features, et cetera, worked, and worked well enough to wonder what scheduler changes would make using those different features work properly.


fixed :) thx


Awesome! I always love to read about such stuff, especially if it includes playing around with AVX512!



