The Future of Core, Intel GPUs, 10nm, and Hybrid x86

Symmetry · on Dec 12, 2018

>Q: In the demo of FOVEROS, the chip combined both big x86 cores built on the Core microarchitecture and the small x86 cores built on the Atom microarchitecture. Can we look forward to a future where the big and little cores have the same ISA?

>R: We are working on that. Do they have to have the same ISA? Ronak and the team are looking at that. However I think our goal here is to keep the software as simple as possible for developers and customers. It's a challenge that our architects have taken up to ensure products like this enter the market smoothly. We’ll also have a packaging discussion next year on products like this. The chip you see today, while it was designed primarily for a particular customer to begin with, it’s not a custom product, and in that sense will be available to other OEMs.

This seems like a big issue to me going foward. The Intel fuses off various ISA features on various architectures for market segmentation reasons or just doesn't put them in in the case of Atom means that they'll have a harder time of it if they want to pursue big.LITTLE.

monocasa · on Dec 12, 2018

Ironcially, Atom also gets ISA extensions that don't make their way into the bigger cores too, like the Intel SHA extensions. I think the idea was that the bigger cores were already smart enough to not really get a benefit from dedicated SHA instructions, but the did help the simpler Atom cores.

Of course that now means that the big cores arent a super set of the small cores, feature wise.

blattimwind · on Dec 12, 2018

Ryzen shipped those extensions and they improve SHA performance quite a lot.

stephencanon · on Dec 13, 2018

MOVBE is another example that appeared in Atom years before it was available in "big" cores.

vetinari · on Dec 12, 2018

SHA extension is on the way into Icelake, so they will be superset again.

NullPrefix · on Dec 12, 2018

Easiest way to solve would be moving the process to BIG core on first use of unsupported feature.

brigade · on Dec 12, 2018

That would still need kernel support, so apps will randomly crash if you aren't using the latest patches (spoiler: some people won't be)

Really, it's just a bad idea overall to have usermode visible microarchitectural differences between cores, as Samsung keeps demonstrating with their Exynos chips.

Symmetry · on Dec 12, 2018

I'm far from an expert here but it sounds like you'd have to keep track of CPU extensions on a per core and per thread basis and I don't think most OSs actually do that.

aidenn0 · on Dec 12, 2018

Or just catch an illegal instruction trap and move the thread to a big core if it's on a little core.

jandrese · on Dec 12, 2018

Or vice versa? The talk here was about how the specialized crypto hardware was only on the little core because it isn't necessary on the big one.

blattimwind · on Dec 12, 2018

The Atom encryption engine stuff isn't in the micro-arch though. Afaik you'd use it via kernel APIs.

whydid · on Dec 12, 2018

Security s/fixes/features/g coming in 2020. Does this mean HW fixes for spectre/meltdown?

Chyzwar · on Dec 12, 2018

Whiskey Lake already has some fixes for meltdown/foreshadow.

https://www.anandtech.com/show/13301/spectre-and-meltdown-in...

rurban · on Dec 13, 2018

Those "some fixes" are not enough. It needs a cache redesign, which will be too late. I also don't see a second C3, extra for the kernel.

sroussey · on Dec 12, 2018

Might be like the Secure Enclave on the Keystone RISK-V.

ksec · on Dec 13, 2018

I think this settles the debate whether Apple will be moving to its own ARM Mac.

AMD will need to move and execute its plan to perfection. Hopefully they gain enough market share to make a difference and making enough profits for long term survival. Finger crossed.

npunt · on Dec 13, 2018

My read is Apple is most definitely on for ARM Mac and this is too little, too late (I assume that's your reaction as well). Doesn't seem like much IPC increase and mostly just catch up on both features and process.

I'm sure these Intel chips will make their way to Macs whenever they ship (2019 or beyond), but its likely * Cove is the last new Intel arch Apple uses if they can release some A-series Macs by around ~2020. It seems like the only interesting IP Intel may have to Apple is Optane (theoretically), and the only value Intel offers Apple for more than the next 12mos is Xeons slated for Pro machines.

When you consider that the trend is integrating everything into SoCs, and their features are more tightly designed according to actual usage up the stack in OS and apps and use cases, and they're shipped on a strict yearly cadence, and their R&D benefits from a massive market of iDevice sales, and their value is inherently better captured via device sales and direct customer relationship... it's kind of nuts that Apple is still with Intel.

That's not even accounting for the 10nm delay, nor Intel's weird market segmentation of features, nor Intel's margins, nor Intel's clear organizational issue (per article) of decoupling IP & process that add risk to the roadmap.

Even if Intel gets to EUV before TSMC (they won't), it's just not structured right. This can't last much longer.

ksec · on Dec 14, 2018

Apple will still need to SoC from 5W ( MacBook ), 7W- 15W (MacBook Air), 25-45W ( MacBook Pro ), 65-95W ( iMac ), 95-200W ( iMac Pro and Mac Pro ) all these combined of only 20M unit per year.

I hope I am wrong, not sure if it make any financial sense.

npunt · on Dec 14, 2018

I might simplify that further into three segments with different silicon strategies:

Low (~15m units) - MacBooks except 15" MBP

Med (~2-4m units) - 15" MBP, iMac, and Mac mini

High (<1m units) - iMac Pro and Mac Pro

In LOW, silicon-wise its essentially free since they already have a SoC powerful enough w/A13X+. They can always bump up the wattage/frequency and/or decrease the chassis size to define a slightly new market segment where compute expectations are lower (e.g. the Air approach).

Between LOW & MED is the mobile workstation use case of the higher spec 15" (or larger) MBP. Cheapest solution: add a discrete GPU.

In MED, they could still use A13X+ and add discrete GPU at higher SKUs. Alternately, they could create a beefier chip ("A13XL"?) that would just bolt more cores onto the CPU/GPU.

Between MED & HIGH would be the semi-pro use case of the higher spec 27" iMac (or even certain use cases of Mac Mini). Hard to say what cheap solution is here beyond better GPU.

In HIGH, it feels like Intel is here to stay for a while. These use cases still really need the beefy Xeons and are dealing with lots of pro-level edge cases which favors staying with x86, ECC, etc, and cost-wise it's very low volume.

On these Pro machines where price is less of an object, and to keep up with the rapid feature development roadmap they get from A-series, it feels like they'd do the 'why not both' strategy and throw in an A13X+ as the primary processor and a Xeon as a co-processor with it's own RAM. Just like GPUs are co-processors today. The OS, everyday apps, and specific tasks that favor the A-series would run on the A13X+, while certain instructions and legacy x86 programs/VMs/etc would be routed to the Xeon.

---

The more I think about it the more this co-processor approach really makes sense for the next 5 years of Pro-level Macs, since it avoids the cost of creating new low-volume, high complexity silicon.

It's fun to speculate because it represents quite an interesting business + technical + market segmentation challenge. They don't want to siphon off or detract Mac users any further than they already have, but also need to start reaping the benefit of their highly competitive and high volume chips.

ksec · on Dec 14, 2018

1. You ignore the Tape Out cost on leading Edge node, with yield unforeseeable due to higher voltage and frequency usage.

2. Adding a Discreet GPU still requires a new set of drivers that is previously not written on ARM Mac.

The 5M Mac will require addtional hundred million of investment, time, and testing.

The Co-Processor approach also requires additional OS and software support. Testing cost also.

The only way I see is Apple employ a similar strategy to AMD, chiplet design, where a small CPU Core Die could be used for lots of different configuration.

We will have to see how EYPC 2 perform to judge whether it is a good solution.

npunt · on Dec 16, 2018

I considered chiplets but I probably dismissed them prematurely based on my understanding that they introduce some efficiency loss, and it would be un-Applelike to make their iDevice cash cow less efficient in order to serve the Mac niche. They do open up a ton of flexibility in using different nodes, configs, improved yields, etc. Seems they're actively researching various methods [1], and the whole semi field is moving that way, so I agree with you if they work well that's probably the way they'd build chips for MED and HIGH use cases. I revise my assessment after further thought :)

There's definitely software work that would need to make Intel chips co-processors, but I'm not sure it's a monumental task. There's already some of this happening today with T2 chips handling video encoding/decoding in certain apps, and of course discrete GPUs already act as co-processors with their own resources for many tasks, so this approach isn't unprecedented. In this co-pro idea I'm proposing, the OS is run by arm64, and depending on the binary they route to either x86 or arm64. Developers would have some control over that, and legacy x86 binaries would automatically run x86 on machines that had Intel chips.

Discrete GPU wouldn't require significant driver work if discrete GPUs were only available on machines that had Intel chips. That might be a good practical point to differentiate if the chiplet approach lets them design even higher powered GPUs for LOW and MED products.

Again I don't see this much as a cost-issue, but as a strategic issue. They can spread costs over 5 million Macs in multi-year transition, and this would be partially or fully offset by cheaper hardware & ability to ship differentiated hardware that better suits markets Apple wants to pursue. Unless costs are massive, it seems like not a hard thing to justify. Personally I just can't see x86 instruction set becoming more important in the next decade+ when already it's been eroded from mobile & server fronts and less important because of improved software tooling, and Intel is perpetually floundering. Feels like a bad bet.

The simplest path / null hypothesis to disprove is LOW/MED get A-series chips, HIGH get Intel + dGPU + T-series, and multiple versions of macOS & multiple binaries of apps are shipped. That's only acceptable for a few years I think due to higher maintenance cost. If the Intel co-processor approach is not difficult software-wise, that seems beneficial because it might simplify OS/kernel/system maintenance as that would now just run on A-series chips. Putting Intel in an easily-removable VM-like corner would give Apple more flexibility to ship future products that serve different market segments (e.g. the oddball Mac Mini), and potentially extend decision if/when to drop them.

The point about frequency/voltage is an interesting one. I figured there'd be at least 10-30% possible difference, accepting efficiency loss at higher frequencies (less an issue on larger chassis). At least in the past, A9 ran at 1.85ghz and A9X at 2.26ghz, but A10 v A10X and A12 v A12X [2] are equivalent, so perhaps you're right that on leading edge nodes they're just going to optimize for max efficient freq. Anyway, that's all single-thread CPU, and as A-series chips get closer to desktop in IPC, differentiator for larger machines is primarily more threads/ram/gpu.

Good chatting, thank you for insights.

[1] https://www.macrumors.com/2018/06/12/apple-3d-chip-packaging... [2] https://www.anandtech.com/show/13661/the-2018-apple-ipad-pro...

rurban · on Dec 13, 2018

So we'll have to wait until 2020(?) until we'll get a usable secure CPU, with a redesigned cache? Sorry, too late. By that time AMD already took over.

    Willow Cove 2020 ?	10 nm ?	
    Cache Redesign, New Transistor Optimization, Security Features