SMT Solving on an iPhone (washington.edu)
174 points by sidereal on Nov 5, 2018 | 35 comments



> The i7-7700K is a desktop CPU drawing at least 45 watts of power and clocked at 4.5 GHz when running a single-threaded workload; in contrast, the iPhone was unplugged, probably doesn’t draw 10% of that power, and runs (we believe) somewhere in the 2 GHz range

The A12 runs at 2.5 GHz and draws ~3.64W of power. I also think the actual power draw of the i7-7700K may be north of 45W; AnandTech claims 90W (against a TDP of 91W), i.e. roughly 25x the iPhone's draw.

I bet that if you could pop a heatsink on the A12(X) and overclock it / raise the TDP, it would come very close to or beat Intel in most single-threaded applications.

Source: https://www.anandtech.com/show/13392/the-iphone-xs-xs-max-re...


I cleaned up that text a little bit -- the 7700K doesn't draw the full 91W TDP when only a single core is loaded, as in this experiment.


a. single-threaded code

b. extended benchmark duration, with no thermal throttling encountered

c. a straightforward C compiler, with no special code for the GPU or other accelerators

d. the main reason is probably the memory and cache configuration

I think it is time to consider A(RM) and Intel cores at par. Qualcomm cores are maybe X% slower, but they will close the gap.

It is now a question of system trade-offs, as designers juggle memory sizes, cache sizes, core counts, cooling, vector extensions, etc.

This is a really pivotal event.

And don't forget AMD challenging Intel on like-for-like cores.

It's pretty amazing that no one has put together a real ARM alternative for the server side.


> I think it is time to consider A(RM) and Intel cores at par. Qualcomm cores are maybe X% slower, but they will close the gap.

I think you're vastly underestimating the value of X in the above. The latest Qualcomm cores weren't even competitive with last year's A11, and the A12 leaves them in the dust.

[1] https://www.anandtech.com/show/13392/the-iphone-xs-xs-max-re...


> no one has put together a real ARM alternative for the server side

Cavium (now under Marvell) and Ampere both did. Huawei/HiSilicon are also apparently coming.

(Also Qualcomm, but they quit for mysterious reasons. "Want to focus on mobile" or something.)

But these are all very high end solutions. For the low end, there's Marvell's Armada8k (MACCHIATObin)… and kind of an empty void in the middle :(


Cavium's is 96 cores x 4 nodes in a 2U config. Sure, it's high end, but it's what you need to compete with similarly compute-dense rack servers from e.g. Dell.


The original ThunderX was up to 96 cores across 2 sockets, but single-core performance was unimpressive (comparable to a Cortex-A72).

ThunderX2 is 64 cores in 2 sockets, with 4-way SMT and up to 3 GHz turbo, and the cores are way better.

// It's great that they're competing with dense rack servers, I just wish the low end to mid range market wasn't ignored :(


Yep, I guess it's a ramp-up issue. It's easier to sell bare-metal hardware at the high end and then slice it up into cheap servers.


The consequences of that are big. It will lead to commoditization of IP cores and CPUs, and then to a race to the bottom, with the likes of Allwinner, Amlogic, Rockchip, and NXP offering products competitive with Intel, AMD, Apple, and Samsung. As margins go to near zero, companies like Intel will suffer the most.

This is assuming that semiconductor process improvements will significantly slow down or completely stop.


Agreed, this could lead to interesting times again in the CPU space. I see two scenarios: either there are improvements that were possible but that Intel never needed to unleash to stay dominant, in which case competition brings significant gains (in whichever direction: parallelism, power, cache, price) and compute gets better for the same price; or further investment won't be worth the performance gains, in which case prices will go down but performance will stagnate, since there won't be much incentive to pay ten times the price for 10% improvements (random BS numbers).


Where would RISC-V fit into this? I'd assume it would help accelerate the commoditization, maybe not in the server space at first, but definitely for IoT and embedded applications.


Thanks for this. With the release of the A12X Bionic, I've wanted to do something similar, i.e. build some specialized tools that leverage that CPU for atypical, non-mobile use cases.

I wish Apple would get on with it and include Terminal.app, the FreeBSD userland, and Xcode for iOS, so we could start running a shell and doing interesting things with this new hardware :-)


Yes, as a previous-gen iPad Pro owner, my biggest regret is that I cannot run even very basic Unix programs on this great hardware. A more open iPad, even with the whole Unix userland confined to a sandbox, would be a dream.


If you're on a jailbreakable iOS version, you can do this.


It seems Apple has some really good profit margins on its hardware. So, in theory, nothing is stopping System76 or a new competitor from introducing high-quality iPad-like Unix systems.


...Except the lack of a design team to do all the custom design work that Apple does. The systems you're talking about are assembled from off-the-shelf parts.


Except that System76 or whoever would need to be capable of building iPad-like hardware. I really don't see that, at least not at a competitive price point.


The common refrain I hear from hardware guys is that the uncore is by far the hardest part: the bulk of design time is spent connecting multiple cores together. Great single-thread perf doesn't imply an easy path to multicore.


Still, the A-series multicore benchmarks have been pretty impressive as well.


The problems arise when you get into double-digit core counts. At 4 cores you can still make pretty reasonable MUXes. Above that, you have to start making serious trade-offs in either Fmax or latency.


I'd definitely believe that cache plays a big role -- while I don't know nearly as much as the author about how SMT solvers work, I've seen how much Z3 bogs down when two instances are running in parallel. That's quite consistent with the cache explanation.
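
If you want to reproduce that observation, here's a minimal sketch in Swift (the z3 path and bench.smt2 are placeholders for your own binary and a nontrivial SMT-LIB input) that times one run against two concurrent runs:

    import Foundation

    // Placeholder paths; point these at your own z3 build and benchmark.
    let z3Path = "/usr/local/bin/z3"
    let bench = "bench.smt2"

    func makeZ3() -> Process {
        let p = Process()
        p.executableURL = URL(fileURLWithPath: z3Path)
        p.arguments = [bench]
        return p
    }

    // One instance with the cache to itself.
    var start = Date()
    let solo = makeZ3()
    try! solo.run()
    solo.waitUntilExit()
    print("solo: \(Date().timeIntervalSince(start)) s")

    // Two instances competing for the same cache.
    start = Date()
    let pair = [makeZ3(), makeZ3()]
    for p in pair { try! p.run() }
    for p in pair { p.waitUntilExit() }
    print("pair: \(Date().timeIntervalSince(start)) s")

If the parallel pair takes much longer than the solo run (beyond scheduler noise), cache contention is a plausible explanation.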


I really like the new Apple hardware; it's fast enough to do most things on (like this example), but there are no serious production apps for developers. I have been building my own tooling for the iPad, but it's really not something that can be released. I do use the iPad as a daily driver for most things, but writing code on it is still not very feasible.

There is no way (as far as I know) to 'catch' actually crashing processes: if you make a boo-boo when coding something with UIKit, the entire app crashes, and that's not really workable (check the Continuous editor for iPad for an example of something that crashes constantly when/if you try anything serious on it). If it were possible to prevent that, it would be possible to build an actual interpreter with a debugger and make a real programming environment on it.
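
(The closest thing I know of is Foundation's uncaught-exception hook, but it only lets you observe, not recover; a minimal sketch of why that isn't enough:)

    import Foundation

    // Called just before the process dies from an uncaught NSException.
    // You can log or persist state here, but iOS offers no supported way
    // to resume execution, and Swift runtime traps or bad memory accesses
    // arrive as signals (SIGTRAP, SIGSEGV) that never reach this handler.
    NSSetUncaughtExceptionHandler { exception in
        print("uncaught: \(exception.name.rawValue): \(exception.reason ?? "?")")
    }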

But for experiments and in-house tooling it works very well, and I barely ever have to touch my laptop, because I have enough tooling to do most things I need to do.

Smaller wishes: more developer freedom (allowing fuller control; that will probably happen very, very slowly) and a better keyboard (for the iPad Pro).


EDIT: by hardware I mean the mobile hardware, not the laptops.


Exciting. The cache size is likely the main reason, but that cannot be concluded from the analysis.

>>VTune agrees, and says that Z3 spends a lot of time waiting on memory while iterating through watched literals.

But how much "a lot" actually is goes unspecified. And propagation also allocates memory sometimes, to keep the learned clauses. I'm not sure how Z3 manages this, but couldn't it be that mallocs are just slower on that desktop, i.e. an OS issue?
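
That would be easy to sanity-check with a tiny allocator micro-benchmark run on both machines; a sketch in Swift (the 64-byte block is an arbitrary stand-in for a small learned clause):

    import Foundation

    // Time a burst of small malloc/free pairs, loosely mimicking a solver
    // allocating learned clauses. A large gap between the two machines
    // would point at the OS allocator rather than the cache.
    let n = 1_000_000
    var ptrs = [UnsafeMutableRawPointer?](repeating: nil, count: n)

    let start = Date()
    for i in 0..<n { ptrs[i] = malloc(64) }
    for p in ptrs { free(p) }
    print("1M malloc/free of 64 B: \(Date().timeIntervalSince(start)) s")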


It's worth keeping in mind that Intel tied architecture to process node and has been stuck on 14nm for many years. Very few changes have been made to Skylake (2015), even with the latest 9th-gen release. They're starting to look long in the tooth on many fronts: AMD's SMT is more efficient, and AMD has a much better boost algorithm as well (Precision Boost 2), among other aspects.

Had their roadmap not tied node to arch, this comparison wouldn't look so damning. Also, many engineers left Intel for Apple because they're talented and tired of releasing Skylake every year. Once Intel fully decouples node and arch, we should see non-mobile CPUs running away from mobile chips like the A12.

This benchmark is essentially Intel's 2015 design on 14nm vs Apple's 2018 design on 7nm.


> This benchmark is essentially Intel's 2015 design on 14nm vs Apple's 2018 design on 7nm.

Except it's what Intel is selling in 2018.


I'm not trying to offend anyone who feels compelled to be naturally defensive or take some sort of "side" here.

Nor am I defending Intel (I recently chose and now use a Ryzen 2700X); I'm just contributing additional facts and context for anyone who wasn't aware of them, since I read all the comments here and no one had mentioned this.


Could Apple's next "innovation" not be a consumer product but to own the chip world? The iPhone "just" being the beta test for the chips?


They probably don't want to sell those chips for Android to keep their competitive advantage, and the PC market is probably too small to be worth the investment compared to the iOS market.


Intel is currently worth about 1/5 of Apple's $1 trillion. I guess Apple may want to grow by 20%, but not at the price of giving up chip exclusivity in a ~1bn-device market. However, I'm not sure anyone buys an iPhone because it's the best-benchmarking phone. What's next, overclocked, liquid-chilled phones? (After looking into it for fun: yes, that does seem to exist; just look up "liquid cooling gaming phone".)


I'd bet that if a mobile chip manufacturer unlocked their chips for overclocking, someone would put one under LN2.


That 20% includes the fabs, which are more than half of Intel's value. Since Apple partners with TSMC, even if they could capture all of Intel's market, it would only mean ~10% growth. Apple would be much better off spending the time designing chips for their own macOS and selling more Macs.


Server market. Make it a return of the Xserve :)


Seems like Mac Minis are the new Xserve: https://www.macstadium.com/

Maybe the next upgrade of the Mac Mini, in 4-5 years, will be to A-series chips.


I think this probably has something to do with AMD: AMD wasn't there to push Intel, so Intel shipped incremental upgrades for a while. I remember when the Athlon 64 was introduced, it really kicked Intel into gear. At this point there may simply be less you can do to improve performance, since we might be hitting semiconductor limits, but I think the lack of competition didn't help.



