
M2's Neural Engine had 15TOPS, M3's 18TOPS (+20%) vs. M4's 38TOPS (+111%).

In transistor counts, M2 had 20BTr, M3 25BTr (+25%) and M4 has 28BTr (+12%).

M2 used TSMC N5P (138MTr/mm2), M3 used TSMC N3 (197MTr/mm2, +43%) and M4 uses TSMC N3E (215MTr/mm2, +9%).[1][2]

[1] https://en.wikipedia.org/wiki/5_nm_process#%225_nm%22_proces...

[2] https://en.wikipedia.org/wiki/3_nm_process#%223_nm%22_proces...
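
Back-of-envelope from the figures above: dividing transistor count by the node's quoted peak density gives a rough implied die area per generation (only a sketch; real dies come out larger because SRAM and analog scale far worse than logic):

    # Rough implied die areas from the transistor counts and quoted densities above.
    chips = {
        "M2 (N5P)": (20e9, 138e6),  # (transistors, transistors per mm^2)
        "M3 (N3)":  (25e9, 197e6),
        "M4 (N3E)": (28e9, 215e6),
    }
    for name, (transistors, density) in chips.items():
        print(f"{name}: ~{transistors / density:.0f} mm^2 implied")

That works out to roughly 145, 127 and 130 mm^2 respectively, i.e. M4 spends most of its density gain on more transistors rather than a smaller die.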



The fact that TSMC publishes their own metrics and target goals for each node makes it straightforward to compare the transistor density, power efficiency, etc.

The most interesting aspect of the M4 is simply that it's debuting on the iPad lineup, whereas historically new chips have always debuted on the iPhone (for A-series) and MacBook (for M-series). Makes sense given the low expected yields on the newest node for one of Apple's lower-volume products.

For the curious, the original TSMC N3 node had a lot of issues plus was very costly so makes sense to move away from it: https://www.semianalysis.com/p/tsmcs-3nm-conundrum-does-it-e...


iPads are actually much higher volume than Macs. Apple sells about 2x to 3x as many tablets as laptops.

Of course, phones dwarf both.


The iPad Pros, though?

I'm very curious how much iPad Pros sell. Out of all the products in Apple's lineup, the iPad Pro confuses me the most. You can tell what a PM inside Apple thinks the iPad Pro is for, based on the presentation: super powerful M4 chip! Use Final Cut Pro, or Garageband, or other desktop apps on the go! Etc etc.

But in reality, who actually buys them, instead of an iPad Air? Maybe some people with too much money who want the latest gadgets? Ever since they debuted, the general consensus from tech reviewers on the iPad Pro has been "It's an amazing device, but no reason to buy it if you can buy a MacBook or an iPad Air"

Apple really wants this "Pro" concept to exist for iPad Pro, like someone who uses it as their daily work surface. And maybe some people exist like that (artists? architects?) but most of the time when I see an iPad in a "pro" environment (like a pilot using it for nav, or a nurse using it for notes) they're using an old 2018 "regular" iPad.


>artists? architects?

Ding ding ding ding ding! The iPad Pro is useful primarily for those people. Or at least it was. The original selling point of the Pro was that it had[0] the Apple Pencil and a larger screen to draw on. The 2021 upgrade gave the option to buy a tablet with 16GB of RAM, which you need for Procreate as that has very strict layer limits. If you look at the cost of dedicated drawing tablets with screens in them, dropping a grand on an iPad Pro and Pencil is surprisingly competitive.

As for every other use case... the fact that all these apps have iPad versions now is great, for people with cheaper tablets. The iPad Air comes in 13" now and that'll satisfy all but the most demanding Procreate users anyway, for about the same cost as the Pro had back in 2016 or so. So I dunno. Maybe someone at Apple's iPad division just figured they need a halo product? Or maybe they want to compete with the Microsoft Surface without having to offer the flexibility (and corresponding jank) of a real computer? I dunno.

[0] sold separately, which is one of my biggest pet peeves with tablets


What’s sad about the Air is that it’s only a 60Hz screen. I’m spoilt now by 120Hz on the first-gen iPad Pro; the iPad needs it even more than phones do (and they need it). So I’m not a demanding user in all other ways, but the Air is not satisfying to me, yet.


I use an iPad Pro as a teacher. I value the pencil and the screen size. Much of what I do ultimately involves A4 paper, so the screen size is a good match.

A lot of teachers now project their iPad screen wirelessly in class, sometimes almost to the exclusion of any other teaching method.

I value the high performance both for everyday ease of use and specifically for screen recordings.

It is not a laptop replacement; it is a wonderful complement.


Totally agree about "Pro". Imagine if they gave it a real OS. Someone yesterday suggested dual-booting. At first I dismissed that idea, but after thinking about it, I can see the benefits. They could leave ipadOS alone and create a bespoke OS. They certainly have the resources to do so. It would open up so many new sales channels for a true tablet.


> They could leave ipadOS alone and create a bespoke OS.

Asahi Linux already runs on Apple Silicon.

The EU could try to force Apple to allow its devices to boot owner-authorized operating systems.


That's another path to having a real OS. And more likely to be realized.


Which sales channels?


iPadOS 16.3.1 can run virtual machines on M1/M2 silicon, https://old.reddit.com/r/jailbreak/comments/18m0o1h/tutorial...

Hypervisor support was removed from the iOS 16.4 kernel, hopefully it will return in iPadOS 18 for at least some approved devices.

If not, Microsoft/HP/Dell/Lenovo Arm laptops with M3-competitive performance are launching soon, with mainline Linux support.


I presume the sequence of events was: some developer at Apple thought it would be a great idea to port hypervisor support to iPad and their manager approves it. It gets all the way into the OS, then an exec gets wind of it and orders its removal because it allows users to subvert the App Store and Apple Rent. I doubt it’s ever coming back.

This is everything wrong with the iPad Pro in a nutshell. Fantastic hardware ruined by greed.


> subvert the App Store and Apple Rent.

EU and US regulators are slowly eroding that service monopoly.

> Fantastic hardware

Hopefully Apple leadership stops shackling their hardware under the ho-hum service bus.

It's been rumored for years that a touch-optimized version of macOS has been in development for use in iOS VMs. With the launch of M4 1TB 16GB iPad Pros for $2K (the price of two MacBook Airs), Apple can sell developers the freedom to carry one device instead of two, without loss of revenue, https://news.ycombinator.com/item?id=40287922


I bet that touch-optimized macOS will never see the light of day, or if it does it will be insanely crippled. Too much of an existential threat to Apple’s stock price.

Apple is in the midst of a cold war with regulators now. Every new feature will be scrutinized to check that it offers no threat to their golden goose if regulators force them to open it up. Allowing one type of VM means that regulators could force them to allow any type of VM.


It's been rumored for years that a touch-optimized version of macOS has been in development for use within iOS VMs.


Never. Not ever, ever ever.

Apple currently has 5 major build trains: macOS, iOS, watchOS, tvOS (which also runs HomePod), and visionOS. Huge amounts of the code are already the same between them: they literally just build the same stuff with different build settings… except for the UI. The UI has actually unique stuff in each train.

This has become more true over time… teams are likely sick of not having certain dependencies on certain trains, so they’re becoming more identical at the foundation/framework level every release.

Saying they’ll make a macOS with a touch UI is like saying Honda is finally going to make a motorcycle with four wheels and a full car frame. The UI is the differentiating factor between the OSes. Everything else has already converged or is rapidly doing so.

If the goal is to support macOS apps on iOS then there’s a dilemma: how do you suddenly make apps that are designed from the ground up for a mouse, good for touch? The answer is you don’t: you just make the rest of the system identical (make the same APIs available everywhere) and ask developers to make the UI parts different.

I could almost believe that they’d make a macOS VM available for use with a keyboard and mouse within iOS. But to me it’d make more sense to do a sort of reverse version of how iOS apps are supported on macOS… where macOS apps are run natively on the iPad, but rendered with the iPad’s window management (modulo whatever multitasking features they still need to implement to make this seamless) and strictly require a keyboard and mouse to be in this mode. There’s just no reason to make a VM if you’re doing this: you can just run the binary directly. The kernel is the same, the required frameworks are the same. No VM is needed.


VMs are needed by professional developers who want to run CLI tools and services (e.g. web server, database) without the security restrictions of iOS, while retaining the OS integrity of the iPad Pro device.

Even if a macOS VM had only a CLI terminal and a few core apps made by Apple, using a Swift UI framework that was compatible with a touch interface, it would be a huge step forward for iPad owners who are currently limited to slow and power-expensive emulation (iSH, ashell). Apple could create a new app store or paid upgrade license entitlement for iOS-compatible macOS apps, so that users can pay ISVs for an app version with iOS touch input.


What you’re talking about sounds great but it’s not “a touch optimized version of macOS”. You’re describing a CLI environment in a sandbox.

Apple will never ever take macOS and change its UI to be optimized for touch. Or at least if they do, it’s time to sell the stock. They already have a touch UI, and it’s called iOS. They’re converging the two operating systems by making the underlying frameworks the same… the UI is literally the only thing they shouldn’t converge.


The mythical convertible iPad Pro "docking" to a "MBP Base" to use it as a touchscreen. ;)

I like the fact that a number of iPad and iPhone apps now run on macOS without a simulator or any ceremony. While they are touch-optimized, they're easy enough to use with a pointing device. The gotcha to such mythical OS convergence is that the inverse is untrue: a desktop UI is unusable 1:1 on a tablet, given the coarser granularity of tapping and less keyboard access.

Perhaps OS-level AI in the future will be able to automatically follow design guidelines and UX rules and generate a usable UI (Storyboards or such View parts) on any platform given a description of data, its importance, and a description of what it should try to look like.


> Microsoft/HP/Dell/Lenovo Arm laptops with M3-competitive performance are launching soon, with mainline Linux support.

I have been seeking someone who’ll be willing to put money on such a claim. I’ll bet the other way. Perchance you’re the person I seek, if you truly believe this?


Which part - launch timing, multicore performance or mainline Linux support?


perf >= M3 while power consumption <= M3, booted into Linux and running, say, a 50/50 workload: 50% streaming a video on youtube.com over wifi at minimum brightness, 50% compiling some C project in a loop, from and to the internal SSD.

Compared to macOS on M3 doing the same


its_a_trap.jpg :)

At Qualcomm SoC launch, OSS Linux can't possibly compete with the deep pockets of optimized-shenanigan Windows "drivers" or vertically integrated macOS on Apple Silicon.

But the incumbent landscape of Arm laptops for Linux is so desolate, that it can only be improved by the arrival of multiple Arm devices from Tier 1 PC OEMs based on a single SoC family, with skeletal support in mainline Linux. In time, as with Asahi reverse engineering of Apple firmware interfaces, we can have mainline Linux support and multiple Linux distros on enterprise Arm laptops.

One risk for MS/Asus/HP/Dell/Lenovo devices based on Qualcomm Nuvia/Oryon/EliteX is that Qualcomm + Arm licensing fees could push device pricing into "premium" territory. The affordable Apple Macbook Air, including used M1 devices, will provide price and performance competition. If enterprises buy Nuvia laptops in volume, then Linux will have a used Arm laptop market in 2-3 years.

So.. your test case might be feasible after a year or two of Linux development and optimization. Until then, WSL2 on Windows 11 could be a fallback. For iPad Pro users desperate for portable Linux/BSD VM development with long battery life, Qualcomm-based Arm laptops bring much needed competition to Apple Silicon. If Nuvia devices can run multiple OSS operating systems, it's already a win for users, making possible the Apple-impossible. Ongoing performance improvements will be a bonus.


That’s the point! In two years the M5 will exist.

But I’m happy to take that bet with “Linux” replaced with “Windows”.


Since the hardware already exists and has been benchmarked privately, this is less of a bet and more of an information asymmetry. So let's assume you would win :) Next question is why - is it a limitation of the SoC, power regulators, motherboard design, OS integration, Arm licensing, Apple patents, ..?


iPads as a product line sure, but the M4 is only in the Pros at the moment which are likely lower volume than the MacBook Air.


With Logic Pro for iPad they now have applications for all their traditional Mac use cases on iPad. If anything, it feels like Apple is pushing for a switch from low-tier Macs to iPad Pro.

And they surely can sell more gadgets and accessories for an iPad than for a laptop.


> The Most Powerful Neural Engine Ever

While it is true that the claimed performance for M4 is better than for the current Intel Meteor Lake and AMD Hawk Point, it is also significantly lower (e.g. around half) than the AI performance claimed for the laptop CPU+GPU+NPU models that both Intel and AMD will introduce in the second half of this year (Arrow Lake and Strix Point).


> will introduce

Incredible that in the future there will be better chips than what Apple is releasing now.


The point is that it is a very near future, a few months away.

Apple is also bragging very hyperbolically that the NPU they introduce right now is faster than all the older NPUs.

So, while what Apple says, "The Most Powerful Neural Engine Ever," is true now, it will be true for only a few months. Apple has done a good job, so, as is normal, at launch their NPU is the fastest. However, this does not deserve any special praise; it is just normal, as normal as the fact that the next NPU launched by a competitor will be faster.

Only if the new Apple NPU had been slower than the older models would that have been a newsworthy failure. A newsworthy success would have required the new M4 to have at least triple the performance it has, so that the competitors would have needed more than a year to catch up with it.


Is this the first time you're seeing marketing copy? This is an entirely normal thing to do. Apple has an advantage with the SoC they are releasing today, and they are going to talk about it.

I expect we will see the same bragging from Apple's competitors whenever they actually launch the chips you're talking about.

Apple has real silicon shipping right now. What you're talking about doesn't yet exist.

> A newsworthy success would have been only if the new M4 would have had at least a triple performance than it has, so that the competitors would have needed more than a year to catch up with it.

So you decide what's newsworthy now? Triple? That's so arbitrary.

I certainly better not see you bragging about these supposed chips later if they're not three times faster than what Apple just released today.


I said triple because the competitors are expected to have double the speed in a few months.

If the M4 were 3 times faster than it is, it would have remained faster than Strix Point and Arrow Lake, which will be replaced only next year, giving the M4 supremacy for more than a year.

If the M4 were twice as fast, it would have continued to share first place for more than a year. As it is, it will be the fastest for one quarter, after which it will have only half of the top speed.


And then Apple will release M5 next year, presumably with another increase in TOPS that may well top their competitors. This is how product releases work.


strongly doubt we will see M5 so soon


The M3 was released Oct 30, 2023; the M4 was released May 7, 2024.

[disco stu] if these trends continue, the M5 will be out on November 14, 2024


I can’t tell what you’re criticizing. Yes, computers get faster over time, and future computers will be faster than the M4. If release cycles are offset by six months then it makes sense that leads only last six months in a neck-and-neck race. I’d assume after Arrow Lake and Strix Point the lead will then go back to M5 in six months, then Intel and AMD’s whatever in another six, etc. I guess that’s disappointing if you expected a multi-year leap ahead like the M1, but that’s just a bad expectation, it never happens and nobody predicted or claimed it.


Apple will also introduce the "Pro" line of their M4 chips later in the year and I expect that they will improve the Neural Engine further.


Don’t worry. It’s Intel we’re talking about. They may say that it’s coming out in 6 months, but that’s never stopped them from releasing it in 3 years instead.


AMD is the one that has given more precise values (77 TOPS) for their launch, their partners are testing the engineering samples and some laptop product listings seem to have been already leaked, so the launch is expected soon (presentation in June, commercial availability no more than a few months later).


I literally don't give a fck about Intel anymore; they are irrelevant.

The Taiwanese silicon industrial complex deserves our dollars. Their workers are insanely hard working, and it shows in their products.


There's no Taiwanese silicon industrial complex, there's TSMC. The rest of Taiwanese fabs are irrelevant. Intel is the clear #3 (and looks likely-ish to overtake Samsung? We'll see).


> The Most Powerful Neural Engine Ever

that would be my brain still - at least for now ;)


I suppose that makes it all the easier for the M4 Ultra to surpass at WWDC, when it doubles or quadruples the iPad's core-count-oriented specs.


damn bro thanks for this

here i am celebrating not pulling the trigger on M2 128gb yesterday

now im realizing M4 ain't shit

will wait a few more months for what you described. will probably wait for AMD

> Given that Microsoft has defined that only processors with an NPU with 45 TOPS of performance or over constitute being considered an 'AI PC',

so already with 77 TOPS it just destroys M4. Rumoured to hit the market in 2 months or less.


My M2 iPad Pro is already more powerful than I can use. The screen is too small to do big work like using a DAW or doing video editing, and the Magic Keyboard is uncomfortable, so I stopped writing on it. All that processing power, and I don’t know what it will be used for on a tablet without even a good file system. Lousy ergonomics.


We will have M4 laptops running 400B parameter models next year. Wild times.


And they will fit in the 8GB RAM with 0.02 bit quant


You can get a macbook pro with 128 GB of memory (for nearly $5000).

Which still implies... a 2 bit quant?
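
Weights-only arithmetic says roughly yes (a sketch that ignores KV cache, activations and runtime overhead):

    # Weights-only memory for a 400B-parameter model at various bit widths.
    params = 400e9
    for bits in (16, 8, 4, 2):
        print(f"{bits:>2}-bit: {params * bits / 8 / 1e9:,.0f} GB")

16-bit is 800 GB, 8-bit 400 GB, 4-bit 200 GB, 2-bit 100 GB, so only something around a 2-bit quant squeezes under 128 GB.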


There are some crazy 1/1.5 bit quants now. If you're curious I'll try to dig up the papers I was reading.

1.5bit can be done to existing models. The 1 bit (and less than 1 bit iirc) requires training a model from scratch.

Still, the idea that we can have giant models running in tiny amounts of RAM is not completely far fetched at this point.


Yeah, I'm broadly aware and have seen a few of the papers, though I definitely don't try and track the state of the art here closely.

My impression and experience trying low bit quants (which could easily be outdated by now) is that you are/were better off with a smaller model and a less aggressive quantization (provided you have access to said smaller model with otherwise equally good training). If that's changed I'd be interested to hear about it, but definitely don't want to make work for you digging up papers.


Current 2 bit quant models are useless. Smaller models yield better results.


eli5 quant?


Quant is short for "quantization" here.

LLMs are parameterized by a ton of weights, when we say something like 400B we mean it has 400 billion parameters. In modern LLMs those parameters are basically always 16 bit floating point numbers.

It turns out you can get nearly as good results by reducing the precision of those numbers, for instance by using 4 bits per parameter instead of 16, meaning each parameter can only take on one of 16 possible values instead of one of 65536.
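
To make that concrete, here's a minimal sketch with plain NumPy: symmetric round-to-nearest with a single scale per tensor (real schemes like GPTQ or k-quants work per group and are fancier):

    import numpy as np

    def quantize_4bit(w):
        # one scale for the whole tensor; signed 4-bit range is -8..7
        scale = np.abs(w).max() / 7
        q = np.clip(np.round(w / scale), -8, 7).astype(np.int8)
        return q, scale

    def dequantize(q, scale):
        return q.astype(np.float32) * scale

    w = np.random.randn(8).astype(np.float32)   # stand-in for fp16 weights
    q, scale = quantize_4bit(w)
    w_hat = dequantize(q, scale)
    print(np.abs(w - w_hat).max())              # small, but never zero

Each weight now takes one of 16 levels instead of 65536, and the reconstruction error is what the quantized model has to live with.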


Most claims of "nearly as good results" are massively overblown.

Even the so called "good" quants of huge models are extremely crippled.

Nothing is ever free, and even going from 16 to 8 bit will massively reduce the quality of your model, no matter what their hacked benchmarks claim.

No, it doesn't help because of "free regularization" either. Dropout and batch norm were also placebo BS that didn't actually help back in the day when they were still being used.


Interestingly enough, Llama3 suffers more performance loss than Llama2 did at identical quantizations. https://arxiv.org/abs/2404.14047

There's some speculation that a net trained for more epochs on more data learns to pack more information into the weights, and so does worse when weight data is degraded.


Quantization is reducing the number of bits used to store each parameter of a machine learning model.

Put simply, a parameter is a number that determines how likely it is that something will occur, e.g. if the number is < 0.5 say "goodbye", otherwise say "hello".

Now, if the parameter is a 32-bit (unsigned) integer, it can take one of 4,294,967,296 distinct values (0 to 4,294,967,295).

If you were using this 32-bit value to represent physical objects, then you could represent 4,294,967,296 objects (each object gets its own number).

However, a lot of the time in machine learning, you find after training that not nearly so many different "things" need to be represented by a particular parameter. Say you were representing types of fruit with this parameter (Google says there are over 2000 types of fruit, but let's just say there are exactly 2000). In that case 4,294,967,296/2000 means there are about 2.1 million distinct values available for each fruit, which is such a waste! Our perfect case would be a number that represents only 0-2000, in the smallest way possible, for this job.

Now is where quantization comes in: the size of the number we use to represent a parameter is reduced, saving memory at the expense of a small hit to model accuracy. It's known that many models don't really take a large accuracy hit from this, meaning that the way the parameter is used inside the model doesn't really need, or take advantage of, being able to represent so many values.

So what we do is, say, reduce that 32-bit number to 16, or 8, or 4 bits. We go from being able to represent billions or millions of distinct values/states to maybe 16 (with 4-bit quantization), and then we benchmark the model's performance against the larger version with 32-bit parameters, often finding that whatever training has decided to use that parameter for doesn't really need an incredibly granular value.
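
Following the fruit example, the "smallest way" to label 2000 distinct things is just the number of bits whose range covers them:

    import math

    fruits = 2000
    print(math.ceil(math.log2(fruits)))   # 11 bits, since 2**11 = 2048 >= 2000
    print(2**32 // fruits)                # ~2.1 million 32-bit codes per fruit otherwise

(Weight quantization isn't literally labeling categories, but the waste argument is the same.)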


An NVIDIA RTX 4090 generates 73 TFLOPS. This iPad gives you nearly half that. The memory bandwidth of 120 GBps is roughly 1/10th of the NVIDIA hardware, but who’s counting!


The 4090 costs ~$1800 and doesn't have dual OLED screens, doesn't have a battery, doesn't weigh less than a pound, and doesn't actually do anything unless it is plugged into a larger motherboard, either.


From Geekbench: https://browser.geekbench.com/opencl-benchmarks

Apple M3: 29685

RTX 4090: 320220

When you line it up like that it's kinda surprising the 4090 is just $1800. They could sell it for $5,000 a pop and it would still be better value than the highest end Apple Silicon.


Comparing these directly like this is problematic.

The 4090 is highly specialized and not usable for general purpose computing.

Whether or not it's a better value than Apple Silicon will highly depend on what you intend to do with it. Especially if your goal is to have a device you can put in your backpack.


I'm not the one making the comparison, I'm just providing the compute numbers to the people who did. Decide for yourself what that means, the only conclusion I made on was compute-per-dollar.


A bit off-topic since not applicable for iPad:

Adding also M3 MAX: 86072

I wonder what the results would be if the test were run on Asahi Linux some day. Apple's implementation is fairly unoptimized AFAIK.


That's for OpenCL, Apple gets higher scores through Metal.


And Nvidia annihilates those scores with cuBLAS. I'm going to play nice and post the OpenCL scores since both sides get a fair opportunity to optimize for it.


Actually, I'd like to see Nvidia's highest Geekbench scores. Feel free to link them.

It's stupid to look at OpenCL when that's not what's used in real use.


This is true, but... RTX 4090 has only 24GB RAM and M3 can run with 192GB RAM... A game changer for largest/best models...


CUDA features unified memory that is only limited by the bandwidth of your PCIe connector: https://developer.nvidia.com/blog/unified-memory-cuda-beginn...

People have been tiling 24gb+ models on a single (or several) 3090/4090s for a while now.


Shhh, don't correct the believers, they might learn something.


I think it would be simpler to compare cost/transistor.


And yet it’s worth it for deep learning. I’d like to see a benchmark training Resnet on an iPad.


TOPS != TFLOPS

RTX 4090 Tensor 1,321 TOPS according to spec sheet so roughly 35x.

RTX 4090 is 191 Tensor TFLOPS vs M2 5.6 TFLOPS (M3 is tough to find spec).

RTX 4090 is also 1.5 years old.


Yeah where are the bfloat16 numbers for the neural engine? For AMD you can at least divide by four to get the real number. 16 TOPS -> 4 tflops within a mobile power envelope is pretty good for assisting CPU only inference on device. Not so good if you want to run an inference server but that wasn't the goal in the first place.

What irritates me the most though is people comparing a mobile accelerator with an extreme high end desktop GPU. Some models only run on a dual GPU stack of those. Smaller GPUs are not worth the money. NPUs are primarily eating the lunch of low end GPUs.


> The memory bandwidth of 120 GBps is roughly 1/10th of the NVIDIA hardware, but who’s counting

Memory bandwidth is literally the main bottleneck for the types of applications GPUs are used for, so everyone is counting.
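
For token-by-token LLM decoding, essentially all the weights get streamed once per token, so bandwidth sets a hard ceiling on speed. A rough sketch (the ~1 TB/s 4090 figure and the 3.5 GB model size are illustrative assumptions):

    # Upper bound on decode speed: every weight byte is read once per token.
    def max_tokens_per_sec(bandwidth_gb_s, model_size_gb):
        return bandwidth_gb_s / model_size_gb

    model_gb = 3.5  # e.g. a ~7B model at 4-bit
    print(round(max_tokens_per_sec(120, model_gb)))    # ~34 tok/s ceiling at 120 GB/s
    print(round(max_tokens_per_sec(1000, model_gb)))   # ~286 tok/s ceiling at ~1 TB/s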


It would also blow through the iPad’s battery in 4 minutes flat


This comment needs to be downvoted more. TFLOPS is not TOPS; this comparison is meaningless. The 4090 has about 40x the TOPS of the M4.


Many thanks for the encouraging comments.




