The hunt for the M1’s neural engine

marcan_42 · on March 30, 2022

The Load Balancer is there because the M1 Max has two independent Neural Engines (unlike the GPU cores, which are load balanced in hardware and the OS sees as a single one even on the Ultra)... but one ANE is inexplicably disabled on all production systems. The M1 Ultra, logically, has four... and only the first in each die is enabled.

I was waiting for the Mac Studio to drop to come to a conclusion, since it's plausible one ANE could've been disabled for power reasons on the laptops... but with both secondary ANEs in each die in the M1 Ultra off, and with no reports of anyone seeing the first ANE being disabled instead (which could mean it's a yield thing), I'm going to go ahead and say there was a silicon bug or other issue that made the second ANE inoperable or problematic, and they just decided to fuse it off and leave it as dead silicon on all M1 Max/Ultra systems.

bayindirh · on March 30, 2022

Maybe they want to push one as further as possible before enabling the other, hence seeing the limits and increasing the life of the systems by forcing developers to optimize their code?

Sony pulled a similar trick on their A7 series cameras, and enabled more advanced AF features just with a firmware upgrade. It made the bodies "new" and pushed them at least half a generation forward. It's not the same thing, I know, but it feels similar enough for me.

AgentOrange1234 · on March 30, 2022

Another possible reason would be that there’s a hardware bug which only occurs when both are enabled. (Not saying this is what’s happening here, but it’s very common to ship partly disabled chips to work around bugs.)

vatys · on March 30, 2022

For chips, that's commonly referred to as "binning" as in the sorting machine drops chips into different bins based on a test result.

A big design with many cores, such as CPU or GPU cores, may have manufacturing defects that makes one or more cores bad. Or it may be on one side or another of a tolerance range and not be able to work at higher power or higher frequency. These parts may get "binned" into a lower performance category, with some cores disabled (because a flaw prevents the core from working) or with reduced maximum performance states.

These are still "good" parts, and can be sold at a lower cost with lower performance, while the "better" and "best" parts will pass more tests and be able to have more or all portions of the chip enabled.

So it's not so much to work around a "bug" which might be a common flaw to all part designs, rather to work around manufacturing tolerance and allow more built parts to be useful rather than garbage.

ghettoimp · on March 30, 2022

Binning is definitely a possibility. Separately from binning, there are often just features that don't work right and get disabled with "chicken switches" or "feature enable bits."

Any two-ANE design would have a lot of control logic that has to be right, e.g., to manage which work gets sent to which ANE, which cache lines get loaded, etc. It's easy to imagine bugs in this logic which would only show up when both ANEs are enabled. So it's likely that there is a chicken bit that you could use to disable one of the ANEs and run in single-ANE mode.

brigade · on March 30, 2022

Manufacturing defects are not hardware bugs.

Binning is irrelevant to hardware bugs.

vatys · on March 30, 2022

That's what I'm saying, in response to the previous comment saying this is to work around a bug. Bugs are common to all parts, where defects are unique per part. Binning works around manufacturing defects and turns a yield problem into different grades or SKUs of parts.

klysm · on March 30, 2022

That’s a very interesting tactic that I haven’t heard of before - almost the opposite of built in obsolescence?

bayindirh · on March 30, 2022

Yes. Actually professional photographic equipment doesn't get obsolete. Lenses are expensive and making them forward and backward compatible makes sure the user stays inside the ecosystem. Also, you want higher end bodies to be dependable, so you don't obsolete them, but supersede them with better capabilities.

I can take my old D70s today and take beautiful photos, even with it's 6MP sensor, however a newer body would be much more flexible.

giobox · on March 30, 2022

> Actually professional photographic equipment doesn't get obsolete

> I can take my old D70s today and take beautiful photos, even with it's 6MP sensor

I suspect if you do a wedding shoot with a 6mp interchangeable lens camera, some customers are rightly going to ask questions when you hand over the work... Of course professional photographic equipment gets obsolete - even lens systems get deprecated every 20-30 years too. Newer sensors have vastly more dynamic range than the d70s among other image quality benefits.

I think you argument holds water much more strongly in the context of amateur users, where for sure you can keep getting nice images from old gear for a long time.

bayindirh · on March 30, 2022

> I suspect if you do a wedding shoot with a 6mp interchangeable lens camera, some customers are rightly going to ask questions when you hand over the work...

Unless you're printing A3 pages, getting gigantic pictures, or cropping aggressively, D70s can still hold up pretty well [0].

> even lens systems get deprecated every 20-30 years too.

Nikon F mount is being deprecated in favor of Z because of mirrorless geometries, not because the lenses or the designs are inferior (given the geometry constraints). Many people still use their old lenses, or nifty fifties are still produced with stellar sharpness levels. I'm not entering into "N" or "L" category of lenses of their respective mounts. Not all of them are post 2000 designs, or redesigns, and they produce extremely good images.

> Newer sensors have vastly more dynamic range than the d70s among other image quality benefits.

As a user of both D70s and A7III I can say that, if there's good enough light (e.g day), one can take pretty nice pictures with a D70s, even today. Yes, it dies pretty fast when light goes low, or it can't focus as fast, or can't take single shot (almost) HDR images (A7III can do that honestly, and that's insane [4]), but unless you're chasing something moving, older cameras are not that bad. [1][2][3]

> I think you argument holds water much more strongly in the context of amateur users, where for sure you can keep getting nice images from old gear for a long time.

Higher end, action oriented professional cameras are not actually built with resolution in mind, especially at the top end. All of the action DSLRs and mirrorless cameras up to a certain point are designed with speed and focus in mind. You can't see A7R or Fuji GFX series in weddings or in stadiums. You'll see A9s, Canon 1D or Nikon D1 series cameras. They're built to be fast. Not high res.

A wedding is more forgiving, but again a high MP camera is not preferred since it's more prone to vibration blurring.

[0]: https://www.youtube.com/watch?v=ku3lT8MjyFM

[1]: https://www.flickr.com/photos/zerocoder/41901384135/in/album...

[2]: https://www.flickr.com/photos/zerocoder/28459579257/in/album...

[3]: https://www.flickr.com/photos/zerocoder/39910477633/in/album...

[4]: https://www.flickr.com/photos/zerocoder/33984196648/in/album...

inferiorhuman · on March 30, 2022

No.

> Unless you're printing A3 pages, getting gigantic pictures, or cropping aggressively, D70s can still hold up pretty well

You even qualified that later with "if there's good enough light" and "unless you're chasing something moving". No, a D70 won't work well for wedding photography. Yes, people shot weddings with much slower film. They don't anymore because, like the D70, slow film is obsolete. People shot weddings with manual focus lenses too and the D70 is awful for MF lenses from the tiny viewfinder to the lack of support for non-CPU lenses. When the D70 was a current product some people did (make no mistake the D70 was never marketed as a pro body) simply because the D70 was on par with its contemporaries.

> Nikon F mount is being deprecated in favor of Z because of mirrorless geometries

Even within the scope of the F mount the D70 is obsolete — it's incompatible with new E and AF-P lenses.

> A wedding is more forgiving

Wedding photography is about the most technically challenging, least forgiving (low light, constant motion, spontaneous behavior) type of photography out there. The point you were responding to still stands – older digital photographic equipment is obsolete in a professional context while having some utility for hobbyists. Nobody's taking a D1 out to shoot sports these days. In fact most people didn't when it was new because Nikon's autofocus was so far behind Canon's.

bayindirh · on March 31, 2022

> No.

No.

> You even qualified that later with "if there's good enough light" and "unless you're chasing something moving". No, a D70 won't work well for wedding photography.

I didn't intend to say "You can shoot weddings with a D70s". I just wanted to say, a D70s can take beautiful photos, even today, with today's standards, that's all. Even Nikon didn't position D70s for that kind of action when it was brand new.

> People shot weddings with manual focus lenses too and the D70 is awful for MF lenses from the tiny viewfinder to the lack of support for non-CPU lenses.

As a person who shot MF on both film and D70s, I tend to disagree, but that's not a hill I'd prefer to die on, at least within the borders of this comment box.

> Even within the scope of the F mount the D70 is obsolete — it's incompatible with new E and AF-P lenses.

I think being able to select between this much [0] of lenses is enough for most people.

> Wedding photography is about the most technically challenging, least forgiving (low light, constant motion, spontaneous behavior) type of photography out there.

Sorry, no. I shoot at tango nights. You have much more freedom in weddings. You can use flash, come close, people expect you, etc. A "low light" wedding situation is "what you can expect in a good tango night". You can't use flash, use lenses slower than f/2.2 (or more specifically ~t/2.5), because even a latest generation sensor will just choke, you can't use big lenses and be a distracting element, or come close for any reason. So, no. A wedding is not a piece of cake, but much easier from a technical point of view. Weddings have their problems like the distance/area you have the cover, the equipment you have to carry on you, duration, and storage and energy logistics, I agree, but it's not as challenging in terms of light or camera capabilities.

> ...older digital photographic equipment is obsolete in a professional context while having some utility for hobbyists.

I'd rather rephrase this. Newer photographic equipment is much more capable and makes professionals' life much, much easier. Obsolescence is something different in my eyes, and it's not the same as "not as useful today as of yesterday". Even an analog Pentax MF body is not obsolete in today's photographic world, even for professionals. It might not be their everyday body, but's it's neither useless, nor obsolete.

[0]: https://www.lensora.com/lensesfor.asp?camera=nikon-d70s

inferiorhuman · on April 6, 2022

> I just wanted to say, a D70s can take beautiful photos, even today, with today's standards, that's all. Even Nikon didn't position D70s for that kind of action when it was brand new.

Take a look at what equipment qualifies you for professional support (NPS). Even the D3xxx series qualify. The D70 does not. The D70 does not because it is considered obsolete by Nikon.

> As a person who shot MF on both film and D70s, I tend to disagree, but that's not a hill I'd prefer to die on, at least within the borders of this comment box.

Sure, I shot manual lenses for quite a while with a D200 (with and without a split prism screen). DSLRs (autofocus bodies from any manufacturer really), especially lower end ones, are not well suited to manual focus lenses. Fast lenses like you would need for a wedding exacerbate this as you simply cannot resolve enough contrast to nail the focus with a big aperture.

There's no way around the fact that the D70 has a small viewfinder (95% coverage, sure, but only 0.75x magnification). The D200 had a 0.94x viewfinder and even that was well more challenging than the old ME Super I cut my teeth on.

> I think being able to select between this much [0] of lenses is enough for most people.

The D70 is still obsolete.

> Sorry, no. I shoot at tango nights. You have much more freedom in weddings.

Sorry, no. You can re-do a tango night. Try to redo a wedding shot and you'll be dealing with bridezilla at best. And, sure, you can use a flash (assuming the venue is okay with it) just like you can create unhappy customers. Even the brighter, outdoor weddings I've been to (not as a photographer thank god) have way more uneven lighting than any sort of indoor dance venue.

> I'd rather rephrase this. Newer photographic equipment is much more capable and makes professionals' life much, much easier. Obsolescence is something different in my eyes, and it's not the same as "not as useful today as of yesterday". Even an analog Pentax MF body is not obsolete in today's photographic world, even for professionals. It might not be their everyday body, but's it's neither useless, nor obsolete.

A D70 is still obsolete. Your old Pentax will shoot just fine with whatever K (or whatever depending on the age) lenses. Your D70 will function with a subset of F mount lenses and both older and newer lenses won't work. There are probably exceptions for those who are wedded to the novelty sensors Nikon used in some of their older cameras but no pro is going to be shooting with a D70.

I mean, look, you can use old manual focus (but not pre-AI) lenses with the D70. You'll have to bring a separate light meter though (or just chimp it) because the meter does not work at all with non-CPU lenses. Stop down metering? Nope. Nada. Much, much easier is a major understatement. What pro work are you going to do without an in-body light meter? Studio stuff? The D70 is obsolete.

Here's the list of DX bodies that Nikon considers pro gear: D500, D300S, D7500, D7200, D7100, D5600, D5500, D5300, D3500.

https://www.nikonpro.com/ProductList.aspx

therein · on March 30, 2022

A7 SII has continuous autofocus now via a firmware update? That would be very exciting to me.

bayindirh · on March 30, 2022

I was talking about A7III. Since I don't have the A7SII, I don't follow its firmware updates. A7III got Animal Eye-AF and much better tracking via a firmware upgrade.

IshKebab · on March 30, 2022

My guess would be either a power consumption issue - with both ANEs enabled you could get voltage droops below the acceptable limit. Or it requires software support that they haven't implemented yet. Software always takes waaaay longer than hardware people expect.

marcan_42 · on March 30, 2022

They sold the chips as having only on me ANE, and software support is there since it's used in the M1 Ultra...

erwincoumans · on March 30, 2022

That is interesting, do you have any references / articles that describe that some ANE's are disabled?

Could it be overheating if the ANE's are not used the right way?

Jcowell · on March 30, 2022

Probably not but he’s the one the highest divers into the M1 Chip due to his Linux Project so he’s probably the reputable source. Any article written would probably be referencing him.

erwincoumans · on March 30, 2022

Ah, delightful to have a/the master behind Asahi development/reverse engineering efforts here!

bschwindHN · on March 30, 2022

Delightful to see the creator of Bullet Physics here in the comments too!

dannyw · on March 30, 2022

It's just an yields thing I bet; particularly given the relative lack of usage of Neural Engine. If Apple engineers realize that, disabling half the Neural Engines makes no user-visible difference and improves yields by ~1% (hence decreasing silicon costs by ~1%), that's an easy hundreds of millions in saved wafers.

ricardobeat · on March 30, 2022

Doesn’t the fact that it’s always the first chip that gets disabled disprove that theory? If it was to improve yield you’d see the other one being disabled at least some of the time.

fredoralive · on March 30, 2022

It probably depends on the chip yield, and the sample size of whatever survey was used for the assertion it was always the second engine disabled.

It would be reasonable to assume that if both engines work, then the second is always the one to be disabled. Therefore to have the second enabled you'd need to find one where the first engine has failed, and has no other chip killing faults. Depending on the yield TSMC gets, these could be quite rare, so you'd have to have quite a large survey to find them.

Or as other people have noted, it could be an errata meaning the second core is broken, as this isn't the only possible reason.

WithinReason · on March 30, 2022

This would be a reasonable theory, except the neural engine is a small part of the total chip area and thus unlikely to contribute significantly to total chip yield

marcan_42 · on March 30, 2022

I know it's not the largest sample size, but I did ask twitter and nobody found one with ane1 enabled, it's always ane0.

runnerup · on March 30, 2022

> it's plausible one ANE could've been disabled for power reasons on the laptops

The linked posting notes that they were able to get the ANE to draw 49 mW. Is this such a significant amount of power that its worth permanently disabling for laptop power draw? Or is there likely much more power being used elsewhere to support ANE in addition to the 49 mW that can be measured directly?

grishka · on March 30, 2022

Is the second ANE disabled in hardware or is it possible to reenable it through software somehow?

marcan_42 · on March 30, 2022

It's disabled and locked in boot firmware, and the firmware is signed.

unicornfinder · on March 30, 2022

Which implies that, at least theoretically, that Apple could enable the other neural engine at a later date (not that they would).

exikyut · on March 30, 2022

Just make sure there are no SkyNet singularities hiding inside first.

...maybe Apple disabled it for a reason, y'know?

thisNeeds2BeSad · on March 30, 2022

Maybe protect against lawsuits from a chip patch slowdown after architectural migations are necessary, because of inherent design speed optimization safety flaws?

greggsy · on March 30, 2022

Perhaps they’ll allow you to use it through subscription, like z/OS mainframes locking down cores?

servytor · on March 30, 2022

If you are interested in the M1 neural engine, I highly recommend you check out this[0].

[0]: https://github.com/geohot/tinygrad/tree/master/accel/ane

rickdeveloper · on March 30, 2022

He live streamed himself writing a lot of that:

https://www.youtube.com/watch?v=mwmke957ki4

https://www.youtube.com/watch?v=H6ZpMMDvB1M

https://www.youtube.com/watch?v=JAyw7OAcXDE

https://www.youtube.com/watch?v=Cb2KwcnDKrk

erwincoumans · on March 30, 2022

Yes, George Hotz (geohot) reverse engineered the neural engine and could make it work for tinygrad, the videos posted in the other reply describe the reverse engineering process.

I wonder why Apple didn't provide low-level API's to access the hardware? It may have various restrictions. I recall Apple also didn't provide proper API's to access OpenCL frameworks on iOS, but some people found workarounds to access that as well. Maybe they only integrate with a few limited but important use cases, TensorFlow, Adobe that they can control.

Could it be that using the ANE in the wrong way overheats the M1?

aseipp · on March 30, 2022

Because machine learning accelerators are, in the broadest sense, not "done" and rapidly evolving every year. Exposing too many details of the underlying architecture is a prime way to ossify your design, making it impossible to change, and as a result you will fall behind. It is possible the Neural Engine of 2022 will look very different to the one of 2025, as far as the specifics of the design, opcode set, etc all go.

One of the earliest lessons along this line was Itanium. Itanium exposing so much of the underlying architecture as a binary format and binary ABI made evolution of the design extremely difficult later on, even if you could have magically solved all the compiler problems back in 2000. Most machine learning accelerators are some combination of a VLIW and/or systolic array design. Most VLIW designers have learned that exposing the raw instruction pipeline to your users is a bad idea not because it's impossibly difficult to use (compilers do in fact keep getting better), but because it makes change impossible later on. This is also why we got rid of delay slots in scalar ISAs, by the way; yes they are annoying but they also expose too much of the implementation pipeline, which is the much bigger issue.

Many machine learning companies take similar approaches where you can only use high-level frameworks like Tensorflow to interact with the accelerator. This isn't something from Apple's playbook, it's common sense once you begin to design these things. In the case of Other Corporations, there's also the benefit that it helps keep competitors away from their design secrets, but mostly it's for the same reason: exposing too much of the implementation details makes evolution and support extremely difficult.

It sounds crass but my bet is that if Apple exposed the internal details of the ANE and later changed it (which they will, 100% it is not "done") the only "outcome" would be a bunch of rageposting on internet forums like this one. Something like: "DAE Apple mothershitting STUPID for breaking backwards compatibility? This choice has caused US TO SUFFER, all because of their BAD ENGINEERING! If I was responsible I would have already open sourced macOS and designed 10 completely open source ML accelerators and named them all 'Linus "Freakin Epic" Torvalds #1-10' where you could program them directly with 1s and 0s and have backwards compatibility for 500 years, but people are SHEEP and so apple doesn't LET US!" This will be posted by a bunch of people who compiled "Hello world" for it one time six months ago and then are mad it doesn't "work" anymore on a computer they do not yet own.

> Could it be that using the ANE in the wrong way overheats the M1?

No.

smoldesu · on March 30, 2022

Was it really necessary to expand the fourth paragraph post-script to get your point across? Before it was a fairly holistic look at the difference between people who want flexibility and people who want stability, where neither party was necessarily right. Now it just reads like you're mocking people for desiring transparency in their hardware, which... seems hard to demonize?

aseipp · on March 30, 2022

There are other replies talking about Apple or whatever but I'll be honest: because 2 decades of online forum experience and FOSS development tells me that the final paragraph is exactly what happens anytime you change things like this and they are exposed to turbo-nerds, despite the fact they are often poorly educated and incredibly ill-informed about the topics at hand. You see it here in spades on HN. It doesn't have anything to do with Apple, either; plenty of FOSS maintainers could tell you similar horror stories. I mean it's literally just a paraphrase of an old XKCD.

To be fair though, I mean. I'm mostly a bitchy nerd, too. And broadly speaking, taking the piss is just good fun sometimes. That's the truth, at least for me.

If it helps, simply close your eyes and imagine a very amped up YouTuber saying what I wrote above. But they're doing it while doing weird camera transitions, slow-mo shots of panning up the side of some Mac Mini or whatever. They are standing at a desk with 4 computers that are open-mobo with no case, and 14 GPUs on a shelf behind them. Also the video is like 18 minutes long for some reason. It's pretty funny then, if you ask me.

smoldesu · on March 30, 2022

For sure, I don't think I disagree with anything you've written here. Where I take umbrage is when there is no choice involved though. Apple could very well provide both a high-level, stable library while also exposing lower-level bindings that are expected to break constantly. If the low-level library is as bad and broken as people say it is, then they should have no problem marketing their high-level bindings as a solution. This is a mentality that frustrates me on many levels of their stack; their choice of graphics API and build systems being just a few other examples.

Maybe this works for some people. I can't knock someone for an opinionated implementation of a complicated system. At the same time though, we can't be surprised when other people have differing opinions, and in a perfect society we wouldn't try to crucify people for making those opinions clear. Apple notoriously lacks a dialogue with their community about this stuff, which is what starts all of this pointless infighting in the first place. Apple does what Apple does, and nerds will fight over it until the heat death of the universe. There really is nothing new under the sun. Mocking the ongoing discussion is almost as phyrric as claiming victory for either side.

nebula8804 · on March 30, 2022

Absolutely. It provided a visualization reminder of so many people that come out of their holes to argue whenever there is some criticism of open source. Its one thing to desire freedom but the reality of the situation is that community is toxic for some reason and just not fun to even converse with.

gjsman-1000 · on March 30, 2022

He's not wrong - that's absolutely what YouTube and online Linux commentators would do. They have their own echo chamber, just as much as any tech community. Heck, considering your past posts, it's probably something you would do.

As for transparency in hardware, it probably will become more transparent once Apple feels that it is done and a finished science. They don't want to repeat Itanium.

EricE · on March 30, 2022

I think it was absolutely appropriate because I have seen that cycle happen many, many times over the years.

Especially when Apple is involved. Hell there are still people who see them as beleaguered and about to go out of business at any moment :p

smoldesu · on March 30, 2022

I get where you're coming from. It's par for the course on Apple's behalf to push this stuff away in lieu of their own, high-level implementation, but I also think that behavior puts them at an impasse. People who want to use this hardware for arbitrary purposes are unable to do so. Apple is unwilling to do it because they want their hand on the "API valve" so to speak. In a case where absolutist rhetoric is being used on either side, I think this is pretty expected. If we're ultimately boiling this down to "having choices" vs "not having choices" though, I think it's perfectly reasonable to expect the most valuable company in the world to go the extra mile and offer both choices to their users and developers.

Or not. It's their hardware, they just won't be selling any Macs to me with that mindset. The only thing that irks me is when people take the bullet for Apple like a multi-trillion dollar corporation needs more people justifying their lack of interoperability.

exikyut · on March 31, 2022

Perhaps the "high-level access only" ideology extends to policy considerations as well. End-users appear to have no shortage of time or ideas to make AI trip over its shadow in ways that may have unfortunate policy implications for corporations with uncomfortably-large social and political footprints (where "footprint" represents "potential impact" and does not indicate extant specifics).

In much the same way the App Store is an infuriating shh-don't-call-it-censorship bottleneck that gives Apple total and final control over what your (sorry, Apple's) devices can do, I wonder if political considerations represents a portion of Apple's motivation to keep things reasonably locked down. Obviously Apple can just kick apps it doesn't like out of the App Store, and binaries that would need to be downloaded and run directly on Macs is exceedingly unlikely to go viral to the same extent, so perhaps I'm overthinking things to the point of paranoia.

ben174 · on March 30, 2022

Meh, it's okay to be grumpy sometimes. He got his point across and clearly knows what he's talking about. Let him be passionate :)

fredoralive · on March 30, 2022

Possibly just to avoid having programs that rely too much on specific implementation details of the current engine causing issues in the future if they decide to change the hardware design? An obvious comparison is graphics cards where you don't get low level access to the GPU[1], so they can change architecture details across generations.

Using a high level API probably makes it easier to implement a software version for hardware that doesn't have the neural engine, like Intel Macs or older A-cores.

[1] Although this probably starts a long conversation about various GPU and ML core APIs and quite how low level they get.

mhh__ · on March 30, 2022

Apple don't want to let people get used to the internals and spiritually like to enforce a very clear us versus them philosophy when it comes to their new toys. They open source things they want other people to standardize around but if it's their new toy then its usually closed.

aseipp · on March 30, 2022

In general I kind of agree with this, but this move isn't anything specific to Apple. Every company designing ML accelerators is doing it. None of them expose anything but the most high level framework they can get away with to users.

I honestly don't know of a single company offering custom machine learning accelerators that let you do anything except use Tensorflow/PyTorch to interface with them, not a chance in hell any they actually will give you the underlying ISA specifics. Maybe the closest is, like, the Xilinx Versal devices or GPUs, but I don't quite put them in the same category as something like Habana, Groq, GraphCore, where the architecture is bespoke for exactly this use case, and the high level tools are there to insulate you from architectural changes.

If there are any actual productionized, in-use accelerators with low level details available that weren't RE'd from the source components, I'd be very interested in seeing it. But the trend here is very clear unless I'm missing something.

my123 · on March 30, 2022

Habana has their own SynapseAI layer that their TF/PyTorch port runs on. Custom ops are supported too, via a compiler targeting the TPCs, using a C language variant.

Oh, and they have an open-source UM software stack for those but it's really not usable. Doesn't allow access to the systolic arrays (MME), only using the TPCs is just _starting_ to enumerate what it doesn't have. (but, it made the Linux kernel maintainers happy so...):

https://github.com/HabanaAI/SynapseAI_Core#limitations (not to be confused with the closed-source SynapseAI)

aseipp · on March 30, 2022

Well, that's good to hear at least! I knew there was some back and forth between the kernel maintainers recently due to all these accelerator drivers going in without any usermode support; Habana's case was kind of interesting because they got accepted into accel/ early by Greg, but they wouldn't have passed the merge criteria used later on for most others like Qualcomm.

Frankly I kind of expected the whole result of that kerfuffle to just be that Habana would let the driver get deleted from upstream and go on their merry way shipping drivers to customers, but I'm happy to be proven wrong!

xenadu02 · on March 30, 2022

CoreML is the API to use the ANE.

erwincoumans · on March 30, 2022

Thanks, that's right there is a high level API. I meant low-level API's, and to clarify changed my post.

exikyut · on March 30, 2022

The likeliest reason is long-term ABI ossification.

irae · on March 30, 2022

All the sibling comments are better guesses, but I would also guess there could be security implications on exposing lower level access. Having it all proprietary and undocumented is itself a way of making it harder to exploit. Albeit, as mentioned, not having to settle ABI is way more likely the primary reason.

kmeisthax · on March 30, 2022

Apple Silicon has IOMMUs on everything - you generally can't exploit a bug in a coprocessor to gain more access on the main application processor (or another coprocessor). The only hardware bugs with security implications we've found was stuff like M1RACLES, which is merely a covert channel (and it's discoverer doesn't even think it's a problem). Apple does a pretty good job of making sure even their private/internal stuff is secure.

WithinReason · on March 30, 2022

A high level API needs much less support effort.

pedro_hab · on March 30, 2022

As a developer I am bit ashamed of this question, but I gotta ask:

What consumer apps actually use Neural Engines?

I think something like Photoshop, maybe. But wouldn't it just train a model and use it as regular code?

I'm interested in AI, but it's a joke to me about startups and jargon more often than not.

I feels weird to add this to all chips when I can't see that much usage.

OberstKrueger · on March 30, 2022

Pixelmator Pro uses it for some of its ML functionality. Image scaling can use it, and it provides a cleaner image when upscaling, removing some compression artifacts and just smoothing it out more naturally. I’ve found it can work well downsizing too, although less of an effect. They also have an ML auto-crop tool and ML denoiser. All of these will hit the Neural Engine pretty good.

sharkjacobs · on March 30, 2022

Anything "predictive" probably uses Neural Engine. These are iOS features, but a lot of them apply to MacOS too

- Visual Lookup - Animoji - Face ID - recognizing "accidental" palm input while using Apple Pencil - monitoring users' usage habits to optimize device battery life and charging - app recommendations - Siri - curating photos into galleries, selecting "good" photos to show in the photos widget - identifying people's faces in photos - creating "good" photos with input from tiny camera lenses and sensors - portrait mode - language translation - on-device dictation - AR plane detection

Core ML API allows third party developers to use Neural Engine to run models

FinalBriefing · on March 30, 2022

But which consumer apps use it? I know of a handful of photo apps that use it to enhance photos, but I'm not aware of any other types of apps.

kitsunesoba · on March 30, 2022

I think there may be more indirect usage than direct usage. Little bits of neural engine usage are peppered through the native APIs.

Of course if you’re using third party libraries that don’t rely on macOS APIs this won’t be happening in your app.

gfody · on March 30, 2022

could the M1 itself be using it for branch prediction?

happycube · on April 1, 2022

Unlikely, the latency would be much too high. If it is doing neural branch prediction it's inside the CPU core itself.

miohtama · on March 30, 2022

Maybe the neural engine is ideal to scan your local image library to find kiddy porn:

https://www.theverge.com/2021/12/15/22837631/apple-csam-dete...

OskarS · on March 30, 2022

This is slightly beside the point, but those stack traces are C++ functions. I was pretty surprised by that (though, granted, I don't know anything about macOS internals). I would have expected either pure C for the kernel, or maybe Objective-C if they wanted a higher level language. They don't really have any C++ in their operating system APIs, right? Like, if you interface with Core <Whatever>, isn't that all C, Objective-C and Swift? Is there a lot of C++ in the macOS kernel/userspace?

jjoonathan · on March 30, 2022

Yeah, the "driver" parts of Mac OS X -- the bits that aren't Mach or BSD -- use a restricted subset of C++. It doesn't carry over into userland. Not much, anyway.

I know that C++ is out of vogue, but last time I wrote drivers in linux and OSX (10 years ago) I left with the distinct impression that it was oversold, at least compared to C. C clunks hard and C++ addresses the worst of it. I've never had to corral a bunch of overzealous junior C++ programmers, which I suspect is where the C++ reputation comes from, but Apple went down that path and wound up with something pretty decent.

IMO today it's a shrug and 25 years ago it was forward looking.

zozbot234 · on March 30, 2022

Linux devs are working to support Rust, which is a lot cleaner than C++ (essentially, no implementation inheritance) and has more straightforward interfacing with C components.

jjoonathan · on March 30, 2022

I'm not convinced that "no inheritance" is better -- drivers seem to be one place where it's actually put to good use -- but I also don't think it really matters that much. In contrast, Rust definitely has fewer foot-guns, and that matters a lot.

Of course, the OSX driver code is going on 25 years old, so it's not evidence of anyone's opinion that C++ beats Rust for OS dev.

StillBored · on March 30, 2022

Selective implementation inheritance is actually quite useful for kernel/driver development if its used where function pointers in C tend to be used. Pluggable interfaces. It standardizes the syntax, provides default implementations, and forces implementation of mandatory methods.

C++ when used as a better C, really is better.

zozbot234 · on March 30, 2022

Pluggable interfaces are ok, and Rust supports them just fine via traits. Actual implementation inheritance is inherently anti-modular.

galad87 · on March 30, 2022

I don't know how many, but a good number of frameworks are C++ internally, for example WebKit, the Objective-C runtime itself, IOKit uses a subset of C++, Metal Shading Language is a subset of C++.

gurkendoktor · on March 30, 2022

There is (was?) always a lot of C++ in Core Animation stacktraces, and Core Animation was really foundational to the iPhone UI.

saagarjha · on March 30, 2022

IOKit in the kernel is a C++ API; the userspace version is C.

irae · on March 30, 2022

They are still a fork of BSD, so it stands to reason that the team working for years on kernel and lower level parts of the OS still uses C++.

TickleSteve · on March 30, 2022

Mach is not a fork of BSD. The kernel is Mach (Hybrid microkernel) supporting a BSD API.

astrange · on March 30, 2022

There is a lot of FreeBSD in the kernel/libc/userland. BSD doesn't use C++ though, they use C.

endorphine · on March 30, 2022

For the totally uninitiated, what's a neural engine in general? What are they used for, and why Apple added this to their products?

jlouis · on March 30, 2022

It's an area of the chip suited for operations on low precision/range floating point numbers. Neural networks don't generally require high precision in floating point computations, but require a lot of them. This means your data paths can be smaller (16 bit wide rather than 32 bit wide); the consequence of which is you can do far more computation per mm2 die space.

The second part of the chain is that you also tailor the operations the die area support to those operations necessary in a typical neural network, further optimizing the chip.

The end result is very power-efficient execution of neural networks, which allows you to beat a GPU or CPUs power curve, improving thermals of the core, and in the case of mobile devices, optimizes battery usage.

acdha · on March 30, 2022

It's probably also worth noting that the last part is fairly important to Apple since they have based a lot of their privacy stance around on-device processing of things like Siri commands, photo/video analysis, image recognition (VoiceOver can attempt to automatically describe images for blind people), speech to text dictation, the various audio/video enhancements like Center Stage or the voice emphasis features, etc. and all of that means they're running a lot of networks on battery.

Those efficiency wins are almost certainly worth it even if third-party developers don't use it much.

justusthane · on March 30, 2022

His previous blog post has more general information about the neural engine: https://eclecticlight.co/2022/03/29/live-text-visual-look-up...

Basically it's used by anything involving machine learning:

- Speech recognition

- Face recognition

- Visual lookup (image recognition)

- Live Text (OCR)

It allows all these functions to be performed efficiently on-device rather than shipping data off to the cloud.

modeless · on March 30, 2022

Isn't BNNS documented to run on the CPU? Why would you expect it to use the neural engine? Apple also has Metal Performance Shaders which of course run on the GPU only. The user accessible API for the neural engine is Core ML. Very high level unfortunately.

Hmm, it seems like there's also a new API that can use the neural engine sometimes, "ML Compute". But only for inference? https://developer.apple.com/documentation/mlcompute/mlcdevic...

noduerme · on March 30, 2022

This is a bizarre result but so.. what's the conclusion? That only a few things like Apple's proprietary image lookup are able to tap into the ANE so far? Or that it's actually just a marketing gimmick?

Reading this makes me wonder if it's not just a placeholder for some kind of intrusive system that will neural-hash everything you own, but I'm sure I'm just being paranoid.

masklinn · on March 30, 2022

> Reading this makes me wonder if it's not just a placeholder for some kind of intrusive system that will neural-hash everything you own, but I'm sure I'm just being paranoid.

It’s also actively counter-productive: if they wanted to do this sort of tracking, they could just have done what all of their competitors do and send the data straight to their servers. This is hardware (and thus expenses) which is only necessary because of their stance on privacy, and avoiding off-device work.

mikotodomo · on March 30, 2022

Apple has made huge compromises to try and give some privacy back to the consumer, who lost it all from paying for cheap products where the product is you. And people just ignore this progress because they want to be anti-Apple. It's sad.

my123 · on March 30, 2022

The ANE is accessible via the CoreML framework, it's a high level interface for ML inference.

It however turns out that a _lot_ of customer apps today don't use those accelerators at all.

(and about the attempt at using BNNS functions, that's not offloaded, it runs on the host CPU cores w/ the AMX tightly bound accelerator)

galangalalgol · on March 30, 2022

Would it be possible for someone ro write a onnx runtime utilizing coreml? That would open it up to a lot more applications instantly.

ianai · on March 30, 2022

ONNX lists CoreML support.

avianlyric · on March 30, 2022

I would guess that the ANE has some very specialised hardware (E.g. INT8 or FP16 only), and there’s isn’t a huge amount of it. So many nets either don’t fit completely, or aren’t using the right types of operation to match the ANE. Either can’t run on the ANE, or only have a subset of layers that can run on the ANE.

So when running a neural net iOS / macOS needs to make a decision about where to run each net. Even if a net has layers in it that are a perfect match for the ANE, there’s still a trade off from having move the workload back and forth between a CPU core and the ANE (although the unified memory should eliminate a big chunk of this cost).

It might be that in the general case it’s not worth the latency hit from using mixed processors when running net that isn’t 100% ANE compatible, or could just be that Apple haven’t got round to implementing the logic needed to gracefully spilt workloads across the ANE and a CPU core. Which would make sense, because they’ve got the time and expertise to ensure all their nets fit within the ANE. Something that’s difficult for 3rd party devs to do, because they don’t have access to detailed ANE docs.

sharikous · on March 30, 2022

I have the same anxiety but I don't think it is still there.

The Asahi project is putting a deliberate low priority on the ANE but I have seen some other small reverse engineering attempts.

I think some use of the ANE outside of Apple APIs will be possible soon.

bayindirh · on March 30, 2022

Tensorflow has a CoreML enabled version which run on ANE.

https://github.com/apple/tensorflow_macos

my123 · on March 30, 2022

AFAIK that doesn't run on ANE but on the GPU. ANE is used for inference only w/ CoreML.

londons_explore · on March 30, 2022

> That only a few things like Apple's proprietary image lookup are able to tap into the ANE so far?

That would seem like the logical conclusion. Perhaps there are hardware bugs/shortcomings that makes it very hard to use for the neural network API. Perhaps the software team is just behind and still building that.

noduerme · on March 30, 2022

My initial understanding was that Adobe was leveraging it in their experimental "AI" Photoshop plugins. Although a majority of those seem to require a live internet connection to work. Some of the newer core Photoshop functionality for intelligent selection is ridiculously fast on an M1 Max, though, which makes me think it's probably using the neural chips.

sharikous · on March 30, 2022

It might be, but there is also another matrix multiplication accelerator that could be responsible

my123 · on March 30, 2022

Using CoreML has some catches, like having to use FP16 instead of integer formats notably.