Unfortunately there's not really any competition in the GPU space for AI use. AMD hasn't stepped up, and Intel is barely getting its footing for basic gaming functionality. Nvidia has an established monopoly on the GPGPU space, and forcing their hand, if it led them to withdraw Linux compatibility, would push everyone who needs Nvidia off Linux. At least until an alternative became available, at which point those customers would ditch Nvidia for having screwed them over once already.
This is where WSL pulls its mask off. Windows will become (even more so) the platform of choice by being a "more compatible" version of Linux than Linux. And AI developers don't seem to have much affinity for open-source. Why would they care about being stuck on Microsoft and Nvidia's proprietary platforms when they're building their own proprietary models?
Great point. WSL has always been a wolf in sheep's clothing, in my eyes.
I'm not saying it's not useful. It certainly fills a role people clearly wanted filled. But you'd be naive to think Microsoft isn't taking on an "embrace, extend, extinguish" mentality. Again.
Also, I'm sure AI devs would be all for making it more difficult to run models on the desktop so you'd be forced to use a cloud service. They could even keep using a non-GPL patched Linux on their servers because it's not like they're distributing anything.
With Apple's significant work on AI, from the hardware on up, there's a very near future where many AI applications will train models on a Mac[0] directly in Xcode[1], bundle them with the iOS/macOS application (and browsers), and run inference directly on the device. Apple will also be deploying more and more of their own foundational models shipped with the hardware/OS. Most user AI applications will be as simple as "call this Core ML API". With the links provided it could be said we're already there and they've been laying the groundwork for years.
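To make "call this Core ML API" concrete, here's a rough sketch of that workflow using coremltools from Python (my choice of tooling, not something from the links above); the model, input shape, and feature names are all made up for illustration:

    # Hypothetical sketch: convert a small PyTorch model to Core ML so it can be
    # bundled with an iOS/macOS app, then sanity-check a prediction on the Mac.
    import torch
    import coremltools as ct

    class TinyClassifier(torch.nn.Module):        # stand-in for "your model"
        def __init__(self):
            super().__init__()
            self.net = torch.nn.Sequential(
                torch.nn.Linear(16, 32), torch.nn.ReLU(), torch.nn.Linear(32, 4))
        def forward(self, x):
            return self.net(x)

    model = TinyClassifier().eval()
    example = torch.rand(1, 16)
    traced = torch.jit.trace(model, example)      # coremltools wants a traced/scripted model

    mlmodel = ct.convert(
        traced,
        convert_to="mlprogram",
        inputs=[ct.TensorType(name="features", shape=(1, 16))],
    )
    mlmodel.save("TinyClassifier.mlpackage")      # this is what gets dropped into Xcode

    # On the Mac, inference is one call into the Core ML runtime:
    print(mlmodel.predict({"features": example.numpy()}))

The resulting .mlpackage gets added to the Xcode project and ships inside the app bundle, and the app calls the same model through the Core ML framework on device.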
This hits Nvidia and the big GPU cloud providers twofold:
1) Training. We already see with the unified memory architecture of Apple Silicon that Apple can provide accelerated training with enough RAM to train/finetune very large models[2] (see the sketch below). 800GB/s memory bandwidth (192GB!!!) is very close to the RTX 4090 (1TB/s). Also note how many PCIe x16 slots the Mac Pro has... You can be certain that follow-up Apple Silicon will eat closer and closer into Nvidia datacenter GPU performance/capability, and with Apple's resources potentially even surpass it for all but things like training massive LLMs from scratch a la Google, Meta, OpenAI, etc.
2) Inference. Why pay AWS or whoever to run your inference when users have already spent their money on hardware that does it faster (no network round trip) and more privately, on the devices they already have? iOS 17 already does some AI tuning (FINALLY) on device with auto-correct - expect to see this show up everywhere in iOS and iOS applications.
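To put something behind the training point in 1), this is roughly what accelerated training on Apple Silicon looks like today through PyTorch's MPS backend; the model and data are toy placeholders, and the point is that the GPU works out of the machine's unified memory rather than a fixed slab of VRAM:

    # Minimal sketch: train/finetune on the Apple Silicon GPU via PyTorch's MPS backend.
    # Model and data are toy placeholders; .to("mps") puts everything in unified memory.
    import torch

    device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")

    model = torch.nn.Sequential(
        torch.nn.Linear(1024, 4096), torch.nn.ReLU(), torch.nn.Linear(4096, 10)
    ).to(device)

    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    loss_fn = torch.nn.CrossEntropyLoss()

    for step in range(100):
        x = torch.randn(64, 1024, device=device)        # fake batch
        y = torch.randint(0, 10, (64,), device=device)  # fake labels
        optimizer.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()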
I don't make many predictions but this one is obvious. I'm convinced the next generations of Apple Silicon and the Apple software ecosystem, developer experience, etc will be the first very real threat to Nvidia. When this really gets going Intel and AMD will be permanently left at the starting line (where they are now) for all but very bespoke use cases and deployments like Frontier[3].
For people who take issue with macOS, look at the progress of Asahi...
Then, once Google/Android catches up things will get REALLY interesting. This is the future and the current grab for Nvidia GPUs and clouds will eventually look like crypto mining - a few great years to be capitalized on when the getting was good.
> 800GB/s bandwidth memory (192 GB!!!) is very close to the RTX 4090
The 40xx is not the competition. A DGX H100 has 2TB of system memory, the H100 NVL has 2x ~4TB/s of memory bandwidth, and they are selling as fast as they can be made at any price Nvidia asks.
I quoted the 4090 because this is the Mac Pro, on only the second generation of Apple Silicon, in a desktop/workstation. In a few years Apple has nearly caught up to the flagship desktop GPU from Nvidia - who has been at this formerly niche market for 15 years. 192GB is already enough RAM to finetune very large LLMs, and we will be seeing more memory from Apple down the road.
192GB is four 48GB RTX A6000s or roughly two 80GB A100/H100s ($20k just in GPUs)... A completely maxed out Mac Pro (with wheels and fancy mouse) is under $13k. Oh yeah, and you can actually buy them.
The M3, M4, and beyond will have TFLOPS and RAM jumping by leaps and bounds. CoreML will start chipping away at CUDA developer mindshare from academia on up - you can already run LLMs at more than reasonable speed on a $2000 MacBook you already have lying around. Expect to see Mac Pros showing up in AI labs at universities, and coursework where every student just buys a Mac. Of course at this level it's mostly higher-level frameworks anyway, so the knowledge transfers to CUDA-powered whatever.
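If you want to try the "LLMs on the MacBook you already have" claim, this is roughly what it looks like with llama-cpp-python and its Metal backend (my example, not from any of the links; the model filename is a placeholder for whatever quantized model you download):

    # Rough sketch: run a quantized LLM locally with llama.cpp's Metal backend
    # via llama-cpp-python. The model path is a placeholder.
    from llama_cpp import Llama

    llm = Llama(
        model_path="model.gguf",  # placeholder: any quantized GGUF model file
        n_gpu_layers=-1,          # offload all layers to the Apple GPU (Metal)
        n_ctx=2048,
    )

    out = llm("Q: Why does unified memory matter for local inference?\nA:", max_tokens=128)
    print(out["choices"][0]["text"])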
As I noted, the big stuff from Nvidia will own the very high end (especially for training) for the foreseeable future and there's plenty of money to be made there as you noted.
But Apple is going to take big chunks out of Nvidia on the hardware side, and out of a host of startups on the software/SaaS side. For hardware that means less demand at the low-to-mid end and less demand for inference hosting, even for the big players.
For software, leveraging Apple's foundational models, dev tools, etc. will make it so that any app developer can drop data into Xcode to tune/build a model for their app. Apps will be able to further finetune with user data on device.
See the links I provided - this is here (in beta) today.
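On the on-device finetuning piece specifically: Core ML already has an "updatable model" mechanism an app can drive with MLUpdateTask. Here's a hedged sketch of marking a model updatable with coremltools, assuming a classifier in the older NeuralNetwork format (the layer and feature names are made up; ML Program models take a different path):

    # Hedged sketch: mark the last layer of an existing Core ML classifier as updatable
    # so an app can finetune it on device with user data (via MLUpdateTask on the Swift side).
    # Assumes the older NeuralNetwork format; all names here are illustrative only.
    import coremltools as ct
    from coremltools.models.neural_network import NeuralNetworkBuilder, SgdParams

    spec = ct.utils.load_spec("Classifier.mlmodel")   # placeholder model
    builder = NeuralNetworkBuilder(spec=spec)

    builder.make_updatable(["dense_out"])             # layer name is an assumption
    builder.set_categorical_cross_entropy_loss(name="loss", input="probabilities")
    builder.set_sgd_optimizer(SgdParams(lr=0.01, batch=8))
    builder.set_epochs(5)

    ct.utils.save_spec(builder.spec, "UpdatableClassifier.mlmodel")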