M1, M2, and M3 still have a very low number of GPU cores. Apple should release better hardware to take advantage of their recently released MLX library.
At this moment it looks clear to me that Apple won’t go that way. It’s enough for them to focus on inference and actual applications, not the heavy training part. They have probably been training models on a cluster of non-Apple silicon and making them available on their chips for inference only.
Not to mention entirely outsourcing training workloads to specialist firms. Apple does a lot of secretive outsourcing of things you might think they would or should do in-house. This contrasts with Google and Meta, who seem to prefer keeping everything in-house.
It’s true that their GPUs are slower than Nvidia’s. But keep in mind that cores differ widely across architectures and core counts cannot be compared directly. What you want is more GFLOPS, not necessarily more cores.
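A quick back-of-the-envelope sketch makes the point. All the core counts, clocks, and per-core throughput figures below are made-up illustrations, not the specs of any real chip:

```python
# Back-of-the-envelope peak-throughput comparison.
# All numbers are illustrative assumptions, not measured specs.

def peak_gflops(cores: int, clock_ghz: float, flops_per_core_per_cycle: int) -> float:
    """Theoretical peak = cores x clock x FLOPs issued per core per cycle."""
    return cores * clock_ghz * flops_per_core_per_cycle

# Hypothetical GPU A: fewer but "wider" cores at a higher clock.
gpu_a = peak_gflops(cores=40, clock_ghz=1.4, flops_per_core_per_cycle=256)

# Hypothetical GPU B: over 3x the cores, but each does far less work per cycle.
gpu_b = peak_gflops(cores=128, clock_ghz=1.0, flops_per_core_per_cycle=64)

print(f"GPU A: {gpu_a:,.0f} GFLOPS with 40 cores")   # 14,336 GFLOPS
print(f"GPU B: {gpu_b:,.0f} GFLOPS with 128 cores")  # 8,192 GFLOPS
```

Here the chip with barely a third of the cores still delivers nearly twice the theoretical throughput, which is why raw core counts across vendors tell you very little. (Real-world performance also depends on memory bandwidth, which this toy calculation ignores entirely.)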