Same here, and it is excellent.
I am getting a few buffer-drop clicks on an M3 MBP. Reducing the polyphony solved it, but just in case, to the author: how much more efficiency do you think you can still add to this amazing plugin?
This is a long story, which is still ongoing. The GPU code is very, very heavily optimized (though I do still have some ideas on how to go further). The main problem we're having on Mac hardware is that the OS heuristics for when to turn the clock rate up on the GPU work really poorly for the audio use case. If you want gory details, I've written about it: