The problem that scheduler activations (SA) solve is equally well solved by a scalable I/O multiplexing syscall like kevent plus userland scheduling. There's no particular need for the upcall if the blocking information is available some other way. I think it's dead.
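To make the "multiplexing syscall plus userland scheduling" idea concrete, here is a minimal sketch in Python. It assumes generators stand in for user-level threads and uses `selectors.DefaultSelector`, which picks kqueue on the BSDs and epoll on Linux. The names `run`, `reader`, `writer` and `demo` are made up for illustration, and this is nowhere near a real threading library.

```python
import selectors
import socket
from collections import deque

def run(tasks):
    """Tiny userland scheduler: a task yields a socket when reading it would block."""
    sel = selectors.DefaultSelector()   # kqueue on BSD/macOS, epoll on Linux
    ready = deque(tasks)
    waiting = 0
    while ready or waiting:
        while ready:
            task = ready.popleft()
            try:
                sock = next(task)       # run the task until it would block
            except StopIteration:
                continue                # task finished
            sel.register(sock, selectors.EVENT_READ, task)
            waiting += 1
        if waiting:
            # The only place the whole process blocks: wait for any parked task.
            for key, _ in sel.select():
                sel.unregister(key.fileobj)
                waiting -= 1
                ready.append(key.data)
    sel.close()

def reader(sock, out):
    sock.setblocking(False)
    while True:
        try:
            data = sock.recv(4096)
        except BlockingIOError:
            yield sock                  # park until the kernel reports readiness
            continue
        if not data:
            return                      # peer closed
        out.append(data)

def writer(pairs):
    for sock, payload in pairs:
        sock.sendall(payload)
        sock.close()
    yield from ()                       # never blocks; only makes this a generator

def demo():
    a, b = socket.socketpair()
    c, d = socket.socketpair()
    out = []
    # Readers run first and park; the writer then makes both sockets readable.
    run([reader(a, out), reader(c, out), writer([(b, b"hello"), (d, b"world")])])
    a.close()
    c.close()
    return sorted(out)
```

The blocking information comes back through the readiness API instead of an upcall, so the scheduler never needs the kernel to call into it.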
You might want to preempt after scheduling quantum expirations, page faults, etc. And there is still the issue of existing code that uses the standard blocking system APIs.
Also, a synchronous API can be more efficient when the call is not going to block, and only the kernel can know that.
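A rough illustration of that point, assuming a non-blocking socket: just attempt the call, and only pay for a readiness wait when the kernel says it would have blocked. `recv_optimistic` is a hypothetical helper, not a standard API.

```python
import select
import socket

def recv_optimistic(sock, nbytes=4096):
    """Return (data, waited); waited is True only if the direct attempt would have blocked."""
    sock.setblocking(False)
    try:
        # Fast path: one syscall, no readiness query beforehand.
        return sock.recv(nbytes), False
    except BlockingIOError:
        # Slow path: the kernel said "would block", so wait for readiness and retry.
        # (A robust version would loop in case the retry still would block.)
        select.select([sock], [], [])
        return sock.recv(nbytes), True
```

When data is already buffered, this costs a single syscall; userland has no way to know that ahead of time without asking the kernel.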
In practice you are right, so far SA hasn't been worth the complexity.
It kind of petered out once kernel-only threads were fast enough, with significantly less complexity. It might be worth revisiting.