As a maintainer of CuPy and also as a user of several GPU-powered Python libraries, I empathize with the frustrations and difficulties here. Indeed, one thing CuPy values is making the installation step as easy and universal as possible. We strive to keep the binary package footprint small (currently less than 100 MiB), keep dependencies to a minimum, support a wide variety of platforms including Windows and aarch64, and avoid requiring a specific CUDA Toolkit version.
If anyone reading this message has encountered a roadblock while installing CuPy, please reach out. I'd be glad to help you.
Just prepare the input as a NumPy or CuPy array, and then feed it to NumPy APIs. A NumPy function handles the computation itself if the input is a NumPy ndarray, or dispatches the execution to CuPy if the input is a CuPy ndarray.
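As a minimal sketch of that dispatch behavior (assuming CuPy is installed and a GPU is available; the array contents here are arbitrary):

```python
import numpy as np
import cupy as cp

x_cpu = np.arange(10.0)   # lives in host memory
x_gpu = cp.arange(10.0)   # lives in GPU memory

# The same NumPy call works for both inputs: with a NumPy ndarray it runs
# on the CPU; with a CuPy ndarray it is dispatched to CuPy and runs on the
# GPU (via the __array_function__ protocol).
print(np.mean(x_cpu))   # computed by NumPy on the CPU
print(np.mean(x_gpu))   # computed by CuPy on the GPU, result is a CuPy array
```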
An excellent example of Array API usage can be found in scikit-learn: estimators written against NumPy are now operable on various backends, courtesy of Array API-compatible libraries such as CuPy and PyTorch.
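A rough sketch of what that looks like in practice, assuming a recent scikit-learn with array API support enabled, the array-api-compat package, and a working CuPy installation (LinearDiscriminantAnalysis is one of the estimators documented to support this):

```python
import cupy as cp
from sklearn import config_context
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

# Toy data living on the GPU; shapes and values are arbitrary.
X = cp.random.rand(100, 4)
y = cp.asarray([0, 1] * 50)

# With array_api_dispatch enabled, the estimator's NumPy-style code
# operates directly on the CuPy arrays, so fit and predict stay on the GPU.
with config_context(array_api_dispatch=True):
    model = LinearDiscriminantAnalysis().fit(X, y)
    predictions = model.predict(X)

print(type(predictions))  # cupy.ndarray
```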