
The idea that this is a drop in replacement for numpy (e.g., `import cupy as np`) is quite nice, though I've gotten similar benefit out of using `pytorch` for this purpose. It's a very popular and well-supported library with a syntax that's similar to numpy.
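For example, something like this should run unchanged under either backend (a minimal sketch, assuming cupy and a CUDA device are available):

    import cupy as np  # or: import numpy as np -- the lines below run either way

    x = np.random.rand(1_000_000)   # allocated on the GPU under cupy
    y = np.fft.fft(x)               # FFT runs on the device
    print(np.abs(y).max())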

However, the AMD-GPU compatibility for CuPy is quite an attractive feature.




Note that NumPy, CuPy and PyTorch are all involved in the definition of a shared subset of their API:

https://data-apis.org/array-api/

So it's possible to write array API code that consumes arrays from any of those libraries and delegate computation to them without having to explicitly import any of them in your source code.

The only limitation for now is that PyTorch's (and, to a lesser extent, CuPy's) array API compliance is still incomplete, so in practice one needs to go through this compatibility layer (hopefully only temporarily):

https://data-apis.org/array-api-compat/
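A minimal sketch of what backend-agnostic code looks like through that layer (array_namespace is the compat entry point; the normalize helper is just an illustration):

    from array_api_compat import array_namespace

    def normalize(x):
        xp = array_namespace(x)            # resolves to numpy, cupy, or torch
        return (x - xp.mean(x)) / xp.std(x)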


It's interesting to see hardware/software/API co-development in practice again.

The last time I think this happened at market scale was with the early 3D accelerator APIs: Glide/OpenGL/DirectX. Which has been a minute! (To a lesser extent, CPU vectorization extensions.)

Curious how much of Nvidia's successful strategy was driven by people who were there during that period.

Powerful first mover flywheel: build high performing hardware that allows you to define an API -> people write useful software that targets your API, because you have the highest performance -> GOTO 10 (because now more software is standardized on your API, so you can build even more performant hardware to optimize its operations)


An excellent example of Array API usage can be found in scikit-learn. Estimators written in NumPy are now operable on various backends courtesy of Array API compatible libraries such as CuPy and PyTorch.

https://scikit-learn.org/stable/modules/array_api.html
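Roughly, the opt-in looks like this (a sketch assuming a CUDA device, cupy, and a recent scikit-learn; LinearDiscriminantAnalysis is one of the estimators documented as supporting the array API):

    import cupy as cp
    from sklearn import set_config
    from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

    set_config(array_api_dispatch=True)

    X = cp.random.rand(1000, 10)
    y = (cp.random.rand(1000) > 0.5).astype(cp.int64)

    # CuPy arrays in, CuPy arrays out: fit and predict both run on the GPU.
    LinearDiscriminantAnalysis().fit(X, y).predict(X)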

Disclosure: I'm a CuPy maintainer.


And of course the native Python solution is memoryview. If you need to interoperate with libraries like numpy but cannot import numpy, use memoryview. It is designed specifically for fast low-level buffer access, which is why it has more C documentation than Python documentation: https://docs.python.org/3/c-api/memoryview.html
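A tiny illustration of the zero-copy access it gives over any buffer-protocol object, standard library only:

    import array

    buf = array.array("d", [1.0, 2.0, 3.0, 4.0])    # a plain C buffer of doubles

    view = memoryview(buf)                          # zero-copy view via the buffer protocol
    print(view.format, view.itemsize, view.nbytes)  # d 8 32

    view[1] = 20.0                                  # writes go straight through to buf
    print(buf)                                      # array('d', [1.0, 20.0, 3.0, 4.0])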


One could also "import jax.numpy as jnp". All those libraries have more or less complete implementations of numpy and scipy functionality (I believe CuPy has the most functions, especially when it comes to scipy).

Also: you can just mix and match all those functions and tensors thanks to the __cuda_array_interface__.
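A hedged sketch of that interop (assuming a CUDA GPU with both cupy and torch installed):

    import cupy as cp
    import torch

    x = cp.arange(10, dtype=cp.float32)

    t = torch.as_tensor(x, device="cuda")  # consumes __cuda_array_interface__, no copy
    t += 1                                 # mutates the same GPU buffer x points to

    y = cp.asarray(t)                      # and back again, still zero-copy
    print(cp.allclose(x, y))               # True: both views share one allocation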


Jax variables are immutable.

Code written for CuPy looks similar to numpy but very different from Jax.


Ah, well, that's interesting! Does anyone know how cupy manages tensor mutability?


CuPy tensors (or `ndarray`) provide the same semantics as NumPy. In-place operations are permitted.
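A small illustration of the difference (assuming cupy and jax are installed):

    import cupy as cp
    import jax.numpy as jnp

    a = cp.zeros(3)
    a[0] = 1.0             # fine: CuPy arrays are mutable, just like NumPy

    b = jnp.zeros(3)
    # b[0] = 1.0           # TypeError: JAX arrays are immutable
    b = b.at[0].set(1.0)   # the functional update returns a new array instead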


Ah yes, stumbled over that recently, but the error message is very helpful and it's a quick change.


For those interested in the NumPy/SciPy API coverage in CuPy, here is the comparison table:

https://docs.cupy.dev/en/latest/reference/comparison.html


Indeed, has anyone so far successfully drop-in replaced numpy with cupy in a project and achieved massive improvements? Because, you know, when dealing with a GPU it is very important to actually understand how data flows back and forth to it, not only the algorithmic nature of the code.

As a side note, it is funny how this gets released in 2024 and not in, say, 2014...


Oh yes, I've personally used CuPy for great speed-ups compared to NumPy in radar signal processing, taking code that took 30 seconds with NumPy down to 1 second with CuPy. The code basically performed a bunch of math on about 100 MB of data, so the PCIe bottleneck was not a big issue.

Also, CuPy was first released in 2015; this post is just a reminder for people that such things exist.


Thank you. Your post is informative, and nicely grounds the misplaced hype in mine.


Yeah, the data managed by cupy generally stays on the GPU, and you can control when you get it out pretty straightforwardly. It's great if most of your work happens in a small number of standard operations, like matrix operations or Fourier transforms, the sort of thing that cupy will provide for you. You can get custom kernels running through cupy, but at some point it's easier to just write C/C++.
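A sketch of that pattern (assuming cupy and a CUDA device):

    import cupy as cp

    x = cp.random.rand(4096, 4096)   # allocated directly on the GPU

    y = cp.fft.fft2(x)               # runs on the device
    z = y @ y.conj().T               # still on the device, no host transfer in between

    result = cp.asnumpy(z)           # the one explicit copy back to host memory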


It's because it was not possible to write tons of brain-dead, repetitive code in 2014.

In 2024, with AI you can do this kind of project very fast.


As nice as it is to have a drop in replacement, most of the cost of GPU computing is moving memory around. Wouldn’t be surprised if this catches unsuspecting programmers in a few performance traps.
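One classic trap, sketched with cupy (assuming a CUDA device): per-element access forces a tiny device-to-host transfer and a sync on every iteration.

    import cupy as cp

    x = cp.random.rand(10_000)

    total = 0.0
    for i in range(x.size):
        total += float(x[i])    # slow: each x[i] round-trips over PCIe

    total = float(x.sum())      # fast: one reduction on the GPU, one scalar copied back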


The moving-data-around cost is conventional wisdom in GP-GPU circles.

Is it changing though? Not only do PCIe interfaces keep doubling in performance, but CPU-GPU memory coherence is a thing.

I guess it depends on your target: 8x H100s across a PCIe bridge is going to have quite different costs vs an APU (and APUs have gotten quite powerful, not even mentioning the MI300A).


Exactly my experience. You end up having to juggle a whole different set of requirements and design factors in addition to whatever it is that you’re already doing. Usually after a while the results are worth it, but I found the “drop-in” idea to be slightly misleading. Just because the API is the same does not make it a drop-in replacement.


Wondering why AMD isn't currently heavily investing in creating tons of adapters like this to help the transition from CUDA.


Hm. Tempted to try pytorch on my Mac for this. I have an Apple Silicon chip rather than an Nvidia GPU.
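If I understand correctly, PyTorch's MPS backend is the relevant piece there; a minimal sketch (assuming a recent torch build with MPS support):

    import torch

    device = "mps" if torch.backends.mps.is_available() else "cpu"
    x = torch.rand(1024, 1024, device=device)
    y = x @ x.T   # runs on the Apple GPU when device == "mps"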


> However, the AMD-GPU compatibility for CuPy is quite an attractive feature.

Last I checked (a couple months ago) it wasn't quite there, but I totally agree in principle. I've not gotten it to work on my Radeons yet.


It only supports AMD cards supported by ROCm, which is quite a limited set.

I know you can enable ROCm for other hardware as well, but it's not supported and quite hit or miss. I've had limited success with running stuff against ROCm on unsupported cards, mainly having issues with memory management IIRC.


When I packaged the ROCm libraries that shipped in the Ubuntu 24.04 universe repository, I built and tested them with almost every discrete AMD GPU architecture from Vega to CDNA 2 and RDNA 3 (plus a few APUs). None of that is officially supported by AMD, but it is supported by me on a volunteer basis (for whatever that is worth).

I think that every library required to build cupy is available in the universe repositories, though I've never tried building it myself.


To be clear, you're saying that ROCm works on a much larger range of GPUs than AMD's official support list? That's pretty exciting!


Yes. The primary difference in the support matrix is that all discrete RDNA 1 and RDNA 2 GPUs are enabled in the Debian packages [1]. There is also Fiji / Polaris support enabled in the Debian packages, although there are a lot of bugs with those.

[1]: https://salsa.debian.org/rocm-team/community/team-project/-/...


Fingers crossed that all future AMD parts ship with full ROCm support.


It's kind of unfortunate that EagerPy didn't get more traction to make that kind of switching even easier.


I'm supposed to end my undergraduate degree with an internship at the Italian national research center, and I'll have to use PyTorch to turn ML models from paper into code. I've tried looking at the tutorial, but I feel like there's a lot going on to grasp. Until now I've only used numpy (and pandas in combo with numpy). I'm quite excited, but I'm a bit on edge because I can't know whether I'll be up to the task or not.


Go for it! There's nothing to lose.

You could check out some of EuroCC's courses. That should get you up to speed. https://www.eurocc-access.eu/services/training/


Thank you. I've found that the PyTorch foundation has an examples page where they actually do something practical and explain what they're doing.


You'll do fine :) PyTorch has an API that is somewhat similar to numpy, although if you've never programmed a GPU you might want to get up to speed on that first.
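A rough illustration of how close the two APIs feel (nothing GPU-specific here):

    import numpy as np
    import torch

    a_np = np.linspace(0, 1, 5)
    a_pt = torch.linspace(0, 1, 5)

    print(a_np.mean(), a_pt.mean())        # same call, same result
    print(np.sin(a_np), torch.sin(a_pt))   # most ufuncs have a direct counterpart

    # The main new concept is the device: tensors are moved explicitly.
    if torch.cuda.is_available():
        a_pt = a_pt.to("cuda")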



