The idea that this is a drop-in replacement for numpy (e.g., `import cupy as np`) is quite nice, though I've gotten similar benefit out of using `pytorch` for this purpose. It's a very popular and well-supported library with a syntax that's similar to numpy's.
However, the AMD-GPU compatibility for CuPy is quite an attractive feature.
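To make the drop-in idea concrete, here's a minimal sketch (assuming CuPy and a supported GPU are installed; the fallback import keeps the same code running on CPU):

```python
# The same function runs on CPU or GPU depending only on which
# module is bound to the `np` name.
try:
    import cupy as np   # GPU path, if CuPy and a compatible GPU are present
except ImportError:
    import numpy as np  # CPU fallback

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

print(softmax(np.asarray([1.0, 2.0, 3.0])))
```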
So it's possible to write array API code that consumes arrays from any of those libraries and delegates computation to them, without having to explicitly import any of them in your source code.
The only limitation for now is that PyTorch's array API compliance (and, to a lesser extent, CuPy's) is still incomplete, so in practice one needs to go through a compatibility layer (hopefully only temporarily); see the sketch below.
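Presumably the layer in question is `array-api-compat`; a minimal sketch, assuming that package is installed:

```python
# Backend-agnostic code via array-api-compat: `standardize` works on
# numpy, cupy, or torch arrays without importing any of them here.
from array_api_compat import array_namespace

def standardize(x):
    xp = array_namespace(x)             # namespace of whatever produced x
    return (x - xp.mean(x)) / xp.std(x)
```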
It's interesting to see hardware/software/API co-development in practice again.
The last time I think this happened at market scale was with the early 3D accelerator APIs: Glide/OpenGL/DirectX. Which has been a minute! (And, to a lesser extent, CPU vectorization extensions.)
Curious how much of Nvidia's successful strategy was driven by people who were there during that period.
Powerful first mover flywheel: build high performing hardware that allows you to define an API -> people write useful software that targets your API, because you have the highest performance -> GOTO 10 (because now more software is standardized on your API, so you can build even more performant hardware to optimize its operations)
An excellent example of Array API usage can be found in scikit-learn: estimators written against NumPy can now run on various backends, courtesy of Array API compatible libraries such as CuPy and PyTorch.
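For instance (a sketch assuming a recent scikit-learn with array API support, a CUDA machine, and an estimator that supports dispatch; coverage varies by version):

```python
import cupy as cp
import sklearn
from sklearn.decomposition import PCA

# Opt in to array API dispatch so estimators operate on non-numpy arrays.
sklearn.set_config(array_api_dispatch=True)

X = cp.random.rand(1000, 50)                  # data already lives on the GPU
pca = PCA(n_components=2, svd_solver="full")
X_2d = pca.fit_transform(X)                   # stays a cupy array end to end
```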
And of course the native Python solution is memoryview. If you need to inter-operate with libraries like numpy but cannot import numpy, use memoryview. It is designed for fast low-level buffer access, which is why it has more C documentation than Python documentation: https://docs.python.org/3/c-api/memoryview.html
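A tiny illustration of the pure-Python side:

```python
# memoryview gives zero-copy, sliceable access to any buffer-protocol object.
data = bytearray(b"hello world")
view = memoryview(data)
view[0:5] = b"HELLO"             # writes through to `data` without copying
print(data)                      # bytearray(b'HELLO world')
print(view.format, view.nbytes)  # 'B' 11
```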
One could also `import jax.numpy as jnp`. All those libraries have more or less complete implementations of numpy and scipy functionality (I believe CuPy covers the most functions, especially when it comes to scipy).
Also: you can just mix and match all those functions and tensors thanks to the `__cuda_array_interface__` protocol.
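A sketch of that zero-copy interop, assuming CUDA builds of both libraries:

```python
import cupy as cp
import torch

a = cp.arange(10, dtype=cp.float32)    # CuPy array in GPU memory
t = torch.as_tensor(a, device="cuda")  # wraps the same memory via
                                       # __cuda_array_interface__, no copy
t += 1                                 # the change is visible from CuPy too
print(a)                               # [ 1.  2. ... 10.]
```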
Indeed, has anyone so far successfully drop-in replaced numpy with cupy in a project and achieved massive improvements? Because, you know, when dealing with a GPU it is very important to actually understand how data flows back and forth to it, not only the algorithmic nature of the code written.
As a side note, it is funny how this gets released in 2024 and not in, say, 2014...
Oh yes, I've personally used CuPy for great speed-ups compared to NumPy in radar signal processing, taking code that took 30 seconds with NumPy down to 1 second with CuPy. The code basically performed a bunch of math on about 100 MB of data, so the PCIe bottleneck was not a big issue.
Also, CuPy was first released in 2015; this post is just a reminder that such things exist.
Yeah, the data managed by cupy generally stays on the GPU, and you can control when you pull it out pretty straightforwardly. It's great if most of your work happens in a small number of standard operations, like matrix operations or Fourier transforms, the sort of thing cupy provides for you. You can get custom kernels running through cupy, but at some point it's easier to just write C/C++.
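In practice that control looks something like this (a sketch):

```python
import cupy as cp

x = cp.random.rand(4096, 4096)   # allocated directly on the GPU
y = cp.fft.fft2(x)               # chains of operations stay on the device
mag = cp.abs(y).mean()           # still on the GPU
result = cp.asnumpy(mag)         # the single explicit device-to-host copy
```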
As nice as it is to have a drop in replacement, most of the cost of GPU computing is moving memory around. Wouldn’t be surprised if this catches unsuspecting programmers in a few performance traps.
The moving-data-around cost is conventional wisdom in GP-GPU circles.
Is it changing though? Not only do PCIe interfaces keep doubling in performance, but CPU-GPU memory coherence is a thing.
I guess it depends on your target: 8x H100s across a PCIe bridge is going to have quite different costs vs an APU (APUs have gotten quite powerful, not even mentioning the MI300A).
Exactly my experience. You end up having to juggle a whole different set of requirements and design factors in addition to whatever it is that you’re already doing. Usually after a while the results are worth it, but I found the “drop-in” idea to be slightly misleading. Just because the API is the same does not make it a drop-in replacement.
It only supports AMD cards supported by ROCm, which is quite a limited set.
I know you can enable ROCm for other hardware as well, but it's not supported and quite hit or miss. I've had limited success with running stuff against ROCm on unsupported cards, mainly having issues with memory management IIRC.
When I packaged the ROCm libraries that shipped in the Ubuntu 24.04 universe repository, I built and tested them with almost every discrete AMD GPU architecture from Vega to CDNA 2 and RDNA 3 (plus a few APUs). None of that is officially supported by AMD, but it is supported by me on a volunteer basis (for whatever that is worth).
I think that every library required to build cupy is available in the universe repositories, though I've never tried building it myself.
Yes. The primary difference in the support matrix is that all discrete RDNA 1 and RDNA 2 GPUs are enabled in the Debian packages [1]. There is also Fiji / Polaris support enabled in the Debian packages, although there are a lot of bugs with those.
I'm supposed to finish my undergraduate degree with an internship at the Italian national research center, where I'll have to use PyTorch to turn ML models from papers into code. I've tried looking at the tutorial, but I feel like there's a lot going on to grasp. Until now I've only used NumPy (and pandas in combination with NumPy). I'm quite excited, but I'm a bit on edge because I can't know whether I'll be up to the task or not.
You'll do fine :) PyTorch has an API that is somewhat similar to numpy, although if you've never programmed a GPU you might want to get up to speed on that first.
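A few lines to show how numpy-like it feels (a sketch; the device move at the end is the main new concept):

```python
import torch

x = torch.linspace(0.0, 1.0, steps=5)  # cf. np.linspace
y = x ** 2 + 1                          # elementwise math, numpy-style
print(y.mean(), y.sum())                # reductions read like numpy

if torch.cuda.is_available():           # the GPU part is the new bit
    y = y.to("cuda")
```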