Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

No it in terms of flexibility of programming the SIMD is less flexible if you need any decision making. In SIMD you are also typically programming in Intrinsics not at a higher level like in CUDA.

For example I can do for (int tid = 0 ; tid<n; tid+= num_threads) { C[tid] = A[tid] * B[tid] + D[tid]; }

In SIMD yes I can stride the array 32 / (or in rvv at vl) at a time but generally speaking as new Archs come along I need to rewrite that loop for the wider Add and mpy instructions and increase width of lanes etc. But in CUDA or other GPU SIMT strategies I just need to bump the compiler and maybe change 1 num_threads variable and it will be vectorizing correctly.

Even things like RVV which I am actually pushing for my SIMD machine to move toward these problems exists because its really hard to write length agnostic code in SIMD intrinsics. That said there is a major benefit in terms of performance per watt. All that SIMT flexibility costs power that's why Nvidia GPUs can burn a hole through the floor while the majority of phones have a series of vector SIMD machines that are constantly computing Matrix and FFT operations without your pocket becoming only slightly warmer



Consider applying for YC's Winter 2026 batch! Applications are open till Nov 10

Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: