No. In terms of programming flexibility, SIMD is less flexible if you need any decision making. With SIMD you are also typically programming in intrinsics, not at a higher level like in CUDA.
For example, I can do:

for (int tid = 0; tid < n; tid += num_threads) {
    C[tid] = A[tid] * B[tid] + D[tid];
}
In SIMD, yes, I can stride through the array 32 elements at a time (or at vl in RVV), but generally speaking, as new architectures come along I need to rewrite that loop for the wider add and multiply instructions, increase the lane width, etc. But in CUDA or other GPU SIMT strategies I just need to bump the compiler version and maybe change one num_threads variable, and it will vectorize correctly.
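To make the contrast concrete, here is a minimal C sketch of the fixed-width SIMD pattern (plain C standing in for vendor intrinsics; the function name and LANES constant are my own illustration). The point is that the lane count is baked into the loop, and that is exactly what has to change when a wider architecture shows up:

```c
#include <stddef.h>

#define LANES 4  /* baked-in vector width: 4 floats, as on 128-bit SSE/NEON */

/* C[i] = A[i] * B[i] + D[i], processed LANES elements per strip.
 * With real intrinsics the inner loop would be a single vector
 * fused multiply-add; moving to 256-bit AVX means rewriting that
 * body with new intrinsics and bumping LANES to 8. */
void fma_simd_style(float *C, const float *A, const float *B,
                    const float *D, size_t n) {
    size_t i = 0;
    for (; i + LANES <= n; i += LANES) {
        for (size_t l = 0; l < LANES; l++)   /* stands in for one vector op */
            C[i + l] = A[i + l] * B[i + l] + D[i + l];
    }
    for (; i < n; i++)                       /* scalar tail for n % LANES */
        C[i] = A[i] * B[i] + D[i];
}
```

Note the scalar tail loop: another piece of bookkeeping the SIMT model hides from you entirely.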
Even with things like RVV (which I am actually pushing my SIMD machine to move toward), these problems exist, because it's really hard to write length-agnostic code in SIMD intrinsics. That said, there is a major benefit in terms of performance per watt. All that SIMT flexibility costs power; that's why Nvidia GPUs can burn a hole through the floor, while the majority of phones have a set of vector SIMD machines constantly computing matrix and FFT operations with your pocket becoming only slightly warm.
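For reference, here is roughly what the length-agnostic RVV pattern looks like in C intrinsics. This is a sketch: the intrinsic names follow the RVV 1.0 C intrinsics API and may differ across toolchain versions, and a plain scalar fallback is included so it compiles off RISC-V. The key idea is asking the hardware how many elements it will process each iteration (vl) instead of hard-coding a lane count:

```c
#include <stddef.h>

#if defined(__riscv_vector)
#include <riscv_vector.h>
/* C[i] = A[i] * B[i] + D[i], strip-mined by the hardware itself. */
void fma_rvv(float *C, const float *A, const float *B,
             const float *D, size_t n) {
    for (size_t i = 0; i < n;) {
        size_t vl = __riscv_vsetvl_e32m1(n - i); /* hw picks the strip size */
        vfloat32m1_t va = __riscv_vle32_v_f32m1(A + i, vl);
        vfloat32m1_t vb = __riscv_vle32_v_f32m1(B + i, vl);
        vfloat32m1_t vd = __riscv_vle32_v_f32m1(D + i, vl);
        /* vfmadd: va * vb + vd, one instruction per strip */
        vfloat32m1_t vc = __riscv_vfmadd_vv_f32m1(va, vb, vd, vl);
        __riscv_vse32_v_f32m1(C + i, vc, vl);
        i += vl;                                 /* no tail loop needed */
    }
}
#else
/* Scalar fallback so the sketch builds when not targeting RVV. */
void fma_rvv(float *C, const float *A, const float *B,
             const float *D, size_t n) {
    for (size_t i = 0; i < n; i++)
        C[i] = A[i] * B[i] + D[i];
}
#endif
```

In principle this runs unchanged on any vector length, which is the SIMT-like property RVV is reaching for; in practice getting intrinsics code into this shape is the hard part.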