That's also what AVX is, but with a conservative number of threads. If you really understand your problem, I don't see why you would need 32 threads each working on much smaller data, or why you would want that work so far away from your CPU.
Whether your new coprocessor or instruction set looks more like a GPU or like something else doesn't really matter once we're done squinting and calling these graphics-like problems, and/or claiming they need a lot more than a middle-class PC.