You could translate this to a non-CUDA GPU, such as a mobile GPU, but even that would take some effort to condense it enough that it wasn't a total lag fest. Executing this on CPU seems damn near impossible from a usability standpoint given the large matrix multiplications involved. You really need the parallel capabilities of a GPU.
It relies on torch and OpenCV:
- I have never tried running OpenCV explicitly on CPU, but I believe it is doable.
- It is trivial to run torch on CPU instead of GPU (just comment out the line that sends the model and tensors to the GPU); see the sketch below.
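As a rough illustration of that switch, here is a minimal sketch of how the device selection typically looks in a torch script. The model, file name, and preprocessing here are hypothetical stand-ins, not the actual code from this project:

```python
import cv2        # OpenCV's Python bindings run on the CPU by default
import torch

# Pick the device; forcing "cpu" (or commenting out the .to(device) calls)
# keeps everything on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-in for the real network.
model = torch.nn.Conv2d(3, 8, kernel_size=3).to(device)

# Hypothetical input path; OpenCV loads the image on the CPU as HWC/BGR.
img = cv2.imread("input.jpg")

# Convert to a NCHW float tensor and move it to the chosen device.
x = torch.from_numpy(img).permute(2, 0, 1).float().unsqueeze(0).to(device)

with torch.no_grad():
    y = model(x)  # runs on whichever device was selected above
```

The point is simply that the GPU dependency in torch is usually a handful of `.to(device)` / `.cuda()` calls; removing or redirecting them makes the code run on CPU, just much more slowly.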