Newbie question, why can’t someone just take a pre-trained model/network with all the settings/weights/whatever and run it on a different configuration (at a heavily reduced speed)?
Isn’t it like a Blender/3D Studio/AutoCAD file, where you can take the original 3D model and then render it using your own hardware? With my single GPU it will take days to raytrace a big scene, whereas someone with multiple higher-specced GPUs will need only a few minutes.
It’s not totally clear what you are asking. The models are trained on something like an NVIDIA A100, which is a super high-end machine-learning accelerator, but inference can be run on a home GPU. So that is already a “different configuration”.
But I think maybe you mean, can they make a model which normally needs a lot of RAM run more slowly on a machine that only has a little RAM?
It sounds like there are some tricks that let you get by with less VRAM by making specific algorithmic tweaks, so if a model normally needs 12GB of VRAM then, depending on the model, it may be possible to modify things to use half the RAM, for example. But I don’t think it’s the same as other rendering tasks, where you can use arbitrarily less compute and just run it for longer.
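One common tweak along those lines is running the weights at half precision, which roughly halves the memory they take up. A minimal PyTorch sketch of the idea, using a torchvision ResNet purely as a stand-in for whatever model you actually care about:

    import torch
    import torchvision

    # Load a pretrained network; weights default to 32-bit floats.
    model = torchvision.models.resnet50(weights="IMAGENET1K_V2")
    model = model.half().cuda().eval()   # fp16 weights occupy roughly half the VRAM

    with torch.inference_mode():         # skip autograd bookkeeping to save further memory
        x = torch.randn(1, 3, 224, 224, device="cuda", dtype=torch.float16)
        y = model(x)

That only changes numeric precision, though; it doesn't let you trade memory for time the way a renderer can.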
The main limitation for running these AIs is that you need tons of VRAM on your GPU to get any good performance out of them. I don't have a video card with 12GiB of VRAM, and I don't know anyone who does.
If you're willing to wait longer (30 seconds per image, assuming limited image sizes), there are repositories that will run the model on the CPU instead, leveraging your much cheaper system RAM.
In theory you could swap data in and out of VRAM in the middle of the rendering process, but that would make the whole thing incredibly slow. I think you'll have more success just running the CPU version if you're willing to accept the slowdown.
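To make the "swap in and out" idea concrete, here is a toy PyTorch sketch (purely illustrative, not how any released code actually does it): the layers stay in system RAM and each one is copied into VRAM only while it runs. Those repeated copies over PCIe are exactly why this approach is so slow.

    import torch
    import torch.nn as nn

    # Toy model whose layers live in ordinary system RAM.
    layers = nn.ModuleList([nn.Linear(4096, 4096) for _ in range(8)])

    x = torch.randn(1, 4096, device="cuda")
    with torch.inference_mode():
        for layer in layers:
            layer.to("cuda")    # copy this layer's weights into VRAM
            x = layer(x)
            layer.to("cpu")     # evict it again to make room for the next layer
    print(x.shape)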
Eyeing the price graph of that 3060, it might be "commonplace" among the population that built a gaming PC in the last couple of months, or went all-out at some point in the past ~1.5 years (availability not taken into account).
Most people I know don't have a desktop in the first place, and on average I wouldn't guess that desktop users build a new one more often than once every ~4 years. And that's among people who build their own; if you buy pre-built, you have to spend a lot extra to get those top of the line specs.
It's now possible to go out and buy one on a whim if you have a tech job or an equivalent salary, though.
Unfortunately the 3060 Ti, 3070 and 3070 Ti are limited to 8GiB, so 12GiB is certainly not common.
In its price range the 3060 is the only Nvidia card with 12GiB, and the 3080 starts at 10GiB.
So you can certainly get a 12GiB card without spending 3080+ money, but if you want more power than a 3060 while keeping the 12GiB, you would need to spring for the 3080 12GiB, which is a big jump in price.
If you use the provided PyTorch code, have a modern CPU, and have enough physical RAM, you can do this currently. As you suggest, inference/generation will take anywhere from hours to days using a CPU instead of a GPU or other ML accelerator chip.
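In case it helps, the usual pattern for loading released weights on a CPU-only box looks roughly like the sketch below; "model.ckpt" and build_model() are placeholders for whatever checkpoint file and model constructor the repository actually ships.

    import torch

    # map_location="cpu" loads the weights straight into system RAM, never touching VRAM.
    state = torch.load("model.ckpt", map_location="cpu")   # placeholder checkpoint name
    model = build_model()             # placeholder for the repo's model constructor
    model.load_state_dict(state)
    model.eval()                      # inference then runs on the CPU, just far more slowly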