Can you go into a bit more detail?
What architecture did you use? Is the month of training time really just mini-batch training with a constant learning rate, or were there many failed attempts until you finally trained a successful model for a few days at the end?
I'm particularly interested in the image generation part (the DDPM/SGM).
Yeah, I did have a few false starts. Total time was more like 3 months vs 1 month for the final model. For small-scale training I found it's necessary to use a long lr warmup period, followed by a constant lr.
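Something like this, as a minimal PyTorch sketch (the warmup length and base lr here are illustrative, not the actual values from the run):

```python
import torch

# Minimal sketch of "long lr warmup, then constant lr".
model = torch.nn.Linear(16, 16)  # stand-in for the real UNet
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

warmup_steps = 10_000  # hypothetical; "long" relative to total run length

def warmup_then_constant(step: int) -> float:
    # Ramp the lr linearly from ~0 up to its base value, then hold it flat.
    return min(1.0, (step + 1) / warmup_steps)

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=warmup_then_constant)

# In the training loop: forward/backward, then
#   opt.step(); sched.step()
```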
There’s code on my GitHub (glid3)
edit: The architecture is identical to SD, except I trained on 256px images with a cosine noise schedule instead of a linear one. The cosine schedule makes the UNet converge faster, but it can overfit if you overtrain.
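For anyone unfamiliar, "cosine schedule" here usually means the one from Nichol & Dhariwal's Improved DDPM paper; a minimal sketch (the step count is illustrative):

```python
import math
import torch

# Cosine noise schedule from Nichol & Dhariwal (2021), "Improved Denoising
# Diffusion Probabilistic Models" -- vs SD's default (scaled-)linear betas.
def cosine_betas(timesteps: int, s: float = 0.008, max_beta: float = 0.999) -> torch.Tensor:
    # alpha_bar(t) ~ cos^2(((t/T + s) / (1 + s)) * pi/2); the usual f(0)
    # normalization cancels in the ratio below, so it's omitted here.
    def alpha_bar(t: float) -> float:
        return math.cos((t / timesteps + s) / (1 + s) * math.pi / 2) ** 2

    # beta_t = 1 - alpha_bar(t+1) / alpha_bar(t), clipped for numerical stability
    return torch.tensor(
        [min(1 - alpha_bar(t + 1) / alpha_bar(t), max_beta) for t in range(timesteps)]
    )

betas = cosine_betas(1000)  # 1000 diffusion steps is illustrative
```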
edit 2: Just tried it again and my model is also pretty bad at hands actually. It does get lucky once in a while though.
What kind of form factor do you use for 4x3090? Don't people usually use the datacenter product line when they're trying to get more than one into a box?
The datacenter cards are 3-4x the price for the same speed, with double the VRAM. Gaming cards are a lot more cost-effective if your model fits in under 24 GB.
I use an open-air rig like the ones used for crypto mining. 4x3090 would normally trip the breaker without mods, but if you undervolt the cards the total power draw stays just under the limit for a home AC outlet.
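Rough back-of-envelope for why that works, assuming a US 120V/15A circuit; the wattages are ballpark estimates, not measured numbers from the rig (in practice, capping the power limit with nvidia-smi -pl gets most of the same effect as a true undervolt):

```python
# Breaker math under the stated assumptions.
PEAK_WATTS = 120 * 15                # 1800 W peak on a typical 15 A home circuit
CONTINUOUS_LIMIT = PEAK_WATTS * 0.8  # ~1440 W continuous (NEC 80% rule)

stock = 4 * 350 + 200    # four 3090s at stock 350 W TDP + ~200 W for CPU/board
limited = 4 * 280 + 200  # same rig with each card capped around 280 W

print(stock > CONTINUOUS_LIMIT)    # True: 1600 W would trip the breaker
print(limited < CONTINUOUS_LIMIT)  # True: 1320 W squeaks under the limit
```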
I trained from scratch with 4x3090 and while it’s not as good as SD it’s surprisingly better with hands.