Can you go into a bit more detail?
What architecture did you use? Is the month of training time really just mini-batch training with a constant learning rate, or were there many failed attempts until you finally trained a successful model for a few days at the end?
I'm particularly interested in the image generation part (the DDPM/SGM).
Yeah, I did have a few false starts. Total time was more like 3 months vs 1 month for the final model. For small-scale training I found it's necessary to use a long lr warmup period, followed by a constant lr.
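Something like this, as a minimal PyTorch sketch (the warmup length and base lr here are illustrative, not the actual values from the run):

```python
import torch

# Minimal sketch of "long lr warmup, then constant lr".
model = torch.nn.Linear(16, 16)  # stand-in for the real UNet
opt = torch.optim.AdamW(model.parameters(), lr=1e-4)

warmup_steps = 10_000  # hypothetical; "long" relative to total run length

def warmup_then_constant(step: int) -> float:
    # Ramp the lr linearly from ~0 up to its base value, then hold it flat.
    return min(1.0, (step + 1) / warmup_steps)

sched = torch.optim.lr_scheduler.LambdaLR(opt, lr_lambda=warmup_then_constant)

# In the training loop: forward/backward, then
#   opt.step(); sched.step()
```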
There’s code on my GitHub (glid3)
edit: The architecture is identical to SD, except I trained on 256px images with a cosine noise schedule instead of a linear one. The cosine schedule makes the UNet converge faster, but it can overfit if you overtrain.
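For anyone unfamiliar, "cosine schedule" here usually means the one from Nichol & Dhariwal's Improved DDPM paper; a minimal sketch (the step count is illustrative):

```python
import math
import torch

# Cosine noise schedule from Nichol & Dhariwal (2021), "Improved Denoising
# Diffusion Probabilistic Models" -- vs SD's default (scaled-)linear betas.
def cosine_betas(timesteps: int, s: float = 0.008, max_beta: float = 0.999) -> torch.Tensor:
    # alpha_bar(t) ~ cos^2(((t/T + s) / (1 + s)) * pi/2); the usual f(0)
    # normalization cancels in the ratio below, so it's omitted here.
    def alpha_bar(t: float) -> float:
        return math.cos((t / timesteps + s) / (1 + s) * math.pi / 2) ** 2

    # beta_t = 1 - alpha_bar(t+1) / alpha_bar(t), clipped for numerical stability
    return torch.tensor(
        [min(1 - alpha_bar(t + 1) / alpha_bar(t), max_beta) for t in range(timesteps)]
    )

betas = cosine_betas(1000)  # 1000 diffusion steps is illustrative
```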
edit 2: Just tried it again and my model is also pretty bad at hands actually. It does get lucky once in a while though.
What kind of form factor do you use for 4x3090? Don't people usually use the datacenter product line when they're trying to get more than one into a box?
The datacenter cards are 3-4x the price for the same speed, with double the VRAM. Gaming cards are a lot more cost-effective if your model fits in under 24 GB.
I use an open-air rig like the ones used for crypto mining. 4x3090 would normally trip the breaker without mods, but if you undervolt the cards the total power draw stays just under the limit for a home AC outlet.
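Rough back-of-envelope for why that works, assuming a US 120V/15A circuit; the wattages are ballpark estimates, not measured numbers from the rig (in practice, capping the power limit with nvidia-smi -pl gets most of the same effect as a true undervolt):

```python
# Breaker math under the stated assumptions.
PEAK_WATTS = 120 * 15                # 1800 W peak on a typical 15 A home circuit
CONTINUOUS_LIMIT = PEAK_WATTS * 0.8  # ~1440 W continuous (NEC 80% rule)

stock = 4 * 350 + 200    # four 3090s at stock 350 W TDP + ~200 W for CPU/board
limited = 4 * 280 + 200  # same rig with each card capped around 280 W

print(stock > CONTINUOUS_LIMIT)    # True: 1600 W would trip the breaker
print(limited < CONTINUOUS_LIMIT)  # True: 1320 W squeaks under the limit
```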
I trained from scratch with 4x3090 and while it’s not as good as SD it’s surprisingly better with hands.