
enryu42 t1_iw2m1nt wrote

Even without any optimizations, it is possible to fine-tune Stable Diffusion on an RTX 3090, even in fp32, with some effort - albeit only at batch size 2 (by precomputing the latent embeddings and saving some VRAM by not keeping the autoencoder params in memory during training).
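For reference, a minimal sketch of the latent-precomputation trick, assuming the `diffusers` library (the model ID and helper names here are illustrative, not a definitive recipe):

```python
# Sketch: encode the whole dataset to VAE latents once, up front, so the
# autoencoder never has to be held in GPU memory during fine-tuning.
import torch
from diffusers import AutoencoderKL

device = "cuda"
vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
).to(device)
vae.eval()

@torch.no_grad()
def encode_batch(pixel_values):
    # pixel_values: float tensor in [-1, 1], shape (B, 3, 512, 512)
    latents = vae.encode(pixel_values.to(device)).latent_dist.sample()
    return latents * 0.18215  # SD v1's latent scaling factor

# After encoding everything, save the latents and free the VAE entirely:
#   torch.save(all_latents, "latents.pt"); del vae; torch.cuda.empty_cache()
```

Training then reads the saved latents directly, so only the UNet (and optimizer state) occupies VRAM.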

But this is definitely not a "one-button" solution, and it requires more effort than existing tools like textual inversion or DreamBooth (which are more appropriate for the "teach the model a new concept" use-case).

3

Flag_Red t1_iw2nxte wrote

If I'm not mistaken, full fine-tuning on one 3090 isn't really feasible because of training times. I haven't tried it, but I was under the impression that matching the results of a DreamBooth run would take an unreasonably long time.

DreamBooth gets around this by bootstrapping from a very small number of training examples to learn a single concept. But if I have a few thousand well-labelled images, I should be able to do a full fine-tune on them (maybe with some regularisation?) and get better results - a rough sketch of the training step I have in mind is below.
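A hypothetical training step on precomputed latents and text embeddings, again assuming `diffusers`; AdamW weight decay stands in for the "some regularisation" above, and all names here are illustrative:

```python
import torch
import torch.nn.functional as F
from diffusers import UNet2DConditionModel, DDPMScheduler

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
).to("cuda")
scheduler = DDPMScheduler.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="scheduler"
)
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5, weight_decay=1e-2)

def train_step(latents, text_embeddings):
    # latents: (B, 4, 64, 64) precomputed VAE latents
    # text_embeddings: (B, 77, 768) precomputed CLIP hidden states
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, scheduler.config.num_train_timesteps,
        (latents.shape[0],), device=latents.device,
    )
    noisy = scheduler.add_noise(latents, noise, timesteps)
    pred = unet(noisy, timesteps, encoder_hidden_states=text_embeddings).sample
    loss = F.mse_loss(pred, noise)  # standard epsilon-prediction objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```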

2

enryu42 t1_iw2vlwf wrote

Oh, it is totally feasible - I'm getting something around 2.5 training examples/second with vanilla SD without any optimizations (2.5 x 86,400 seconds is roughly 216k examples per day), which is more than enough for fine-tuning.

I'd still not recommend it for teaching the model new concepts, though - it is more appropriate for transferring the model to a new domain (e.g. here people adapted it to anime images).

1