
enryu42 t1_iw2m1nt wrote

Even without any optimizations, it is possible to fine-tune Stable Diffusion on an RTX 3090, even in fp32, with some effort - albeit only at batch size 2 (by precomputing the latent embeddings and saving some VRAM by not keeping the autoencoder params in memory during training).
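For reference, a minimal sketch of the latent-precomputation trick, assuming the `diffusers` library (the model ID and helper names here are illustrative, not a definitive recipe):

```python
# Sketch: encode the whole dataset to VAE latents once, up front, so the
# autoencoder never has to be held in GPU memory during fine-tuning.
import torch
from diffusers import AutoencoderKL

device = "cuda"
vae = AutoencoderKL.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="vae"
).to(device)
vae.eval()

@torch.no_grad()
def encode_batch(pixel_values):
    # pixel_values: float tensor in [-1, 1], shape (B, 3, 512, 512)
    latents = vae.encode(pixel_values.to(device)).latent_dist.sample()
    return latents * 0.18215  # SD v1's latent scaling factor

# After encoding everything, save the latents and free the VAE entirely:
#   torch.save(all_latents, "latents.pt"); del vae; torch.cuda.empty_cache()
```

Training then reads the saved latents directly, so only the UNet (and optimizer state) occupies VRAM.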

But this is definitely not a "one-button" solution, and it requires more effort than existing tools like textual inversion or DreamBooth (which are more appropriate for the "teach the model a new concept" use-case).

3

Flag_Red t1_iw2nxte wrote

If I'm not mistaken, full fine-tuning on one 3090 isn't really feasible because of training times. I haven't tried it, but I was under the impression that matching the results of a DreamBooth run would take an unreasonably long time.

DreamBooth gets around this by bootstrapping from a very small number of training examples to learn a single concept. But if I have a few thousand well-labelled images, I should be able to do a full fine-tune on them (maybe with some regularisation?) and get better results - a rough sketch of the training step I have in mind is below.
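A hypothetical training step on precomputed latents and text embeddings, again assuming `diffusers`; AdamW weight decay stands in for the "some regularisation" above, and all names here are illustrative:

```python
import torch
import torch.nn.functional as F
from diffusers import UNet2DConditionModel, DDPMScheduler

unet = UNet2DConditionModel.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="unet"
).to("cuda")
scheduler = DDPMScheduler.from_pretrained(
    "runwayml/stable-diffusion-v1-5", subfolder="scheduler"
)
optimizer = torch.optim.AdamW(unet.parameters(), lr=1e-5, weight_decay=1e-2)

def train_step(latents, text_embeddings):
    # latents: (B, 4, 64, 64) precomputed VAE latents
    # text_embeddings: (B, 77, 768) precomputed CLIP hidden states
    noise = torch.randn_like(latents)
    timesteps = torch.randint(
        0, scheduler.config.num_train_timesteps,
        (latents.shape[0],), device=latents.device,
    )
    noisy = scheduler.add_noise(latents, noise, timesteps)
    pred = unet(noisy, timesteps, encoder_hidden_states=text_embeddings).sample
    loss = F.mse_loss(pred, noise)  # standard epsilon-prediction objective
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```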

2

enryu42 t1_iw2vlwf wrote

Oh, it is totally feasible - I'm getting something around 2.5 training examples/second with vanilla SD without any optimizations (2.5 x 86,400 seconds is roughly 216k examples per day), which is more than enough for fine-tuning.

I'd still not recommend it for teaching the model new concepts, though - it is more appropriate for transferring the model to a new domain (e.g. here people adapted it to anime images).

1