
alkibijad OP t1_j462x0f wrote

I was hoping to just fine-tune the model and have the training last days at most. Seems like my best chance is to wait for the distilled Stable Diffusion and use its CLIP encoder, as u/LetterRip mentions.

2

suflaj t1_j46gu2z wrote

I would proceed with caution because smaller models are generally not that easy to finetune. In fact, the whole point of a larger model is that it not only contains a lot of information, but that it is fairly easy to adapt to new tasks because it has plenty of "space" to restructure itself. A smaller model trying to restructure itself is more likely to diverge or not be able to adapt to the task at all.

It would be more viable in that case to run the larger model layer by layer, finetune it, and then distill onto a smaller one. That way you use the maximum potential of a larger model to adapt to a different task, and you distill it into whatever you need.
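
For anyone unfamiliar with the distillation step, here is a rough sketch of the usual soft-target loss, in plain Python so it runs without any framework. It is a generic illustration, not the recipe used for any Stable Diffusion or CLIP release: the function names, the temperature value, and the idea of comparing logits (rather than, say, matching embeddings) are all assumptions made for the example.

```python
import math

def softmax(logits, temperature=1.0):
    # Temperature > 1 flattens the distribution so the student also sees
    # the teacher's relative preferences among the less likely outputs.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    # KL(teacher || student) on temperature-softened distributions.
    # The temperature**2 factor is the common rescaling that keeps the
    # gradient magnitude comparable across temperatures.
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

# Hypothetical outputs for one input: the fine-tuned large model (teacher)
# and the small model being trained to imitate it (student).
teacher = [2.0, 0.5, -1.0]
student = [1.0, 1.0, 0.0]
loss = distillation_loss(teacher, student)
```

In a real setup you would compute this per batch with the teacher frozen and backpropagate only through the student. The loss is zero when the student reproduces the teacher's outputs exactly and grows as they diverge.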

3