JanssonsFrestelse t1_j1y2npq wrote
Reply to comment by TrueBlueDreamin in [P] I built an API that makes it easy and cheap for developers to build ML-powered apps using Stable Diffusion by TrueBlueDreamin
Also, can you supply your own regularization images, or is there a selection to choose from (with recommendations for, e.g., fine-tuning on a person)? I assume you're training the text encoder as well? And what about jointly learning different concepts when fine-tuning on an object/person?
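For reference, a minimal sketch of the prior-preservation setup I'm asking about, assuming standard diffusers-style components; the function and names here are illustrative, not your actual API:

```python
# Illustrative DreamBooth-style prior preservation: the batch holds subject
# images in its first half and regularization ("class") images in its second.
import torch
import torch.nn.functional as F

def dreambooth_loss(unet, noisy_latents, timesteps, text_embeds,
                    target, prior_loss_weight=1.0):
    pred = unet(noisy_latents, timesteps, text_embeds).sample
    pred, prior_pred = torch.chunk(pred, 2, dim=0)
    target, prior_target = torch.chunk(target, 2, dim=0)
    subject_loss = F.mse_loss(pred.float(), target.float())
    # The loss on the regularization images keeps the class prior intact.
    prior_loss = F.mse_loss(prior_pred.float(), prior_target.float())
    return subject_loss + prior_loss_weight * prior_loss

# If the text encoder is trained as well, its parameters simply join the
# optimizer alongside the UNet's, e.g.:
# optimizer = torch.optim.AdamW(
#     list(unet.parameters()) + list(text_encoder.parameters()), lr=1e-6)
```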
JanssonsFrestelse t1_j1y2df0 wrote
Reply to comment by TrueBlueDreamin in [P] I built an API that makes it easy and cheap for developers to build ML-powered apps using Stable Diffusion by TrueBlueDreamin
Are you fine-tuning the SD 2.1 768x768 resolution model as well, or just the 2.1-base 512x512 model?
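For anyone following along, these are the two checkpoints I mean, assuming the standard Hugging Face model IDs:

```python
# The 768x768 SD 2.1 model uses v-prediction; the 512x512 base model
# uses the usual epsilon parameterization.
from diffusers import StableDiffusionPipeline

pipe_768 = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1")
pipe_512 = StableDiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-2-1-base")
```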
JanssonsFrestelse t1_j0l89ve wrote
Reply to comment by LetterRip in [P] Using LoRA to efficiently fine-tune diffusion models. Output model less than 4MB, two times faster to train, with better performance. (Again, with Stable Diffusion) by cloneofsimo
Same here with 8GB VRAM, although it looks like I can't use mixed_precision=fp16 with my RTX 2070, so that might be why.
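A quick capability check, as a sketch — Turing cards like the RTX 2070 (compute capability 7.5) do support fp16, but not bf16, so a failure here is more likely software-side:

```python
import torch

print(torch.cuda.get_device_capability(0))  # (7, 5) on an RTX 2070
print(torch.cuda.is_bf16_supported())       # False on Turing

# fp16 autocast itself should run on this card:
with torch.autocast("cuda", dtype=torch.float16):
    x = torch.randn(8, 8, device="cuda") @ torch.randn(8, 8, device="cuda")
print(x.dtype)  # torch.float16 if autocast is active
```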
JanssonsFrestelse t1_iy73zgf wrote
Reply to comment by sam__izdat in [P] Stable Diffusion 2.0 and the Importance of Negative Prompts for Good Results (+ Colab Notebooks + Negative Embedding) by minimaxir
Should have used the negative prompt "a bullshit random prompt that performs comparatively"
JanssonsFrestelse t1_iqz2d2m wrote
Reply to comment by New-Post-7586 in [OC] Prices for common food products, August 2010 vs 2022. by robert_ritz
Showing the percentage increase since the starting year, instead of absolute dollar amounts, would also help.
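Something like this, with made-up numbers purely for illustration:

```python
import pandas as pd

# Hypothetical prices; the point is the normalization, not the values.
prices = pd.DataFrame({
    "item": ["eggs", "milk"],
    "price_2010": [1.79, 3.30],
    "price_2022": [2.94, 4.09],
})
prices["pct_increase"] = (
    (prices["price_2022"] - prices["price_2010"]) / prices["price_2010"] * 100
).round(1)
print(prices)
```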
JanssonsFrestelse t1_j1ydl3c wrote
Reply to comment by TrueBlueDreamin in [P] I built an API that makes it easy and cheap for developers to build ML-powered apps using Stable Diffusion by TrueBlueDreamin
The curated images would be generated by the model being trained, using the same prompt for the regularization images as for the subject training images (found via CLIP interrogation, swapping out e.g. "a woman" for my subject's token). Not a big deal though; if you can train the 768x768 model I'll try it out. I can't run it locally, and the Colabs for the 768 model have been unreliable. I might write my own later on if the model you train shows good quality.
Edit: there's probably not much use in having the exact same prompt, but I'm thinking of something similar to the CLIP classification of the image(s) plus the general style/concept you want to learn. Or do you see some issues with the method I've described?
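Roughly what I have in mind, as a sketch — the prompt and paths are hypothetical, with the caption coming from CLIP interrogation of a training image and the subject token swapped back to the generic class word:

```python
import os
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base", torch_dtype=torch.float16
).to("cuda")

# Interrogated caption with "a woman" restored in place of the subject token.
reg_prompt = "a photo of a woman, portrait, soft natural lighting"

os.makedirs("reg_images", exist_ok=True)
for i in range(200):
    image = pipe(reg_prompt).images[0]
    image.save(f"reg_images/{i:04d}.png")
```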