Submitted by crappr t3_11qynbp in MachineLearning

Does anyone have experience training a small diffusion model conditioned on text captions from scratch on 64x64 images or possibly even smaller?

I would like to run it only on images of text to see if it is able to render text. How long would this potentially take if I ran it on 1-2 GPUs? Is this something that’s even possible?

5

Comments


rpnewc t1_jc5u6xd wrote

Check out lucidrains' great GitHub repo. It works beautifully.
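
For the 64x64 text-conditioned case, a minimal sketch along the lines of the imagen-pytorch README might look like this (the class and argument names here are from memory of that repo and may have changed between versions, so treat them as assumptions):

```python
# Sketch: training one small 64x64 text-conditioned base U-Net with
# lucidrains' imagen-pytorch. API details (class/argument names) are assumed
# from the repo's README and may differ by version.
import torch
from imagen_pytorch import Unet, Imagen

# small base U-Net; the cross-attention layers consume the text conditioning
unet = Unet(
    dim = 32,
    cond_dim = 512,
    dim_mults = (1, 2, 4),
    num_resnet_blocks = 2,
    layer_attns = (False, True, True),
    layer_cross_attns = (False, True, True),
)

# single-stage model at 64x64, no super-resolution unets
imagen = Imagen(
    unets = (unet,),
    image_sizes = (64,),
    timesteps = 1000,
    cond_drop_prob = 0.1,  # dropout for classifier-free guidance
).cuda()

# stand-in batch: in practice you would loop over (image, caption) pairs,
# e.g. rendered images of text plus the string that was rendered
images = torch.randn(4, 3, 64, 64).cuda()
texts = ['hello world', 'the quick brown fox', 'lorem ipsum', 'diffusion']

loss = imagen(images, texts = texts, unet_number = 1)
loss.backward()
# ...then an optimizer step, repeated for many iterations,
# followed by imagen.sample(texts = [...]) to check text rendering
```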

5

PM_ME_JOB_OFFER t1_jc5z6f0 wrote

Yo who IS this guy? He's got implementations for everything! How is anyone that productive?

3

femboyxx98 t1_jc601pw wrote

The actual implementation of most models is quite simple, and he often reuses the same building blocks. The challenge is obtaining the dataset and actually training the models (plus the hyperparameter search), and he doesn’t provide any trained weights himself, so it’s hard to know if his implementations even work out of the box.
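
To illustrate, here is the kind of small reusable block those repos are typically built from: a standard sinusoidal timestep embedding (a generic version, not copied from any particular implementation):

```python
# Standard sinusoidal timestep embedding used by most DDPM-style repos.
# Generic illustration of a reusable building block, not from any specific codebase.
import math
import torch
from torch import nn

class SinusoidalTimeEmbedding(nn.Module):
    def __init__(self, dim: int):
        super().__init__()
        self.dim = dim

    def forward(self, t: torch.Tensor) -> torch.Tensor:
        # t: (batch,) diffusion timesteps
        half = self.dim // 2
        freqs = torch.exp(
            -math.log(10000) * torch.arange(half, device=t.device) / (half - 1)
        )
        args = t[:, None].float() * freqs[None, :]
        return torch.cat([args.sin(), args.cos()], dim=-1)  # (batch, dim)
```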

9

therentedmule t1_jc86u50 wrote

Many of the repos are not usable and have click-bait names (e.g., PaLM-RLHF).

1

bhagy7 t1_jc8rdbj wrote

Yes, it is possible to train a small diffusion model conditioned on text captions from scratch on 64x64 images or even smaller. Depending on the complexity of the model and the number of GPUs you are using, it could take anywhere from a few hours to several days.

1