Submitted by crappr t3_11qynbp in MachineLearning
Does anyone have experience training a small diffusion model conditioned on text captions from scratch on 64x64 images or possibly even smaller?
I would like to run it only on images of text to see if it is able to render text. How long would this potentially take if I ran it on 1-2 GPUs? Is this something that’s even possible?
rpnewc t1_jc5u6xd wrote
Check out Lucidrains' great GitHub repo. It works beautifully.
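For anyone wondering what training at 64x64 involves, here is a rough sketch of the forward noising step used in standard DDPM-style training. This is not the API of the repo mentioned above. The linear beta schedule, the 1000 timesteps, and the toy image are all assumptions for illustration, and the function names are made up. The denoising network and the text conditioning (feeding a caption embedding to the network) are not shown.

```python
import math
import random


def linear_beta_schedule(timesteps, beta_start=1e-4, beta_end=0.02):
    """Linearly spaced noise variances beta_1..beta_T (assumed schedule)."""
    step = (beta_end - beta_start) / (timesteps - 1)
    return [beta_start + i * step for i in range(timesteps)]


def cumulative_alphas(betas):
    """alpha_bar_t = product over s <= t of (1 - beta_s)."""
    alpha_bars, running = [], 1.0
    for beta in betas:
        running *= 1.0 - beta
        alpha_bars.append(running)
    return alpha_bars


def add_noise(image, t, alpha_bars, rng):
    """Sample x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps."""
    signal = math.sqrt(alpha_bars[t])
    sigma = math.sqrt(1.0 - alpha_bars[t])
    # eps ~ N(0, 1), one value per pixel; this is what the network learns to predict
    noise = [[rng.gauss(0.0, 1.0) for _ in row] for row in image]
    noisy = [
        [signal * pixel + sigma * eps for pixel, eps in zip(row, noise_row)]
        for row, noise_row in zip(image, noise)
    ]
    return noisy, noise


rng = random.Random(0)
betas = linear_beta_schedule(1000)
alpha_bars = cumulative_alphas(betas)

# Hypothetical 64x64 grayscale image scaled to [-1, 1]: a white bar on black
image = [[1.0 if 28 <= y < 36 else -1.0 for _ in range(64)] for y in range(64)]

# Noised copy at a mid-range timestep, plus the noise used as the training target
noisy, noise = add_noise(image, 500, alpha_bars, rng)
```

In a training loop you would pick a random timestep per image, noise it like this, and train the network to predict `noise` from `noisy`, the timestep, and the caption embedding.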