
CurrentlyJoblessFML t1_j99mphb wrote

I definitely think diffusion-based generative AI models are a great idea, and I wholeheartedly agree that training GANs can be very painful. Head over to the Hugging Face diffusers library and you should be able to find a few models that can do unconditional image generation. It also has cookie-cutter training scripts that you can just execute to start training your model right away, along with detailed instructions for setting up your own training data.
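To give you an idea, sampling from one of those pretrained unconditional checkpoints looks roughly like this (a minimal sketch; the model id below is just one example checkpoint on the Hub, swap in whichever one you find):

```python
# pip install diffusers torch
from diffusers import DDPMPipeline

# "google/ddpm-cat-256" is one of the unconditional DDPM checkpoints
# on the Hub -- any other unconditional checkpoint works the same way.
pipeline = DDPMPipeline.from_pretrained("google/ddpm-cat-256")

# Sampling runs the full reverse diffusion chain, so expect it to be
# slow on CPU; move the pipeline to a GPU if you have one.
image = pipeline(batch_size=1).images[0]
image.save("sample.png")
```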

That said, I have been working with these models for a while, and training diffusion models can be very computationally intensive. Do you have access to a GPU cluster? If not, I’d recommend a smaller U-Net-based approach (something like the sketch below), which you could train on the GPUs/TPUs in Google Colab.
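Here’s roughly what a Colab-sized denoiser could look like using diffusers’ UNet2DModel (a sketch only; the resolution and channel widths are my own guesses, not tuned values):

```python
import torch
from diffusers import UNet2DModel

# A deliberately small config that should fit in Colab's GPU memory.
model = UNet2DModel(
    sample_size=64,                      # training resolution
    in_channels=3,
    out_channels=3,
    layers_per_block=2,
    block_out_channels=(64, 128, 256, 256),
    down_block_types=("DownBlock2D", "DownBlock2D", "AttnDownBlock2D", "DownBlock2D"),
    up_block_types=("UpBlock2D", "AttnUpBlock2D", "UpBlock2D", "UpBlock2D"),
)

# One denoising forward pass: predict the noise added at timestep t.
x_t = torch.randn(8, 3, 64, 64)          # batch of noisy images
t = torch.randint(0, 1000, (8,))         # random timesteps
noise_pred = model(x_t, t).sample        # (8, 3, 64, 64)
```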

I have been using this class of models for my master’s thesis, and I would be happy to help in case you have any questions. Good luck! :)


CurrentlyJoblessFML OP t1_j508inw wrote

Hi! Thanks for the response. I’ll try my luck by just concatenating my noisy input with y_t along the channel dimension and see if that works. In the SR3 paper, the authors mention that they tried a different way of conditioning the model, but found that simple concatenation gave them the same generation quality, so they just stuck with that.
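In case it helps anyone else reading this, the concatenation trick amounts to something like this (a minimal sketch; the shapes are made up and I’m using diffusers’ UNet2DModel as a stand-in denoiser, not the actual SR3 architecture):

```python
import torch
from diffusers import UNet2DModel

B, H, W = 4, 64, 64
x_t = torch.randn(B, 3, H, W)    # noisy sample at timestep t
y = torch.randn(B, 3, H, W)      # conditioning image (e.g. the upsampled low-res input in SR3)
t = torch.randint(0, 1000, (B,))

# The only change to the denoiser is that its first conv takes
# 3 (noisy) + 3 (condition) = 6 input channels.
model = UNet2DModel(sample_size=H, in_channels=6, out_channels=3)

net_input = torch.cat([x_t, y], dim=1)   # (B, 6, H, W)
noise_pred = model(net_input, t).sample  # noise predicted for x_t only: (B, 3, H, W)
```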

Good luck with your project, and HMU if you ever want to discuss this. I’ve been banging my head against these diffusion models for the past couple of days, so I feel your struggle.
