king_of_walrus t1_ir3egvh wrote
Reply to [D] How do you go about hyperparameter tuning when network takes a long time to train? by twocupv60
I have a similar problem - some of my models have taken upwards of 10 days to train! So, I have developed a strategy that is working reasonably well.
First, I work with image data and I always start by training and evaluating models at a lower resolution. For example, if I were using the CelebA-HQ dataset I would do all initial development with 128x128 images, then scale up the resolution once my results are good. Oftentimes things translate reasonably well when scaling up, and this allows for much more rapid prototyping.
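To make the idea concrete, here is a minimal sketch of prototyping at reduced resolution. It just average-pools images down by an integer factor with NumPy; in practice you'd use your data pipeline's resize transform, and the shapes here are illustrative.

```python
import numpy as np

def downsample(img: np.ndarray, factor: int) -> np.ndarray:
    """Average-pool an HxWxC image by an integer factor."""
    h, w, c = img.shape
    assert h % factor == 0 and w % factor == 0, "factor must divide H and W"
    return img.reshape(h // factor, factor, w // factor, factor, c).mean(axis=(1, 3))

hi_res = np.random.rand(1024, 1024, 3)   # stand-in for a CelebA-HQ image
lo_res = downsample(hi_res, 8)           # prototype at 128x128 instead
print(lo_res.shape)  # (128, 128, 3)
```

Once results look good at 128x128, swap the factor out and rerun the same pipeline at full resolution.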
Another strategy that has worked well for me is fine tuning. I train a base model with “best guess” hyperparameters to completion. Then I fine tune for a quarter of the total training time, modifying one hyperparameter of interest while keeping everything else the same. For my work, this amount of time has been enough to see the effects of the changes and to determine clear winners. In a few cases, I have been able to verify my fine tuning results by training the model from scratch under the different configurations - this is what gives me confidence in the approach. I find that this strategy still works when I have hyperparameters which impact one another; holding one constant and optimizing the other works pretty well to balance them.
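The sweep described above can be sketched as follows. Everything here is a toy stand-in (the checkpoint path, the fine-tune function, the step budget are all hypothetical), but it shows the budget split: each candidate gets a quarter of the full training budget, starting from the same base checkpoint.

```python
# Hypothetical sketch of the one-hyperparameter fine-tuning sweep.
# fine_tune_fn(base_ckpt, value, steps) is assumed to return a validation loss.

def tune_one_hyperparameter(base_ckpt, candidates, total_steps, fine_tune_fn):
    """Fine-tune the base checkpoint once per candidate value, each for a
    quarter of the full training budget, and keep the best one."""
    budget = total_steps // 4
    results = {v: fine_tune_fn(base_ckpt, v, budget) for v in candidates}
    return min(results, key=results.get)  # lowest validation loss wins

# Toy stand-in: pretend validation loss is minimized near lr=1e-4.
fake_fine_tune = lambda ckpt, lr, steps: abs(lr - 1e-4)
best = tune_one_hyperparameter("base.ckpt", [1e-3, 1e-4, 1e-5], 400_000, fake_fine_tune)
print(best)  # 0.0001
```

The key design choice is that every candidate resumes from the same completed base model, so differences in the final metric can be attributed to the one hyperparameter you changed.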
I should note that you probably don’t need to tune most hyperparameters, unless it’s one you’ve added yourself. If it isn’t something novel, there is bound to be a reference in the literature with the values you’re looking for. This is worth keeping in mind, I think.
Overall, it’s not really worth going to great lengths to tune things unless your results are really bad or you’re being edged out by a competitor. However, if your results are really bad that probably speaks to a larger issue.
king_of_walrus t1_ivbdzd4 wrote
Reply to [D] Has anyone tried coding latent diffusion from scratch? or tried other conditioning information aside from image classes and text? by yamakeeen
I don’t think it’s that easy… do you have brain activity/image pairs for training? If so, do you have a lot of them?
To train the conditional models, you need pairs of targets and conditioning info. Also, the code is a lot to digest. I would suggest looking in ldm/models/diffusion/ddpm.py to see how things work. You can clearly see all the diffusion-related code and the training logic. It may help your understanding.