MadScientist-1214 t1_j8ox26g wrote

Better than AdamW if (a) the model is a transformer and (b) few augmentations are used. Otherwise, the improvements are not that large. I doubt this optimizer works well with regular CNNs like EfficientNet or ConvNeXt.

21

CoderHD t1_j989j2g wrote

In my limited testing on a UNet-like CNN, it doesn't even come close to the performance of Adam, sadly. That said, I might be doing something wrong.

3