Submitted by seraphaplaca2 t3_122fj05 in MachineLearning
tdgros t1_jdqjc8q wrote
Reply to comment by Co0k1eGal3xy in Is it possible to merge transformers? [D] by seraphaplaca2
There's also the weight averaging in ESRGAN that I knew about, but it always irked me. The permutation argument from your third point is the reason I usually invoke on this subject, and the paper does show why it's not as simple as just blending weights! The same reasoning also shows why blending successive checkpoints isn't like blending random networks: checkpoints from the same training run keep the same neuron ordering, so their weights are already aligned.
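
To make the permutation argument concrete, here is a toy sketch in plain Python. It is only an illustration, not the method from the paper: the tiny one-hidden-layer ReLU network, the hand-picked weights, and the helper names (`mlp`, `permute`, `blend`) are all assumptions made up for this example. Shuffling the hidden units leaves the function unchanged, yet averaging the original with its shuffled copy gives a different function, while averaging with a slightly nudged "next checkpoint" barely changes it.

```python
def mlp(x, W1, W2):
    # Scalar-in, scalar-out network: y = sum_i W2[i] * relu(W1[i] * x)
    hidden = [max(0.0, w * x) for w in W1]
    return sum(v * h for v, h in zip(W2, hidden))

def permute(W1, W2, perm):
    # Reorder hidden units; input and output weights move together.
    return [W1[i] for i in perm], [W2[i] for i in perm]

def blend(a, b, t=0.5):
    # Element-wise linear interpolation between two weight vectors.
    return [(1 - t) * u + t * v for u, v in zip(a, b)]

# Hand-picked weights (hypothetical, chosen so the arithmetic is easy to follow).
W1 = [1.0, -1.0, 2.0, 0.5]
W2 = [1.0, 2.0, -1.0, 3.0]
x = 1.0

# Same network with its hidden units shuffled.
P1, P2 = permute(W1, W2, [2, 3, 0, 1])

y_orig = mlp(x, W1, W2)
y_perm = mlp(x, P1, P2)                          # identical function
y_blend = mlp(x, blend(W1, P1), blend(W2, P2))   # naive average of the two

# A "next checkpoint": the same weights nudged slightly, same unit ordering.
C1 = [w + 0.01 for w in W1]
y_ckpt = mlp(x, blend(W1, C1), W2)

print(y_orig, y_perm, y_blend, y_ckpt)
```

The two networks being averaged in `y_blend` compute exactly the same function, so any change in the output comes purely from the mismatch in unit ordering. That mismatch is what alignment-based merging has to undo, and it is absent when blending nearby checkpoints.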