Thakshu t1_iysr30r wrote on December 3, 2022 at 9:41 PM

Reply to comment by twocupv60 in [D] Ensemble Training Logistics and Mathematical Equivalences by twocupv60

I think you are right here, but the claim of mathematical equivalence still bothers me. Since the two procedures end up with dissimilar parameters, are they really equivalent?

Thakshu t1_iysghrm wrote on December 3, 2022 at 8:28 PM

Reply to [D] Ensemble Training Logistics and Mathematical Equivalences by twocupv60

If I understand correctly, the question is whether training N classifiers independently and averaging their outputs is mathematically equivalent to training N classifiers jointly on their mean output.

To me, they do not appear to be mathematically equivalent. (Edited a wrong statement here.)

In the joint case, the gradient for backprop at each step is computed from the mean output of all classifiers. So the loss values will be smoother than in the independent case, if the classifiers start from independent initializations.
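
To make the difference concrete, here is a minimal sketch under strong simplifying assumptions: each "classifier" is a scalar linear model f_i(x) = w_i * x, the loss is squared error, and there is a single training example. The values of x, y, and w, and the helper names independent_grads and joint_grads, are made up for illustration. It compares the per-member gradient when each member has its own loss against the gradient when all members share one loss on the mean output.

```python
# Toy setup (all values hypothetical): N scalar linear models f_i(x) = w_i * x,
# squared-error loss, one training example.
x, y = 2.0, 1.0
w = [0.5, -1.0, 3.0]  # independently initialized members
N = len(w)


def independent_grads(w, x, y):
    # Member i minimizes its own loss 0.5 * (w_i * x - y)**2,
    # so its gradient depends only on its own output.
    return [(wi * x - y) * x for wi in w]


def joint_grads(w, x, y):
    # All members share one loss 0.5 * (mean_i(w_i * x) - y)**2.
    mean_out = sum(wi * x for wi in w) / len(w)
    # The chain rule through the mean adds a 1/N factor for each member.
    shared = (mean_out - y) * x / len(w)
    return [shared for _ in w]


g_ind = independent_grads(w, x, y)
g_joint = joint_grads(w, x, y)
print("independent:", g_ind)
print("joint:      ", g_joint)
```

In this toy case every member receives the same gradient under joint training, equal to the average of the independent gradients scaled by 1/N, while the independent gradients differ from member to member. With nonlinear models the joint gradients would no longer be identical across members, since each member's own Jacobian enters the chain rule, but they would still all be driven by the error of the mean output rather than by each member's own error.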

Am I making a mistake in my reasoning? I can't identify it yet.