
IdentifiableParam t1_ir31zjs wrote

Weird that this paper didn't seem to cite https://arxiv.org/abs/1409.4842v1 which also used Polyak averaging on models trained on ImageNet.

5

jeankaddour t1_ir4vnnv wrote

Hi, the author here. Thank you for your comment.

While I was aware of GoogLeNet, I didn't read the paper in enough detail to know that they used Polyak averaging too. Thank you for making me aware of it. I'm happy to cite it in the next paper version.

However, the only time they mention averaging is:

Polyak averaging [13] was used to create the final model used at inference time.

My goal with the paper was to study the empirical convergence speed-ups in more detail and to be precise about how averaging is used, not to claim to be the first to apply some form of averaging to improve the model's final performance (plenty of papers already do that, e.g., the SWA paper mentioned in the related work section).
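For readers unfamiliar with the technique being discussed: Polyak (weight) averaging keeps a running mean of the model's parameters over training and uses that mean, rather than the final iterate, at inference time. A minimal sketch, with plain Python lists standing in for weight tensors (the function name and the toy checkpoints are illustrative, not from the paper):

```python
def polyak_update(avg_params, new_params, step):
    """Incrementally update the running mean after seeing `step` parameter sets (1-indexed).

    Equivalent to averaging all `step` sets seen so far, but without storing them.
    """
    return [a + (p - a) / step for a, p in zip(avg_params, new_params)]


# Toy usage: average three "checkpoints" of a two-parameter model.
checkpoints = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
avg = checkpoints[0]
for t, params in enumerate(checkpoints[1:], start=2):
    avg = polyak_update(avg, params, t)

print(avg)  # element-wise mean of the checkpoints: [3.0, 4.0]
```

In practice, variants differ in *which* iterates are averaged (all of them, a tail window, or an exponential moving average), which is exactly the kind of detail the paper tries to pin down.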


EDIT: Added the citation to the new version!

5

bernhard-lehner t1_ir43bht wrote

Yeah, that's hardly a novel approach... but I have to admit that I could also spend more time checking whether anyone else has had the same idea I'm trying at the moment. We really need "Schmidhuber as a Service" :)

4

jeankaddour t1_ir4vzq6 wrote

Hi, the author here. Thank you for your comment.

My goal with the paper was not to present weight averaging as a novel approach; rather, to study the empirical convergence speed-ups in more detail.

Please have a look at the related work section where I discuss previous works using weight averaging, and feel free to let me know if I missed one that focuses on speedups.

5

jeankaddour t1_irdv4a9 wrote

Thanks again for this pointer; the citation has now been added to the version announced today.

2