RSchaeffer t1_jb26p98 wrote

Lucas Beyer made a relevant comment: https://twitter.com/giffmana/status/1631601390962262017

"""

The main reason highlighted is minibatch gradient variance (see screenshot).

This immediately asks for experiments that can validate or nullify the hypothesis, none of which I found in the paper

"""
