TimDarcet t1_j1w6ifs wrote

I think the supervised training they report in MAE is 300 epochs; they used a different recipe for it than for fine-tuning (appendix, page 12, table 11).

2

netw0rkf10w OP t1_j2939o2 wrote

You are right, indeed. Not sure why I missed that. I guess one can conclude that DeiT 3 is currently SoTA for training from scratch.

1