samb-t t1_iwguscp wrote

Did you get the 1.3M number from the config file (config.training.n_iters = 1300001)? If so, that's the number of training steps, not epochs! So hopefully it's more like 7 hours to train on an A100, thank god!

3

ButterscotchLost421 OP t1_iwgwqq0 wrote

Ah yes, you're right! Thank you so much!

Does 7 secs per epoch sound approximately right to you?

2

samb-t t1_iwgz27t wrote

7 secs sounds very fast, but if you're not using a massive model, you're training on CIFAR, and you're on an A100, it's not implausible. You might want to double-check the timing, though, just to be sure.

3
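
For anyone wanting to sanity-check the arithmetic in this thread, here is a rough sketch of the steps-to-hours conversion. Only n_iters = 1300001 and the 7 seconds per epoch come from the comments above. The batch size of 128 is an assumption (the thread never states it), the 50,000 figure is the size of the CIFAR-10 training set, and the function name is just illustrative. It also ignores evaluation, sampling and checkpointing overhead, so treat the result as a lower bound.

```python
import math


def estimate_training_hours(n_iters, dataset_size, batch_size, secs_per_epoch):
    """Convert a step count into wall-clock hours via the measured epoch time."""
    # One epoch is one pass over the dataset; a partial last batch still counts as a step.
    steps_per_epoch = math.ceil(dataset_size / batch_size)
    n_epochs = n_iters / steps_per_epoch
    return n_epochs * secs_per_epoch / 3600


hours = estimate_training_hours(
    n_iters=1_300_001,     # config.training.n_iters, from the thread
    dataset_size=50_000,   # CIFAR-10 training images
    batch_size=128,        # ASSUMPTION: not stated in the thread
    secs_per_epoch=7.0,    # timing reported by the OP
)
print(f"about {hours:.1f} hours")
```

Under these assumptions the estimate lands between 6 and 7 hours, which is consistent with the "around 7 hours" figure above. A different batch size changes the answer proportionally, so it's worth plugging in the real value from your config.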