
ivan_kudryavtsev t1_j3q0c02 wrote

>Oh, I thought maybe he was going for distributed learning since he has access to 2 GPUs. In that case MPI has some overhead, simply because it has to replicate, scatter, and gather all the gradients for every batch of every epoch.

It looks like not; they were speculating about the internal design of the A100.
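For context on the overhead the quote mentions: in data-parallel training over MPI, every rank has to synchronize its gradients after each batch. A minimal sketch of that per-batch allreduce, assuming `mpi4py` and NumPy gradient arrays (the `sync_gradients` helper is illustrative, not from this thread):

```python
# Minimal sketch of per-batch gradient averaging over MPI (mpi4py).
# `local_grads` is a list of NumPy arrays, one per model parameter.
from mpi4py import MPI
import numpy as np

comm = MPI.COMM_WORLD
size = comm.Get_size()

def sync_gradients(local_grads):
    """Average gradients across all ranks; one allreduce per tensor."""
    averaged = []
    for g in local_grads:
        buf = np.empty_like(g)
        comm.Allreduce(g, buf, op=MPI.SUM)  # the per-batch network cost
        averaged.append(buf / size)
    return averaged
```

This communication happens once per batch, which is the overhead the quoted comment refers to.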


Infamous_Age_7731 OP t1_j3qrkhp wrote

Yes, indeed, I am not doing anything in parallel. I use them separately, and, as you said, I wanted to compare their internal designs.
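If it helps, one way to compare the two GPUs independently is to time an identical workload on each device. A minimal sketch, assuming PyTorch with both GPUs visible (the matmul size and iteration count are arbitrary placeholders):

```python
# Hypothetical sketch: time the same workload on each GPU separately
# (no parallelism), then compare the per-device results.
import torch

def time_matmul(device, n=4096, iters=50):
    with torch.cuda.device(device):
        x = torch.randn(n, n, device=device)
        start = torch.cuda.Event(enable_timing=True)
        end = torch.cuda.Event(enable_timing=True)
        torch.cuda.synchronize()  # drain pending work before timing
        start.record()
        for _ in range(iters):
            y = x @ x
        end.record()
        torch.cuda.synchronize()  # wait for the timed kernels to finish
    return start.elapsed_time(end) / iters  # ms per matmul

for dev in ("cuda:0", "cuda:1"):
    print(dev, f"{time_matmul(dev):.2f} ms")
```

CUDA events measure time on the device itself, and synchronizing before and after keeps host-side launch overhead out of the measurement.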
