DingWrong t1_iyq0nr0 wrote

Big models get sharded, and each GPU loads one chunk. Plenty of frameworks already support this, since the big NLP models can't fit on a single GPU. Alpa even shards the model across different machines.
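
To make the sharding idea concrete, here is a toy sketch in plain Python. It is not how Alpa or any particular framework does it. The function name `shard_layers`, the layer sizes, and the 16 GB per-GPU budget are all made up for illustration. It just assigns consecutive layers to GPUs so that each chunk fits in one card's memory.

```python
from typing import List


def shard_layers(layer_sizes_gb: List[float], num_gpus: int, gpu_mem_gb: float) -> List[List[int]]:
    """Greedily place consecutive layers on GPUs so each shard fits in memory.

    Hypothetical helper for illustration only; real frameworks also account
    for activations, optimizer state, and communication cost.
    """
    shards: List[List[int]] = [[]]
    used = 0.0
    for idx, size in enumerate(layer_sizes_gb):
        if size > gpu_mem_gb:
            raise ValueError(f"layer {idx} ({size} GB) does not fit on one GPU")
        if used + size > gpu_mem_gb:
            # Current GPU is full: start a new shard on the next GPU.
            shards.append([])
            used = 0.0
        shards[-1].append(idx)
        used += size
    if len(shards) > num_gpus:
        raise ValueError(f"model needs {len(shards)} GPUs, only {num_gpus} available")
    return shards


# Assumed example: 8 layers of 6 GB each (48 GB total), 4 GPUs with 16 GB each.
layers = [6.0] * 8
placement = shard_layers(layers, num_gpus=4, gpu_mem_gb=16.0)
for gpu, idxs in enumerate(placement):
    print(f"GPU {gpu}: layers {idxs}")
```

No single GPU ever holds the whole model here. Each one holds only its own layers, and activations are passed from one GPU to the next during the forward pass.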

3

computing_professor t1_iyqaku8 wrote

Thanks. So it really isn't the same as how Quadro cards share VRAM. That's really confusing.

1