
ArnoF7 t1_j5lknua wrote

I have a similar suspicion that training would be bottlenecked by the slower 1080. But I'm wondering: is it possible to treat the 1080 as a pure VRAM extension?

Although it's possible that the time spent transferring between the two memories cancels out any gain from the extra VRAM.


FastestLearner t1_j5mwz47 wrote

It is possible, but it would require you to write custom code for every memcpy operation you want to perform, i.e. every tensor.to(device) call. You could get away with that on a smaller project, but it could become prohibitively cumbersome on a large one. Also, you'd still need two forward passes: one for the data already resident on the 3080, and another for the data parked on the 1080 after it has been transferred over to the 3080. Whether or not this is beneficial boils down to the difference in transfer rates between the RAM-to-3080 route and the RAM-to-1080-to-3080 route; I couldn't tell you which is faster without benchmarking (see the sketch below).
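For what it's worth, here's a rough sketch of both the "VRAM extension" idea and the benchmark (assuming the 3080 enumerates as cuda:0 and the 1080 as cuda:1 on your system, which isn't guaranteed; the tensor size is an arbitrary stand-in):

```python
import time
import torch

fast = torch.device("cuda:0")  # 3080 (assumed device ordering)
slow = torch.device("cuda:1")  # 1080 (assumed device ordering)

# Stand-in for a batch / cached activations: ~256 MB of fp32.
x_cpu = torch.randn(8192, 8192, pin_memory=True)

def timed(fn, label):
    # Drain both GPUs so we only measure the copy itself.
    torch.cuda.synchronize(fast)
    torch.cuda.synchronize(slow)
    t0 = time.perf_counter()
    out = fn()
    torch.cuda.synchronize(fast)
    torch.cuda.synchronize(slow)
    print(f"{label}: {(time.perf_counter() - t0) * 1e3:.1f} ms")
    return out

# Route A: host RAM -> 3080 directly.
timed(lambda: x_cpu.to(fast, non_blocking=True), "RAM -> 3080")

# Route B: park the tensor in the 1080's VRAM ("VRAM extension"),
# then pull it across to the 3080 when the compute needs it.
x_parked = x_cpu.to(slow)
timed(lambda: x_parked.to(fast, non_blocking=True), "1080 -> 3080")
```

Route B's second hop goes over PCIe either way, so whether it ever wins likely depends on whether peer-to-peer copies are enabled and on your PCIe lane layout.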

DeepSpeed, by contrast, handles those RAM-to-3080 to-and-fro transfers for large batch sizes automatically (e.g. via its ZeRO-Offload machinery).
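If it helps, a minimal sketch of that setup (assuming ZeRO-Offload is the mechanism in play; the model and hyperparameters are placeholders, and you'd normally run this under the deepspeed launcher):

```python
import torch
import deepspeed

# Placeholder model; the config dict is the part that matters.
model = torch.nn.Linear(4096, 4096)

# ZeRO stage 2 with optimizer state offloaded to pinned CPU RAM.
# DeepSpeed then schedules the RAM <-> GPU transfers for you.
ds_config = {
    "train_micro_batch_size_per_gpu": 8,
    "optimizer": {"type": "Adam", "params": {"lr": 1e-4}},
    "zero_optimization": {
        "stage": 2,
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
    },
}

engine, optimizer, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```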
