
Long_Two_6176 t1_j8h28vw wrote

So today I found out that mps on PyTorch 1.13.1 (stable) has bugs that keep the model from learning. FashionMNIST accuracy bounced around below 10%. Switched to cpu and it worked fine (>80%).
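One way to make that kind of comparison easy is to keep the device behind a single switch, so the same training script can be rerun on cpu when mps results look off. This is only a rough sketch: `pick_device` and its `prefer_mps` flag are made-up names for illustration, not part of PyTorch.

```python
import torch

def pick_device(prefer_mps: bool = True) -> torch.device:
    """Return the mps device if requested and available, otherwise cpu.

    Hypothetical helper: the name and the flag are assumptions.
    """
    # torch.backends.mps is missing in older builds, so look it up defensively.
    mps_backend = getattr(torch.backends, "mps", None)
    if prefer_mps and mps_backend is not None and mps_backend.is_available():
        return torch.device("mps")
    return torch.device("cpu")

# Flip prefer_mps to False to rerun the exact same code on cpu.
device = pick_device(prefer_mps=False)
model = torch.nn.Linear(28 * 28, 10).to(device)
inputs = torch.randn(4, 28 * 28, device=device)
logits = model(inputs)
```

If accuracy is fine on cpu and stuck near chance on mps with the same seed and data, that points at the backend rather than the model.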


Long_Two_6176 t1_iu5omuu wrote

This is called model parallelism. Think of it as having model.conv1 on gpu1 and model.conv2 on gpu2. It is actually not too hard to do: you just need to manually place each model component on a device with statements like .to("cuda:"), and move the activations to the matching device as they pass from one component to the next. Start with this.
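A minimal sketch of the idea, assuming a small made-up network (`TwoDeviceNet`, its layer sizes, and the FashionMNIST-shaped input are all just for illustration). It uses cuda:0 and cuda:1 when two GPUs are visible and falls back to cpu otherwise, so it still runs on a single machine.

```python
import torch
import torch.nn as nn

# Assumption: use two GPUs when present, otherwise put everything on cpu.
if torch.cuda.device_count() >= 2:
    dev0, dev1 = torch.device("cuda:0"), torch.device("cuda:1")
else:
    dev0 = dev1 = torch.device("cpu")

class TwoDeviceNet(nn.Module):
    """Hypothetical model: conv1 lives on dev0, conv2 and fc live on dev1."""

    def __init__(self):
        super().__init__()
        self.conv1 = nn.Conv2d(1, 8, kernel_size=3, padding=1).to(dev0)
        self.conv2 = nn.Conv2d(8, 16, kernel_size=3, padding=1).to(dev1)
        self.fc = nn.Linear(16 * 28 * 28, 10).to(dev1)

    def forward(self, x):
        # Move the input to the first device, run the first stage there.
        x = torch.relu(self.conv1(x.to(dev0)))
        # Hand the activations over to the second device for the rest.
        x = torch.relu(self.conv2(x.to(dev1)))
        return self.fc(x.flatten(1))

model = TwoDeviceNet()
batch = torch.randn(4, 1, 28, 28)  # four 28x28 grayscale images
logits = model(batch)              # output ends up on dev1
```

When training, the labels have to be moved to the same device as the output (dev1 here) before computing the loss. Note that with this plain setup only one GPU is busy at a time, since the second stage waits on the first.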

A more advanced setup is model parallelism + data parallelism, where you also benefit from having both gpus split the dataset to accelerate training. Typically this is not possible with simple model parallelism, but a more advanced framework like fairseq can do it for you.