DeepDeeperRIPgradien t1_j9eo5uw wrote

Can you recommend a tutorial or something that explains the steps to move from (e.g. PyTorch) training on your own machine to training that model in the cloud (e.g. AWS)? What type of instances to choose, how/where to store data, making sure Nvidia/CUDA stuff is working properly, etc.?

I_will_delete_myself OP t1_j9fodao wrote

>Can you recommend a tutorial or something that explains the steps to move from (e.g. pytorch) training on your own machine to training that model in the Cloud (e.g. AWS)?

Same as running on your own machine.

>What type of instances to choose, how/where to store data, making sure Nvidia/CUDA stuff is working properly, etc.?

Just look up an EC2 instance or VM that has the GPU you want and there you go. nvidia-smi is the command that tells you which GPU you have; the driver is working if it lists that GPU. I would also suggest checking in your code that CUDA is actually available.
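
For the "check in the code" part, here is a rough sketch of what I mean, assuming PyTorch. The function name gpu_report and the dict keys are just made up for illustration. It calls nvidia-smi -L to list the GPUs the driver sees, then asks PyTorch whether CUDA is usable. It skips whichever part is missing instead of crashing, so you can run it on your laptop and on the cloud box and compare.

```python
import shutil
import subprocess


def gpu_report():
    """Collect basic GPU/CUDA facts (hypothetical helper, not a library API)."""
    report = {
        "nvidia_smi": None,        # raw output of `nvidia-smi -L`, if available
        "torch_installed": False,
        "cuda_available": False,
        "devices": [],             # GPU names as PyTorch reports them
    }

    # Driver-level check: `nvidia-smi -L` lists the GPUs the driver can see.
    if shutil.which("nvidia-smi"):
        try:
            result = subprocess.run(
                ["nvidia-smi", "-L"],
                capture_output=True,
                text=True,
                timeout=30,
            )
            report["nvidia_smi"] = result.stdout.strip() or result.stderr.strip()
        except (OSError, subprocess.SubprocessError):
            pass  # leave nvidia_smi as None if the call fails

    # Framework-level check: can PyTorch actually use CUDA?
    try:
        import torch
    except ImportError:
        return report

    report["torch_installed"] = True
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["devices"] = [
            torch.cuda.get_device_name(i)
            for i in range(torch.cuda.device_count())
        ]
    return report


if __name__ == "__main__":
    for key, value in gpu_report().items():
        print(f"{key}: {value}")
```

If nvidia-smi lists a GPU but cuda_available comes back False, the usual suspect is a CPU-only PyTorch build or a driver/CUDA version mismatch, not the instance itself.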

I prefer to use an EC2 instance or VM because it's normally cheaper, but you have to do your own research on pricing. Cloud is a competitive market, so there is always someone ready to offer an A100 at a cheaper price. I heard Lambda Cloud was super cheap for on-demand.