
120pi t1_j10c3gq wrote

t2.micro instances are great for lightweight applications, but they don't have GPUs attached. Look at the g4, g5, and p-series instances and use one of the Deep Learning AMIs. A single-GPU instance should work for starters, and you can simply move up to a bigger instance type later if you need parallel compute.

Something else to consider: you're only charged for wall time while these instances are running, so do as much of your development as possible locally, then spin up the instance, pull your repo onto the EC2 box, train, export the model/checkpoints to S3, and shut the instance down (all of this can be scripted).
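That cycle can be sketched as a short script. Everything here is a placeholder sketch: the instance ID, hostname, repo, and bucket are made up, and it runs in dry-run mode by default (it just prints the AWS CLI calls instead of executing them) so you can sanity-check it before pointing it at real resources.

```shell
#!/usr/bin/env bash
# Spin up -> pull -> train -> export to S3 -> shut down.
# INSTANCE_ID, HOST, REPO_DIR, and BUCKET are hypothetical placeholders.
# DRY_RUN=1 (the default) only echoes the commands.
set -euo pipefail

INSTANCE_ID="${INSTANCE_ID:-i-0123456789abcdef0}"
HOST="${HOST:-ec2-user@my-gpu-box}"
REPO_DIR="${REPO_DIR:-my-model}"
BUCKET="${BUCKET:-s3://my-checkpoints}"
DRY_RUN="${DRY_RUN:-1}"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "[dry-run] $*"; else "$@"; fi
}

# Start the stopped instance and wait until it's reachable.
run aws ec2 start-instances --instance-ids "$INSTANCE_ID"
run aws ec2 wait instance-running --instance-ids "$INSTANCE_ID"

# Pull the latest code, train, and push checkpoints to S3.
run ssh "$HOST" "git -C $REPO_DIR pull && python train.py"
run ssh "$HOST" "aws s3 cp --recursive $REPO_DIR/checkpoints/ $BUCKET"

# Stop (not terminate) so the EBS volume and environment survive.
run aws ec2 stop-instances --instance-ids "$INSTANCE_ID"
```

One caveat: a stop/start cycle usually changes the instance's public DNS, so either look it up after `start-instances` or attach an Elastic IP.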

I bring this up because the cost looks high if you assume a 24h usage cycle, but the A10G (g5) and above are really powerful and may actually work out cheaper thanks to shorter training times.

Breaking even on a 4090 would take around 6-9 months at 8h/day on a single-GPU g5. Personally, I'm grabbing a 4070 Ti when it comes out, since I'm not that concerned about training times for personal projects (12 GB is "ok") and I don't want to upgrade my PSU.
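For anyone who wants to redo the break-even math with current prices: here's the rough arithmetic, with assumed numbers (a ~$1600 4090 and ~$1.01/hr for an on-demand g5.xlarge; both will drift, and this ignores electricity on the local-GPU side).

```shell
#!/usr/bin/env bash
# Break-even days = card price / (hourly rate * hours per day).
# GPU_PRICE and G5_HOURLY are assumptions, not quoted prices.
GPU_PRICE=1600      # assumed RTX 4090 street price, USD
G5_HOURLY=1.006     # assumed g5.xlarge on-demand rate, USD/hr
HOURS_PER_DAY=8

DAYS=$(awk -v p="$GPU_PRICE" -v r="$G5_HOURLY" -v h="$HOURS_PER_DAY" \
  'BEGIN { printf "%.0f", p / (r * h) }')
MONTHS=$(awk -v d="$DAYS" 'BEGIN { printf "%.1f", d / 30 }')

echo "~$DAYS days (~$MONTHS months) to break even"
```

With those assumptions it lands around 200 days, i.e. inside the 6-9 month window above.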
