Submitted by skypilotucb t3_yxui76 in MachineLearning
skypilotucb OP t1_iwvqqgj wrote
Reply to comment by Fast-for-a-starfish in [P] SkyPilot: ML on any cloud with massive cost savings by skypilotucb
Thanks for your question! Training BERT with SkyPilot's managed spot feature cost $18.4 and took 21 hours. Running the same job on on-demand AWS instances cost $61.2 (more than 3x as much) and took 20 hours.
Note that both jobs were run on the same GPU (V100), and the cost and time reported for SkyPilot include the data transfer costs for moving checkpoints and all overheads associated with restarting jobs.
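For anyone who wants to check the arithmetic behind the ">3x" figure, here is a quick sketch in Python. It uses only the four numbers quoted above; the variable names are just illustrative, and the "effective hourly" values are totals divided by wall-clock hours (so the spot figure includes checkpoint transfer and restart overheads), not actual instance list prices.

```python
# Figures quoted in the comment above (USD and wall-clock hours).
spot_cost, spot_hours = 18.4, 21
on_demand_cost, on_demand_hours = 61.2, 20

# How many times more expensive the on-demand run was.
cost_ratio = on_demand_cost / spot_cost

# Savings from managed spot, as a percentage of the on-demand cost.
savings_pct = (1 - spot_cost / on_demand_cost) * 100

# Extra wall-clock time for the spot run, relative to on-demand.
slowdown_pct = (spot_hours - on_demand_hours) / on_demand_hours * 100

# Effective all-in cost per hour (total cost / total hours), NOT a list price.
spot_hourly = spot_cost / spot_hours
on_demand_hourly = on_demand_cost / on_demand_hours

print(f"on-demand / spot cost: {cost_ratio:.2f}x")
print(f"savings with spot:     {savings_pct:.1f}%")
print(f"extra time with spot:  {slowdown_pct:.1f}%")
print(f"effective $/hour:      spot {spot_hourly:.2f}, on-demand {on_demand_hourly:.2f}")
```

So, on these numbers, the spot run cost roughly 70% less in exchange for about 5% more wall-clock time.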