
IdeaEnough443 OP t1_izgwvp5 wrote

But wouldn't the training process be slower than with parallelization? Is batch gradient descent the industry standard for handling large datasets in NN training?

1

PassionatePossum t1_izi24ow wrote

You can still parallelize using batch gradient descent. If you, for example, use the MirroredStrategy in TensorFlow, you split up the batch between multiple GPUs. The only downside is that this approach doesn't scale well if you want to train on more than one machine, since the model needs to be synced after each iteration.
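A minimal sketch of what that looks like with `tf.distribute.MirroredStrategy` (the model and batch size here are placeholders, not a recommendation for your workload):

```python
import tensorflow as tf

# Synchronous data parallelism across all GPUs visible on this one machine.
strategy = tf.distribute.MirroredStrategy()
print("Replicas in sync:", strategy.num_replicas_in_sync)

# The global batch is split evenly across replicas, so scale it accordingly.
global_batch_size = 64 * strategy.num_replicas_in_sync

with strategy.scope():
    # Variables created inside the scope are mirrored on every GPU.
    model = tf.keras.Sequential([
        tf.keras.layers.Dense(128, activation="relu"),
        tf.keras.layers.Dense(10),
    ])
    model.compile(
        optimizer="adam",
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    )

# model.fit(train_dataset.batch(global_batch_size), epochs=10)
```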

But you should think long and hard about whether training on multiple machines is really necessary, since that brings a whole new set of problems. 700 GB is not that large; we do that all the time. I don't know what kind of model you are trying to train, but we have a GPU server with 8 GPUs and I've never felt the need to go beyond the normal MirroredStrategy for parallelization. And should you run into the problem that you cannot fit the data onto the machine where you are training: load it over the network.

You just need to make sure that your input pipeline supports that efficiently. Shard your dataset so you can have many concurrent I/O operations.
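For example, a sharded `tf.data` pipeline that reads many TFRecord files concurrently might look roughly like this (the file pattern and feature spec are made up; adapt them to your data):

```python
import tensorflow as tf

def parse_example(record):
    # Hypothetical feature spec, purely for illustration.
    features = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "label": tf.io.FixedLenFeature([], tf.int64),
    }
    parsed = tf.io.parse_single_example(record, features)
    image = tf.io.decode_jpeg(parsed["image"], channels=3)
    return image, parsed["label"]

# Many small shards instead of one huge file.
files = tf.data.Dataset.list_files("train-*.tfrecord", shuffle=True)

dataset = files.interleave(
    tf.data.TFRecordDataset,
    cycle_length=16,                      # shards read concurrently
    num_parallel_calls=tf.data.AUTOTUNE,
)
dataset = (
    dataset
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(10_000)
    .batch(256)
    .prefetch(tf.data.AUTOTUNE)
)
```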

And in case scaling is really important to you, may I suggest you look into Horovod?
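For a rough idea, a Horovod/Keras training script is usually structured something like the sketch below (the model and hyperparameters are placeholders; you'd launch it with something like `horovodrun -np <workers> python train.py`):

```python
import tensorflow as tf
import horovod.tensorflow.keras as hvd

hvd.init()

# Pin each worker process to a single GPU.
gpus = tf.config.list_physical_devices("GPU")
if gpus:
    tf.config.set_visible_devices(gpus[hvd.local_rank()], "GPU")

model = tf.keras.Sequential([tf.keras.layers.Dense(10)])  # placeholder model

# Wrap the optimizer so gradients are averaged across workers each step;
# scaling the learning rate by the number of workers is a common convention.
opt = hvd.DistributedOptimizer(tf.keras.optimizers.Adam(0.001 * hvd.size()))
model.compile(optimizer=opt,
              loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))

callbacks = [
    # Broadcast initial weights from rank 0 so all workers start identically.
    hvd.callbacks.BroadcastGlobalVariablesCallback(0),
]
# model.fit(dataset, callbacks=callbacks, epochs=10,
#           verbose=1 if hvd.rank() == 0 else 0)
```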

2

SwordOfVarjo t1_izgx533 wrote

It's the industry standard for NN training, period. Your dataset isn't that big; just train on one machine.

1

IdeaEnough443 OP t1_izgyjq8 wrote

Our dataset takes close to a day to finish training. If we have 5x the data, that won't work for our application; that's why we are trying to see if distributed training would help lower training time.

1