Submitted by somebodyenjoy t3_z8otan in deeplearning
I was thinking I'd load half the data, train on it, then load the other half and train on that. This may be slightly slower but should work in theory. I'd preprocess everything up front and store it in files like X1.npy and X2.npy, X1 and X2 being the first and second halves of the preprocessed data. Loading pre-saved arrays is also much faster than preprocessing from scratch, though obviously slower than having everything sit in a bigger RAM. We can always get more RAM in the cloud, but what if we have 1000 GB of images to train on? My initial intuition seems correct, but what is the standard operating procedure here?
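For what it's worth, here's a minimal sketch of that chunked approach, assuming `model` is an already-compiled Keras model, each chunk fits in RAM on its own, and hypothetical y1.npy/y2.npy files hold the matching labels:

```python
import numpy as np

# Hypothetical chunk files, as described above; y1/y2 hold the labels
chunk_files = [("X1.npy", "y1.npy"), ("X2.npy", "y2.npy")]

for epoch in range(10):
    for x_path, y_path in chunk_files:
        X = np.load(x_path)  # load one chunk into RAM at a time
        y = np.load(y_path)
        # one pass over this chunk, then free it and load the next
        model.fit(X, y, batch_size=32, epochs=1, shuffle=True)
```

One caveat: the model only ever sees batches drawn from within a chunk, so shuffling happens per chunk rather than over the full dataset.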
I think people normally let Keras do all the work by simply using ImageDataGenerator and feeding it the path, but what if I want some control over the preprocessing?
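You can actually keep ImageDataGenerator and still inject custom logic via its preprocessing_function argument, which is called on each image (a rank-3 NumPy array) after resizing and augmentation. A sketch, where my_preprocess and the "data/train" path are placeholders:

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

def my_preprocess(img):
    # img is one image as a float32 NumPy array of shape (H, W, C);
    # return an array of the same shape after your custom steps
    return (img - img.mean()) / (img.std() + 1e-7)

gen = ImageDataGenerator(preprocessing_function=my_preprocess)

# flow_from_directory reads images from disk in batches,
# so the full dataset never has to fit in RAM
train_iter = gen.flow_from_directory(
    "data/train", target_size=(224, 224), batch_size=32)

# model.fit(train_iter, epochs=10)
```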
Alone_Bee_6221 t1_iycimeo wrote
I would probably suggest splitting the data into chunks, or you could implement your own dataset class that loads images lazily.
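A minimal sketch of the lazy-loading idea using keras.utils.Sequence, assuming you have lists of file paths and labels (both names here are placeholders); only one batch of images is ever decoded into memory at a time:

```python
import math
import numpy as np
from tensorflow import keras

class LazyImageSequence(keras.utils.Sequence):
    """Loads one batch of images from disk at a time."""

    def __init__(self, image_paths, labels, batch_size=32, size=(224, 224)):
        self.image_paths = image_paths
        self.labels = labels
        self.batch_size = batch_size
        self.size = size

    def __len__(self):
        # number of batches per epoch
        return math.ceil(len(self.image_paths) / self.batch_size)

    def __getitem__(self, idx):
        # decode just the images belonging to batch `idx`
        sl = slice(idx * self.batch_size, (idx + 1) * self.batch_size)
        batch = [
            keras.utils.img_to_array(
                keras.utils.load_img(p, target_size=self.size)) / 255.0
            for p in self.image_paths[sl]
        ]
        return np.stack(batch), np.asarray(self.labels[sl])

# usage: model.fit(LazyImageSequence(image_paths, labels), epochs=10)
```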