
HiPattern t1_iycvpfp wrote

Write a generator that feeds the data in batches:

https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
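Along the lines of the linked post, a minimal sketch of a `keras.utils.Sequence` that loads one batch at a time (the file paths, labels, and per-sample `.npy` loading are placeholder assumptions, not from the post):

    import numpy as np
    from tensorflow import keras

    class DataGenerator(keras.utils.Sequence):
        def __init__(self, file_paths, labels, batch_size=32):
            self.file_paths = file_paths  # one file per sample (assumption)
            self.labels = labels
            self.batch_size = batch_size

        def __len__(self):
            # number of batches per epoch
            return int(np.ceil(len(self.file_paths) / self.batch_size))

        def __getitem__(self, idx):
            # load only the samples belonging to batch `idx`
            lo, hi = idx * self.batch_size, (idx + 1) * self.batch_size
            X = np.stack([np.load(p) for p in self.file_paths[lo:hi]])
            y = np.array(self.labels[lo:hi])
            return X, y

    # model.fit(DataGenerator(train_paths, train_labels), epochs=10)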

7

somebodyenjoy OP t1_iycwvf4 wrote

Very interesting, but I want the data to be preprocessed only once. With a generator, it'll preprocess at every epoch

2

HiPattern t1_iycypm1 wrote

You can preprocess, then write the data into an HDF5 file, and read the preprocessed data batch-wise from the HDF5 file!
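A rough sketch of that idea with h5py (dataset names, shapes, and the dummy preprocessing are illustrative assumptions):

    import h5py
    import numpy as np

    n_samples, batch_size = 1000, 32

    # one-off preprocessing pass: write processed arrays into HDF5
    with h5py.File("preprocessed.h5", "w") as f:
        X = f.create_dataset("X", shape=(n_samples, 224, 224, 3), dtype="float32")
        y = f.create_dataset("y", shape=(n_samples,), dtype="int64")
        for i in range(n_samples):
            sample = np.random.rand(224, 224, 3)   # stand-in for your raw sample
            X[i] = sample.astype("float32")        # stand-in for your preprocessing
            y[i] = i % 2

    # later, read one batch at a time; only that slice is read from disk
    with h5py.File("preprocessed.h5", "r") as f:
        batch_X = f["X"][0:batch_size]
        batch_y = f["y"][0:batch_size]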

7

somebodyenjoy OP t1_iycyur2 wrote

I do the same with numpy files, but they only let me load the whole dataset, which is too big in the first place. TensorFlow lets us load in batches, huh? I'll look into this

2

HiPattern t1_iyd91t0 wrote

HDF5 files are quite nice for that. You can write your X / y datasets into the file in chunks. When you access a batch, it will only read the part of the HDF5 file where that batch is.


You can also use multiple numpy files, e.g. one for each batch, and then handle the file management in the sequence generator.
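For instance, something like this (a sketch only; the `batch_*.npz` naming and per-file X/y layout are assumptions):

    import glob
    import numpy as np
    from tensorflow import keras

    class NpyBatchSequence(keras.utils.Sequence):
        def __init__(self, directory):
            # each file holds one preprocessed (X, y) batch saved with np.savez
            self.files = sorted(glob.glob(f"{directory}/batch_*.npz"))

        def __len__(self):
            return len(self.files)

        def __getitem__(self, idx):
            data = np.load(self.files[idx])
            return data["X"], data["y"]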

3

somebodyenjoy OP t1_iydqmu0 wrote

This is perfect, I won’t have to invest in additional RAM. Thanks for the tip!

3