HiPattern
HiPattern t1_iyd91t0 wrote
Reply to comment by somebodyenjoy in If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
hdf5 files are quite nice for that. You can write your X / y datasets into the file in chunks. When you access a batch, only the part of the hdf5 file that holds that batch is read.
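A minimal sketch of that with h5py (file name, shapes, and batch size are just illustrative):

```python
# Sketch only: file name, shapes, and batch size are illustrative.
import h5py

batch_size = 32
n_samples = 10_000

# Write X / y as chunked datasets so each batch maps to a small region on disk.
with h5py.File("train.h5", "w") as f:
    X = f.create_dataset("X", shape=(n_samples, 128, 128, 3), dtype="float32",
                         chunks=(batch_size, 128, 128, 3))
    y = f.create_dataset("y", shape=(n_samples,), dtype="int64",
                         chunks=(batch_size,))
    # ... fill X and y chunk by chunk instead of all at once ...

# Slicing a dataset only reads the part of the file that holds this batch.
with h5py.File("train.h5", "r") as f:
    X_batch = f["X"][:batch_size]
    y_batch = f["y"][:batch_size]
```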
You can also use multiple numpy files, e.g. one for each batch, and then handle the file management in the sequence generator.
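For the multiple-numpy-files variant, a keras.utils.Sequence that loads one pair of files per batch could look roughly like this (the directory and file naming scheme are assumptions):

```python
# Sketch only: assumes batches were saved beforehand as
# batches/batch_000.npy and batches/labels_000.npy (names are made up).
import numpy as np
from tensorflow import keras

class NpyBatchSequence(keras.utils.Sequence):
    def __init__(self, n_batches, data_dir="batches"):
        super().__init__()
        self.n_batches = n_batches
        self.data_dir = data_dir

    def __len__(self):
        return self.n_batches

    def __getitem__(self, idx):
        # Only the two files belonging to this batch are read from disk.
        X = np.load(f"{self.data_dir}/batch_{idx:03d}.npy")
        y = np.load(f"{self.data_dir}/labels_{idx:03d}.npy")
        return X, y

# model.fit(NpyBatchSequence(n_batches=100), epochs=10)
```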
HiPattern t1_iycypm1 wrote
Reply to comment by somebodyenjoy in If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
You can preprocess the data, write it into an hdf5 file, and then read the preprocessed data batch-wise from that file!
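One possible sketch of the write side, assuming a resizable hdf5 dataset and a hypothetical load_raw_chunks() helper that yields the raw data in RAM-sized pieces:

```python
# Sketch only: load_raw_chunks() is a hypothetical helper; shapes are illustrative.
import h5py

def preprocess(raw_chunk):
    # placeholder preprocessing, e.g. scale images to [0, 1]
    return raw_chunk.astype("float32") / 255.0

with h5py.File("preprocessed.h5", "w") as f:
    dset = f.create_dataset("X", shape=(0, 128, 128, 3),
                            maxshape=(None, 128, 128, 3),
                            chunks=(32, 128, 128, 3), dtype="float32")
    for raw_chunk in load_raw_chunks():
        processed = preprocess(raw_chunk)
        # grow the dataset and append the freshly preprocessed chunk
        dset.resize(dset.shape[0] + len(processed), axis=0)
        dset[-len(processed):] = processed
```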
HiPattern t1_iycvpfp wrote
Reply to If the dataset is too big to fit into your RAM, but you still wish to train, how do you do it? by somebodyenjoy
Write a generator that feeds the data in batches:
https://stanford.edu/~shervine/blog/keras-how-to-generate-data-on-the-fly
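A minimal sketch in the spirit of that post, here as a keras.utils.Sequence backed by an hdf5 file ("train.h5" and the "X" / "y" dataset names are assumptions):

```python
# Sketch only: a keras.utils.Sequence that reads one batch per __getitem__ call.
import math
import h5py
from tensorflow import keras

class HDF5Sequence(keras.utils.Sequence):
    def __init__(self, path, batch_size=32):
        super().__init__()
        self.path = path
        self.batch_size = batch_size
        with h5py.File(path, "r") as f:
            self.n_samples = f["X"].shape[0]

    def __len__(self):
        return math.ceil(self.n_samples / self.batch_size)

    def __getitem__(self, idx):
        start = idx * self.batch_size
        stop = min(start + self.batch_size, self.n_samples)
        # Re-opening per batch keeps things safe when fit() uses worker processes.
        with h5py.File(self.path, "r") as f:
            return f["X"][start:stop], f["y"][start:stop]

# model.fit(HDF5Sequence("train.h5"), epochs=10)
```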
HiPattern t1_j8naxdx wrote
Reply to [P] From “iron manual” to “Iron Man” — Augmenting GPT for fast editable memory to enable context aware question & answering by skeltzyboiii
Very nice! What runs in the docker service?