patient_zer00


Disk I/O is often a bottleneck.
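A quick way to check whether that's your problem is to time how long the loop waits for the next batch versus how long the training step itself takes. The sketch below is a rough, framework-agnostic version of that idea. `profile_loop`, `slow_loader` and `fast_step` are hypothetical names made up for illustration, and the loader just sleeps to simulate slow reads. If you're on a GPU, the step usually has to synchronize (e.g. `torch.cuda.synchronize()` in PyTorch) before the timer stops, otherwise asynchronous kernel launches make the compute time look far too small.

```python
import time
from typing import Callable, Iterable, Tuple


def profile_loop(batches: Iterable, train_step: Callable) -> Tuple[float, float]:
    """Return (seconds spent waiting for data, seconds spent in train_step)."""
    wait_time = 0.0
    compute_time = 0.0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            batch = next(it)  # time spent here is data loading / disk I/O
        except StopIteration:
            break
        t1 = time.perf_counter()
        train_step(batch)  # time spent here is the training step itself
        t2 = time.perf_counter()
        wait_time += t1 - t0
        compute_time += t2 - t1
    return wait_time, compute_time


# Hypothetical example: a loader that simulates slow disk reads.
def slow_loader(n: int):
    for i in range(n):
        time.sleep(0.02)  # pretend reading each batch from disk takes 20 ms
        yield i


def fast_step(batch) -> None:
    time.sleep(0.001)  # pretend the actual training step takes 1 ms


wait, compute = profile_loop(slow_loader(5), fast_step)
print(f"data wait: {wait:.3f}s, compute: {compute:.3f}s")
```

If the wait time dominates like in this toy case, a faster GPU won't help much; faster storage, caching, or loading batches in parallel worker processes will.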

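Also, even though a GPU will speed up LSTM training too, the forward pass and the gradient computation (backpropagation through time) have to process the sequence one time step after the other, because each hidden state depends on the previous one. That part can't be parallelized; the GPU can only parallelize the work within a single step (across the batch and the hidden units). That's probably why your speedup going from a K80 to an A100 isn't that big.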
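To make the sequential dependency concrete, here's a toy sketch. It's a scalar vanilla RNN, not a real LSTM (an LSTM adds gates and a cell state, but has the same dependency on the previous step), and `rnn_forward` / `feedforward` plus the weights `w`, `u` are made-up names for illustration. Changing the first input changes every later hidden state, so step t can't start before step t-1 is done. In the feedforward version each position only depends on its own input, so all positions could be computed at once.

```python
import math
from typing import List


def rnn_forward(xs: List[float], w: float = 0.5, u: float = 1.0) -> List[float]:
    """Toy scalar RNN: h_t = tanh(w * h_{t-1} + u * x_t), with h_0 = 0."""
    h = 0.0
    hs = []
    for x in xs:  # must run in order: step t needs h from step t-1
        h = math.tanh(w * h + u * x)
        hs.append(h)
    return hs


def feedforward(xs: List[float], u: float = 1.0) -> List[float]:
    """No dependency between positions, so every output could be computed in parallel."""
    return [math.tanh(u * x) for x in xs]


# Perturb only the first input and see which outputs change.
print(rnn_forward([1.0, 0.0, 0.0]))  # all three hidden states are nonzero
print(feedforward([1.0, 0.0, 0.0]))  # only the first output is nonzero
```

Backprop through time has the same chain in reverse, so a bigger GPU mostly helps by doing each step's matrix math faster or on a larger batch, not by running the time steps simultaneously.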

Edit: typos
