
RoaRene317 t1_jdfnzna wrote

My suggestion is to use 8-bit or 4-bit quantization. You can also use automatic device mapping in Transformers, which can partially offload the model to your CPU (warning: it uses a lot of system memory [RAM]).
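
A rough sketch of what that could look like, assuming a recent Transformers version with `accelerate` and `bitsandbytes` installed and a CUDA GPU available. The helper names `estimate_weight_gib` and `load_quantized` are made up for illustration, and the model ID is whatever checkpoint you are trying to load. The estimate covers the weights only, not activations or framework overhead.

```python
def estimate_weight_gib(n_params, bits):
    """Approximate size of the model weights alone, in GiB."""
    return n_params * bits / 8 / 2**30


def load_quantized(model_id, bits=8):
    """Load a causal LM quantized to 8 or 4 bits with automatic device mapping.

    Hypothetical helper: assumes transformers, accelerate and bitsandbytes
    are installed.
    """
    if bits not in (4, 8):
        raise ValueError("bits must be 4 or 8")

    # Imported lazily so the size estimate works without these packages.
    from transformers import AutoModelForCausalLM, BitsAndBytesConfig

    quant_config = BitsAndBytesConfig(
        load_in_8bit=(bits == 8),
        load_in_4bit=(bits == 4),
        # Allow layers that do not fit on the GPU to be kept on the CPU.
        llm_int8_enable_fp32_cpu_offload=True,
    )
    return AutoModelForCausalLM.from_pretrained(
        model_id,
        quantization_config=quant_config,
        device_map="auto",  # let accelerate split layers across GPU and CPU
    )


if __name__ == "__main__":
    # Weight-only estimate for a 7-billion-parameter model.
    for bits in (16, 8, 4):
        print(f"{bits}-bit: {estimate_weight_gib(7_000_000_000, bits):.2f} GiB")
```

Anything that gets offloaded ends up in system RAM, so it is worth comparing the estimate against your free VRAM plus RAM before loading.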
