guillaumekln t1_j9nfl9t wrote

You can also check out the CTranslate2 library, which supports efficient inference of T5 models, including 8-bit quantization on both CPU and GPU. There is a usage example in the documentation.

Disclaimer: I’m the author of CTranslate2.