[D] Faster Flan-T5 inference Submitted by _learn_faster_ t3_1194vcc on February 22, 2023 at 4:59 PM in MachineLearning 8 comments
LetterRip t1_j9ker51 wrote on February 22, 2023 at 5:03 PM See this tutorial - it converts the model to ONNX on CPU, then to TensorRT, for a 3-6x speedup. https://developer.nvidia.com/blog/optimizing-t5-and-gpt-2-for-real-time-inference-with-tensorrt/ 6