pommedeterresautee OP t1_iu2vg8y wrote
Reply to comment by seek_it in [P] Up to 12X faster GPU inference on Bert, T5 and other transformers with OpenAI Triton kernels by pommedeterresautee
Right now the kernels cover the linear layer, attention, and layer norm / RMSNorm. So the effect would be limited outside a transformer or similar architecture. We will add more kernels over time, but convolution is not our priority right now.
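To give an idea of what such a kernel looks like, here is a minimal illustrative sketch of an RMSNorm written with OpenAI Triton. This is not the project's actual implementation; the names `rms_norm_kernel` and `rms_norm` are made up for the example, and it assumes a contiguous 2D input with one program instance per row.

```python
# Illustrative sketch only, not the library's real kernel.
import torch
import triton
import triton.language as tl


@triton.jit
def rms_norm_kernel(x_ptr, w_ptr, out_ptr, n_cols, eps, BLOCK_SIZE: tl.constexpr):
    # One program instance normalizes one row of the input.
    row = tl.program_id(0)
    cols = tl.arange(0, BLOCK_SIZE)
    mask = cols < n_cols
    x = tl.load(x_ptr + row * n_cols + cols, mask=mask, other=0.0).to(tl.float32)
    # RMSNorm: x / sqrt(mean(x^2) + eps), scaled by a learned weight.
    rms = tl.sqrt(tl.sum(x * x, axis=0) / n_cols + eps)
    w = tl.load(w_ptr + cols, mask=mask, other=0.0).to(tl.float32)
    y = (x / rms) * w
    tl.store(out_ptr + row * n_cols + cols, y, mask=mask)


def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    # x: (n_rows, n_cols) contiguous tensor on GPU; one kernel program per row.
    n_rows, n_cols = x.shape
    out = torch.empty_like(x)
    BLOCK_SIZE = triton.next_power_of_2(n_cols)
    rms_norm_kernel[(n_rows,)](x, weight, out, n_cols, eps, BLOCK_SIZE=BLOCK_SIZE)
    return out
```

The point of writing these ops in Triton is that the whole normalization is fused into a single GPU kernel instead of several separate PyTorch ops, which is where the speedup on transformer blocks comes from.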