Submitted by op_prabhuomkar t3_10iqeuh in MachineLearning
NovaBom8 t1_j5h30af wrote
Very cool, great work!!
In the context of serving .pt (or any other device-agnostic filetypes), I’m guessing dynamic batching is the reason for Triton’s superior throughput?
kkchangisin t1_j5ijvdy wrote
Looking at the model configs in the repo there’s definitely dynamic batching going on.
I think what’s really interesting is the fact that even with default parameters for dynamic batching the response times are superior and very consistent.
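For context, a minimal sketch of what that looks like in a Triton `config.pbtxt` (the model name and platform here are hypothetical, not taken from the repo):

```protobuf
name: "resnet50_pt"              # hypothetical model name
platform: "pytorch_libtorch"     # serving a TorchScript .pt model
max_batch_size: 8

# An empty dynamic_batching block enables dynamic batching
# with Triton's default parameters (no preferred_batch_size,
# zero max_queue_delay_microseconds).
dynamic_batching { }
```

Tuning `preferred_batch_size` and `max_queue_delay_microseconds` inside that block usually trades a bit of latency for higher throughput, so getting consistent response times with defaults is notable.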