It’s an early version and I’m trying to get some feedback on how I can improve this and do it the “right way”.

Source Code and Results: https://github.com/prabhuomkar/bitbeast/tree/master/ptibench

Comments

kkchangisin t1_j5gcgbe wrote on January 22, 2023 at 8:35 PM

Nice work! Triton already looks good, but have you tried optimizing with the Triton Model Analyzer?

https://github.com/triton-inference-server/model_analyzer

For various models I use with Triton, I've found that the output model formats and configurations can drastically improve performance, whether that's throughput, latency, etc.

Hopefully I get some time soon to try it out myself!

Again, nice work!

op_prabhuomkar OP t1_j5i7oyj wrote on January 23, 2023 at 4:18 AM

Thank you for the feedback. I am looking forward to using Triton's Model Analyzer, possibly with different batch sizes and also FP16! Let's see how that goes :)

kkchangisin t1_j5if8hc wrote on January 23, 2023 at 5:29 AM

Depending on how much time I have there just might be a PR coming your way 😀…

Triton is really a somewhat hidden gem - the implementation and the toolkit surrounding it are pretty impressive!

Late-Poet8967 t1_j5gds0h wrote on January 22, 2023 at 8:44 PM

Nice work mate. Very impressive

NovaBom8 t1_j5h30af wrote on January 22, 2023 at 11:24 PM

Very cool, great work!!

In the context of running .pt (or any other device-agnostic file type), I’m guessing dynamic batching is the reason for Triton’s superior throughput?

kkchangisin t1_j5ijvdy wrote on January 23, 2023 at 6:19 AM

Looking at the model configs in the repo there’s definitely dynamic batching going on.

I think what’s really interesting is the fact that even with default parameters for dynamic batching the response times are superior and very consistent.
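For anyone unfamiliar with the idea, here is a rough sketch of how dynamic batching groups requests. It is a simplified model, not Triton's actual scheduler: the function `form_batches` and its parameters are hypothetical names, loosely modeled on Triton's `max_batch_size` and `max_queue_delay_microseconds` settings. It assumes a batch is flushed when it is full or when a new request arrives after the oldest queued one has waited longer than the allowed delay.

```python
from typing import List


def form_batches(
    arrivals_us: List[int],
    max_batch_size: int,
    max_queue_delay_us: int,
) -> List[List[int]]:
    """Group request arrival times (in microseconds) into batches.

    Simplified rule (an assumption, not Triton's real scheduler):
    flush the current batch when it is full, or when the next request
    arrives more than max_queue_delay_us after the oldest queued one.
    """
    batches: List[List[int]] = []
    current: List[int] = []
    for t in sorted(arrivals_us):
        batch_full = len(current) == max_batch_size
        waited_too_long = bool(current) and t - current[0] > max_queue_delay_us
        if current and (batch_full or waited_too_long):
            batches.append(current)
            current = []
        current.append(t)
    if current:
        # Flush whatever is still queued at the end.
        batches.append(current)
    return batches


if __name__ == "__main__":
    # Three requests close together, two more later, then a straggler.
    print(form_batches([0, 10, 20, 500, 510, 2000], 4, 100))
```

Under bursty load, most requests end up sharing a forward pass with their neighbors, which is one plausible reason throughput improves while per-request latency stays bounded by the queue delay.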

Ok_Two6167 t1_j5jrd8u wrote on January 23, 2023 at 2:39 PM

Hello u/op_prabhuomkar,

That's a super cool test! Any chance you can compare it to the HTTP API as well?

op_prabhuomkar OP t1_j5k0h1j wrote on January 23, 2023 at 3:43 PM

It’s actually easier to do for HTTP, will probably take that as a TODO. Thanks for the suggestion!
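As a starting point for an HTTP comparison, here is a minimal sketch of building a request for Triton's HTTP endpoint, which follows the KServe v2 inference protocol (`POST /v2/models/<model>/infer` with a JSON body). The helper `build_infer_request`, the model name `resnet50`, and the input name `input__0` are assumptions for illustration, not taken from the repo; port 8000 is Triton's default HTTP port.

```python
import json
from typing import Any, List, Tuple


def build_infer_request(
    base_url: str,
    model_name: str,
    input_name: str,
    shape: List[int],
    datatype: str,
    data: List[Any],
) -> Tuple[str, bytes]:
    """Return the URL and JSON payload for a KServe v2 inference call."""
    url = f"{base_url.rstrip('/')}/v2/models/{model_name}/infer"
    body = {
        "inputs": [
            {
                "name": input_name,    # must match the model config
                "shape": shape,        # includes the batch dimension
                "datatype": datatype,  # e.g. "FP32"
                "data": data,          # flattened, row-major values
            }
        ]
    }
    return url, json.dumps(body).encode("utf-8")


if __name__ == "__main__":
    # Hypothetical model and input names, for illustration only.
    url, payload = build_infer_request(
        "http://localhost:8000", "resnet50", "input__0",
        [1, 3], "FP32", [0.1, 0.2, 0.3],
    )
    print(url)
    print(payload.decode("utf-8"))
```

The returned pair can be sent with any HTTP client, for example `urllib.request.Request(url, data=payload, headers={"Content-Type": "application/json"})`, and timed the same way as the existing benchmark requests.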