big_dog_2k OP t1_iuaexff wrote
Reply to comment by pommedeterresautee in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Thank you! I think I will try Kernl today as well. If I understand correctly, only Ampere-generation cards are supported? Also, does it work on any Hugging Face model, or are there still exceptions?
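For anyone following along, this is roughly what I plan to try, based on my reading of the Kernl README (a sketch, not tested yet; I'm assuming `optimize_model` is the public entry point and that it patches the model's forward in place):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from kernl.model_optimization import optimize_model  # assumption: this is the public API

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased").eval().cuda()

optimize_model(model)  # assumption: swaps the forward pass for the fused Triton kernels

inputs = tokenizer("hello world", return_tensors="pt").to("cuda")
with torch.inference_mode():
    logits = model(**inputs).logits
```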
big_dog_2k OP t1_iua8wgp wrote
Reply to comment by pommedeterresautee in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Thanks. I was aware of this and had some difficulty in the past. My evaluation criteria now compare precision loss across model outputs as well as performance (accuracy or an equivalent metric) measured on the full system. What methods have you found to mitigate this? I would love to know!
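For what it's worth, the output-precision check I run looks something like this (a minimal sketch; `baseline_model`, `optimized_model`, and the tolerance are placeholders, and the second check is still the task metric on the full validation set):

```python
import torch

@torch.inference_mode()
def max_output_diff(baseline_model, optimized_model, dataloader):
    """Worst-case absolute difference between the two models' logits over a dataset."""
    worst = 0.0
    for batch in dataloader:
        ref = baseline_model(**batch).logits.float()
        out = optimized_model(**batch).logits.float()
        worst = max(worst, (ref - out).abs().max().item())
    return worst

# Example usage (placeholder names): fail the conversion if drift exceeds a per-task tolerance.
# assert max_output_diff(fp32_model, optimized_model, val_loader) < 1e-2
```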
big_dog_2k OP t1_iua8imm wrote
Reply to comment by braintampon in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Great! Exactly this, I just want someone to provide feedback. Do you see throughput improvements using Bento with dynamic batching vs without? Is the throughput good in general, or is the biggest benefit ease of use?
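For context, the comparison I have in mind is roughly this (a sketch against the BentoML 1.x API as I read the docs; the model name is made up, and I'm assuming the model was saved with a `{"__call__": {"batchable": True, "batch_dim": 0}}` signature so adaptive batching kicks in):

```python
import numpy as np
import bentoml
from bentoml.io import NumpyNdarray

# The runner wraps the stored model; with a batchable signature, BentoML merges
# concurrent requests into a single forward pass (adaptive/dynamic batching).
runner = bentoml.pytorch.get("my_classifier:latest").to_runner()
svc = bentoml.Service("my_classifier_service", runners=[runner])

@svc.api(input=NumpyNdarray(dtype="float32"), output=NumpyNdarray())
async def predict(arr: np.ndarray) -> np.ndarray:
    return await runner.async_run(arr)
```

The benchmark would be this service versus the same model saved with `batchable=False`, under the same concurrent load.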
big_dog_2k OP t1_iu8gxg1 wrote
Reply to comment by BestSentence4868 in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Wow! I did not know that! I think I have answers to my questions now.
big_dog_2k OP t1_iu8gcr1 wrote
Reply to comment by BestSentence4868 in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Great! Does Triton allow serving native PyTorch models, or is it limited to ONNX, TensorRT, and TorchScript?
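To make the question concrete: if the answer is TorchScript rather than eager PyTorch, I assume the export for Triton's PyTorch backend would look roughly like this (a sketch; the model and repository path are just examples of the `<name>/<version>/model.pt` layout as I understand it):

```python
import os
import torch
import torchvision

# Any nn.Module that traces cleanly would do; resnet50 is just an example.
model = torchvision.models.resnet50(weights=None).eval()
example = torch.randn(1, 3, 224, 224)

traced = torch.jit.trace(model, example)

# Triton's PyTorch (LibTorch) backend expects model_repository/<model_name>/<version>/model.pt
os.makedirs("model_repository/resnet50/1", exist_ok=True)
traced.save("model_repository/resnet50/1/model.pt")
```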
big_dog_2k OP t1_iu86mb7 wrote
Reply to comment by ibmw in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Thanks! It sounds like investing time in ONNX and using Triton is the best bet.
big_dog_2k OP t1_iu86jfr wrote
Reply to comment by poems_4_you in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Thanks! I have now seen a consistent theme from people that Triton is worth it. I might bite the bullet and invest more time in getting the ONNX conversions right.
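The conversion step is where I have lost time before; for a transformer classifier it is roughly this (a sketch with an example model; the dynamic axes are the part I usually get wrong):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased").eval()
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
dummy = tokenizer("hello world", return_tensors="pt")

torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch"},
    },
    opset_version=13,
)
```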
big_dog_2k OP t1_iu7paw3 wrote
Reply to comment by sobagood in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Thanks. I might need to take a closer look. I was also thinking about AMD and ARM-based CPUs. I was surprised at how good CPU-based inference can be for some models these days.
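On the CPU point, a minimal way to sanity-check latency on an AMD or ARM box is something like this (a sketch; the model path, input shapes, and thread count are placeholders, and the default CPUExecutionProvider is what I'd expect to work on non-Intel hardware):

```python
import time
import numpy as np
import onnxruntime as ort

opts = ort.SessionOptions()
opts.intra_op_num_threads = 4  # tune to the physical cores available
sess = ort.InferenceSession("model.onnx", sess_options=opts, providers=["CPUExecutionProvider"])

input_ids = np.random.randint(0, 1000, size=(1, 128), dtype=np.int64)
attention_mask = np.ones((1, 128), dtype=np.int64)

start = time.perf_counter()
for _ in range(100):
    sess.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})
print(f"avg latency: {(time.perf_counter() - start) / 100 * 1000:.1f} ms")
```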
big_dog_2k OP t1_iu6yjf3 wrote
Reply to comment by yubozhao in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Hi! Can you give the elevator pitch for Bento? When should I use it, and which parts of my model serving problem will it solve? If you integrate with another serving solution, how much more complexity does that add, and how are you thinking about deployment?
big_dog_2k OP t1_iu6yc7b wrote
Reply to comment by sobagood in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Thanks! Does it work with non-Intel chipsets, and how easy have you found it to use?
big_dog_2k OP t1_iuaw55q wrote
Reply to comment by pommedeterresautee in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Great. I might try this out, since I like the direction this is going and it seems like PyTorch is heading in a similar direction. I'll let you know if I have questions, or I'll raise them on GitHub. I appreciate all the information!