big_dog_2k OP t1_iuaexff wrote
Reply to comment by pommedeterresautee in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Thank you! I think I will try Kernl today as well. If I understand correctly, only Ampere-generation cards are supported? Also, does it work on any Hugging Face model, or are there still exceptions?
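For anyone following along, this is roughly what I plan to try, based on my reading of the Kernl README (a sketch, not tested yet; I'm assuming `optimize_model` is the public entry point and that it patches the model's forward in place):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer
from kernl.model_optimization import optimize_model  # assumption: this is the public API

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased").eval().cuda()

optimize_model(model)  # assumption: swaps the forward pass for the fused Triton kernels

inputs = tokenizer("hello world", return_tensors="pt").to("cuda")
with torch.inference_mode():
    logits = model(**inputs).logits
```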
big_dog_2k OP t1_iua8wgp wrote
Reply to comment by pommedeterresautee in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Thanks. I was aware of this and had some difficulty in the past. My evaluation criteria now compare precision loss across model outputs as well as performance (accuracy or an equivalent metric) measured on the full system. What methods have you found to mitigate this? I would love to know!
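For what it's worth, the output-precision check I run looks something like this (a minimal sketch; `baseline_model`, `optimized_model`, and the tolerance are placeholders, and the second check is still the task metric on the full validation set):

```python
import torch

@torch.inference_mode()
def max_output_diff(baseline_model, optimized_model, dataloader):
    """Worst-case absolute difference between the two models' logits over a dataset."""
    worst = 0.0
    for batch in dataloader:
        ref = baseline_model(**batch).logits.float()
        out = optimized_model(**batch).logits.float()
        worst = max(worst, (ref - out).abs().max().item())
    return worst

# Example usage (placeholder names): fail the conversion if drift exceeds a per-task tolerance.
# assert max_output_diff(fp32_model, optimized_model, val_loader) < 1e-2
```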
big_dog_2k OP t1_iua8imm wrote
Reply to comment by braintampon in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Great! Exactly this, I just want someone to provide feedback. Do you see throughput improvements using Bento with dynamic batching vs without? Is the throughput good in general, or is the biggest benefit ease of use?
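For context, the comparison I have in mind is roughly this (a sketch against the BentoML 1.x API as I read the docs; the model name is made up, and I'm assuming the model was saved with a `{"__call__": {"batchable": True, "batch_dim": 0}}` signature so adaptive batching kicks in):

```python
import numpy as np
import bentoml
from bentoml.io import NumpyNdarray

# The runner wraps the stored model; with a batchable signature, BentoML merges
# concurrent requests into a single forward pass (adaptive/dynamic batching).
runner = bentoml.pytorch.get("my_classifier:latest").to_runner()
svc = bentoml.Service("my_classifier_service", runners=[runner])

@svc.api(input=NumpyNdarray(dtype="float32"), output=NumpyNdarray())
async def predict(arr: np.ndarray) -> np.ndarray:
    return await runner.async_run(arr)
```

The benchmark would be this service versus the same model saved with `batchable=False`, under the same concurrent load.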
big_dog_2k OP t1_iu8gxg1 wrote
Reply to comment by BestSentence4868 in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Wow! I did not know that! I think I have answers to my questions now.
big_dog_2k OP t1_iu8gcr1 wrote
Reply to comment by BestSentence4868 in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Great! Does Triton allow serving native PyTorch models, or is it limited to ONNX, TensorRT, and TorchScript?
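To make the question concrete: if the answer is TorchScript rather than eager PyTorch, I assume the export for Triton's PyTorch backend would look roughly like this (a sketch; the model and repository path are just examples of the `<name>/<version>/model.pt` layout as I understand it):

```python
import os
import torch
import torchvision

# Any nn.Module that traces cleanly would do; resnet50 is just an example.
model = torchvision.models.resnet50(weights=None).eval()
example = torch.randn(1, 3, 224, 224)

traced = torch.jit.trace(model, example)

# Triton's PyTorch (LibTorch) backend expects model_repository/<model_name>/<version>/model.pt
os.makedirs("model_repository/resnet50/1", exist_ok=True)
traced.save("model_repository/resnet50/1/model.pt")
```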
big_dog_2k OP t1_iu86mb7 wrote
Reply to comment by ibmw in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Thanks! It sounds like investing time in ONNX and using Triton is the best bet.
big_dog_2k OP t1_iu86jfr wrote
Reply to comment by poems_4_you in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Thanks! I have now seen a consistent theme from people that Triton is worth it. I might bite the bullet and invest more time in getting the ONNX conversions right.
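The conversion step is where I have lost time before; for a transformer classifier it is roughly this (a sketch with an example model; the dynamic axes are the part I usually get wrong):

```python
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased").eval()
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
dummy = tokenizer("hello world", return_tensors="pt")

torch.onnx.export(
    model,
    (dummy["input_ids"], dummy["attention_mask"]),
    "model.onnx",
    input_names=["input_ids", "attention_mask"],
    output_names=["logits"],
    dynamic_axes={
        "input_ids": {0: "batch", 1: "sequence"},
        "attention_mask": {0: "batch", 1: "sequence"},
        "logits": {0: "batch"},
    },
    opset_version=13,
)
```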
big_dog_2k OP t1_iu7paw3 wrote
Reply to comment by sobagood in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Thanks. I might need to take a closer look. I was also thinking about AMD and ARM-based CPUs. I was surprised at how good CPU-based inference can be for some models these days.
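On the CPU point, a minimal way to sanity-check latency on an AMD or ARM box is something like this (a sketch; the model path, input shapes, and thread count are placeholders, and the default CPUExecutionProvider is what I'd expect to work on non-Intel hardware):

```python
import time
import numpy as np
import onnxruntime as ort

opts = ort.SessionOptions()
opts.intra_op_num_threads = 4  # tune to the physical cores available
sess = ort.InferenceSession("model.onnx", sess_options=opts, providers=["CPUExecutionProvider"])

input_ids = np.random.randint(0, 1000, size=(1, 128), dtype=np.int64)
attention_mask = np.ones((1, 128), dtype=np.int64)

start = time.perf_counter()
for _ in range(100):
    sess.run(None, {"input_ids": input_ids, "attention_mask": attention_mask})
print(f"avg latency: {(time.perf_counter() - start) / 100 * 1000:.1f} ms")
```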
big_dog_2k OP t1_iu6yjf3 wrote
Reply to comment by yubozhao in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Hi! Can you give the elevator pitch for Bento? When should I use it, and which parts of my model serving problem will it solve? If you integrate with another serving solution, how much more complexity does that add, and how are you thinking about deployment?
big_dog_2k OP t1_iu6yc7b wrote
Reply to comment by sobagood in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Thanks! Does it work with non-Intel chipsets, and how easy have you found it to use?
big_dog_2k OP t1_iuaw55q wrote
Reply to comment by pommedeterresautee in [D] How to get the fastest PyTorch inference and what is the "best" model serving framework? by big_dog_2k
Great. I might try this out, since I like the direction this is going and it seems like PyTorch is heading in a similar direction. I'll let you know if I have questions, or I'll raise them on GitHub. I appreciate all the information!