sobagood t1_iu6zuhk wrote

If you mean an NVIDIA GPU, it has a CUDA plugin for running on NVIDIA GPUs, but I have never tried it. It has several other plugins too, so you could check those out. It also provides its own deployment server. NVIDIA Triton also supports the OpenVINO runtime, though without GPU support, for obvious reasons. Like ONNX, it converts the model graph into its own intermediate representation using the ‘Model Optimizer’, and that conversion step can go wrong. If you manage to create this representation successfully, there should be no new bottleneck.
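
For what it's worth, here is a rough sketch of what loading the converted representation could look like from Python. I have not run this against the NVIDIA plugin. The device name "NVIDIA" is an assumption on my part (check what `available_devices` actually reports on your machine), and `pick_device`, `load_ir` and the file path are made-up names for illustration.

```python
def pick_device(available, preferred=("NVIDIA", "GPU", "CPU")):
    """Return the first available device matching the preference order, else None.

    "NVIDIA" is an assumed plugin device name; "GPU" and "CPU" are the
    usual OpenVINO device names.
    """
    for want in preferred:
        for dev in available:
            # Device names can carry an index suffix such as "GPU.0".
            if dev == want or dev.startswith(want + "."):
                return dev
    return None


def load_ir(xml_path, preferred=("NVIDIA", "GPU", "CPU")):
    """Read an IR file (the .xml written by Model Optimizer) and compile it."""
    # Imported lazily so pick_device works without OpenVINO installed.
    try:
        from openvino import Core          # newer releases
    except ImportError:
        from openvino.runtime import Core  # 2022.x releases

    core = Core()
    device = pick_device(core.available_devices, preferred)
    if device is None:
        raise RuntimeError("none of the preferred devices are available")
    model = core.read_model(xml_path)
    return core.compile_model(model, device)
```

You would call it as `compiled = load_ir("model.xml")`, where `model.xml` is whatever the conversion step produced. If the conversion failed, `read_model` is where you would normally find out.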