Comments


harharveryfunny t1_ir7b7aw wrote

If model load time is the limiting factor, then ONNX Runtime speed may be irrelevant. You may need to load the model once and reuse it, rather than loading it each time.
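Something along these lines with the ONNX Runtime C++ API, just as a rough sketch -- the "input"/"output" tensor names, the float in/out, and the shapes are placeholders, not anything from your actual model:

```cpp
#include <onnxruntime_cxx_api.h>
#include <vector>

// Build the session once (expensive: reads and optimizes the model file),
// then call run() as many times as needed (cheap by comparison).
class OnnxModel {
public:
    // Note: on Windows the model path is wide-char (ORTCHAR_T).
    explicit OnnxModel(const char* model_path)
        : env_(ORT_LOGGING_LEVEL_WARNING, "app"),
          session_(env_, model_path, Ort::SessionOptions{}) {}

    std::vector<float> run(std::vector<float>& input, std::vector<int64_t>& shape) {
        auto mem = Ort::MemoryInfo::CreateCpu(OrtArenaAllocator, OrtMemTypeDefault);
        Ort::Value tensor = Ort::Value::CreateTensor<float>(
            mem, input.data(), input.size(), shape.data(), shape.size());

        // "input" / "output" are placeholder tensor names.
        const char* in_names[]  = {"input"};
        const char* out_names[] = {"output"};
        auto outputs = session_.Run(Ort::RunOptions{nullptr},
                                    in_names, &tensor, 1, out_names, 1);

        float* data = outputs[0].GetTensorMutableData<float>();
        size_t n = outputs[0].GetTensorTypeAndShapeInfo().GetElementCount();
        return std::vector<float>(data, data + n);
    }

private:
    Ort::Env env_;
    Ort::Session session_;
};
```

The point is that the Ort::Session is constructed once and kept alive, while run() can be called as often as you like against it.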

There's a new runtime (a TensorRT competitor) called TemplateAI from Facebook that does support CPU and is meant to be very fast, but I don't believe it supports ONNX yet. In any case, you're not going to get a 50x speed-up just by switching to a faster runtime on the same hardware.

Another alternative might be to run it in the cloud rather than locally.

3

StephaneCharette t1_ir7nomj wrote

I cannot help but think, "oh yeah, this framework over here is 50x faster than anything else, but everyone has forgotten about it until just now..."

If <something> gave 50X improvements, wouldn't that be what everyone uses?

Having said that, the reason I use Darknet/YOLO is specifically because the whole thing compiles to a C++ library. DLL on Windows, and .a or .so on Linux. I can squeeze out a few more FPS by using the OpenCV implementation instead of Darknet directly, but the implementation is not trivial to use correctly.

However, if you're working with ONNX then I suspect you're already achieving speeds higher than using Darknet or OpenCV as the framework.

One thing to remember: resizing images (aka video frames) is SLOWER than inference. I don't know what your PyTorch and ONNX frameworks do when the input image is larger than the network, but when I take timing measurements with Darknet/YOLO and OpenCV's DNN, I end up spending more time resizing the video frames than I do on inference. This is a BIG deal, which most people ignore or trivialize. If you can size your network correctly, or adjust the video capture to avoid resizing, you'll likely more than double your FPS. See these performance numbers for example: https://www.ccoderun.ca/programming/2021-10-16_darknet_fps/#resize
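For example, with OpenCV's DNN module you can ask the capture device for frames at the network size up front, so the per-frame resize disappears entirely. Rough sketch only -- the 416x416 size, the file names, and the 1/255 scaling are placeholder assumptions:

```cpp
#include <opencv2/opencv.hpp>
#include <vector>

int main() {
    // Hypothetical 416x416 network: request frames at that size so no
    // per-frame cv::resize is needed before inference.
    const cv::Size net_size(416, 416);

    cv::VideoCapture cap(0);
    cap.set(cv::CAP_PROP_FRAME_WIDTH,  net_size.width);
    cap.set(cv::CAP_PROP_FRAME_HEIGHT, net_size.height);
    // (Not every camera honors the requested size; check frame.size() once.)

    cv::dnn::Net net = cv::dnn::readNetFromDarknet("yolo.cfg", "yolo.weights");
    net.setPreferableTarget(cv::dnn::DNN_TARGET_CPU);

    cv::Mat frame, blob;
    std::vector<cv::Mat> outs;
    while (cap.read(frame)) {
        // frame already matches net_size, so blobFromImage only scales
        // pixel values and swaps channels; it does not rescale the image.
        cv::dnn::blobFromImage(frame, blob, 1.0 / 255.0, net_size,
                               cv::Scalar(), /*swapRB=*/true, /*crop=*/false);
        net.setInput(blob);
        net.forward(outs, net.getUnconnectedOutLayersNames());
    }
    return 0;
}
```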

1

LiquidDinosaurs69 t1_ir7qovy wrote

Simply copy and paste the weights and biases into vectors in C++ and do the math yourself for inference. Unless your network is very big, I believe this is actually a pretty valid strategy.
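Something like this with Eigen, as a sketch -- the layer sizes, the tanh activation, and the struct layout are just assumptions for illustration:

```cpp
#include <Eigen/Dense>

// Hand-rolled inference for a small MLP: the weights and biases are pasted
// in (or generated) as Eigen matrices, so the only dependency is Eigen.
struct TinyNet {
    Eigen::MatrixXf W1, W2, W3;   // weight matrices, one per layer
    Eigen::VectorXf b1, b2, b3;   // bias vectors, one per layer

    Eigen::VectorXf forward(const Eigen::VectorXf& x) const {
        Eigen::VectorXf h1 = ((W1 * x + b1).array().tanh()).matrix();
        Eigen::VectorXf h2 = ((W2 * h1 + b2).array().tanh()).matrix();
        return W3 * h2 + b3;      // linear output layer
    }
};
```

Fill W1/b1 etc. with the values exported from your trained model (e.g. printed out as C++ initializer lists), and the whole "runtime" is just a handful of matrix-vector products.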

1

LiquidDinosaurs69 t1_ir8v1nq wrote

No, I didn’t measure the time. But I had a network with 2 hidden layers of 35 units each, and I was using it as a component of a single-threaded simulation that was running inference over 1000 times a second on an older CPU. Can I ask why you don’t want to use the GPU? CUDA would speed things up a lot if you need more speed.

1

LiquidDinosaurs69 t1_ir8vdvj wrote

Actually, here’s the code where I implemented inference for my neural net if you’re interested. It’s very simple. https://github.com/jyurkanin/auvsl_dynamics/blob/float_model/src/TireNetwork.cpp

And here’s a handy script I made to help generate the C code for loading the weights into libeigen vectors (just use the print_c_network function): https://github.com/jyurkanin/auvsl_dynamics/blob/float_model/scripts/pretrain.py

Also, look at my CMakeLists.txt to make sure you have the compiler flags that will make your code run as fast as possible.

1

amitraderinthemaking OP t1_ir8zuij wrote

Ah, thank you SO much for sharing, I will definitely take a look!

So unfortunately we don't have a GPU available on our production systems yet -- we are not an ML-oriented team at all (this would be the first ML project, tbh).

But we'd eventually make a case for a GPU, for certain. The thing is, this ML-based approach needs to be faster than the current way of doing things before we can move further, you know.

Thanks again for sharing.

2