Alors_HS

Alors_HS t1_is1871l wrote

Well, I needed to solve my problem, so I looked at papers and software solutions. Then it was a slow process of many iterations of trial and error.

I couldn't tell you what would be best for your use case, though. I'm afraid it's been too long for me to remember the details. Besides, each method may be more or less effective depending on the metrics and inference time you need to hit, and on the training methods and resources you can afford.

I can give you a tip: I initialized my inference script only once per boot, then put it in "waiting mode" so I wouldn't have to initialize the model for each inference (that's the largest source of lost time). Upon receiving a socket message, the script would read a data file, do an inference pass, write the results to another file, delete/move the data to storage, and wait for the next socket message (a minimal sketch of this pattern follows below). It's obvious when you think about it that you absolutely don't want to call/initialize your inference script once per inference, but, well, you never know what people think about :p
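Here's roughly what that "load once, then wait" pattern looks like. This is only a sketch, not the original setup: the paths, the port number, and the TorchScript model are placeholder assumptions.

```python
import os
import shutil
import socket

import numpy as np
import torch

MODEL_PATH = "model.pt"     # hypothetical paths -- adapt to your pipeline
DATA_FILE = "input.npy"
RESULT_FILE = "output.npy"
ARCHIVE_DIR = "processed"
os.makedirs(ARCHIVE_DIR, exist_ok=True)

# The expensive part: done exactly once per boot, not once per inference.
model = torch.jit.load(MODEL_PATH)
model.eval()

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 5555))   # assumed local port
server.listen(1)

while True:
    conn, _ = server.accept()      # "waiting mode": block until a message arrives
    with conn:
        conn.recv(1024)            # the message itself is just a wake-up signal
        data = torch.from_numpy(np.load(DATA_FILE))
        with torch.no_grad():
            result = model(data)
        np.save(RESULT_FILE, result.numpy())
        shutil.move(DATA_FILE, ARCHIVE_DIR)  # clear the inbox for the next job
        conn.sendall(b"done")
```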


Alors_HS t1_is0f9h0 wrote

I had to deploy my models on NVIDIA Jetson / TX boards at my last job, about 1.5 years ago.

In these use cases there is a lot of optimization to do. A list of available methods: pruning, mixed-precision training/inference, quantization, CUDA/ONNX/NVIDIA-specific optimization (e.g. TensorRT), and training models that perform on lower-resolution data via knowledge distillation from models trained on higher-res data... (two of these are sketched below).
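For illustration, here is what two of those methods look like in PyTorch. `MyModel` is a stand-in for a real network, and a real Jetson deployment would more likely go through ONNX/TensorRT export; this just shows the general idea.

```python
import torch
import torch.nn as nn

class MyModel(nn.Module):
    """Stand-in for a real trained network."""
    def __init__(self):
        super().__init__()
        self.fc = nn.Linear(128, 10)

    def forward(self, x):
        return self.fc(x)

model = MyModel().eval()

# 1) Post-training dynamic quantization: Linear weights stored as int8,
#    activations quantized on the fly. Cheap to apply, CPU-oriented.
quantized = torch.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

# 2) Mixed-precision inference: run the forward pass in reduced precision.
x = torch.randn(1, 128)
with torch.no_grad(), torch.autocast("cpu", dtype=torch.bfloat16):
    out = model(x)
```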

Look around; the above is just off the top of my head from a while ago. There are plenty of resources now for inference on the edge.
