one_eyed_sphinx OP t1_j7yqh5v wrote

>NVME

Yeah, GPU memory is a horrible bottleneck. I am trying to find ways around it, but there don't seem to be many best practices for it. Is there a way to use pinned memory for faster model data transfer?

suflaj t1_j7yr906 wrote

If GPU memory is the bottleneck, there is nothing you can viably do about that. If your GPU can't load data into memory any faster, you will need more rigs and GPUs to speed up loading in parallel.

Or you could try to quantize your models into something smaller that fits in memory, but then we're talking model surgery, not hardware.
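
To make "quantize into something smaller" concrete, here is a minimal, framework-free sketch assuming simple symmetric per-tensor int8 quantization (one of several schemes; real toolkits do considerably more, e.g. per-channel scales and calibration). The function names are hypothetical, and the weights are made-up values chosen only for illustration.

```python
from array import array


def quantize_int8(weights):
    """Symmetric per-tensor int8 quantization: each w is approximated by scale * q."""
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / 127 if max_abs > 0 else 1.0
    # Round to the nearest integer step and clamp to the symmetric range [-127, 127].
    q = array("b", (max(-127, min(127, round(w / scale))) for w in weights))
    return q, scale


def dequantize_int8(q, scale):
    """Map int8 codes back to approximate float values."""
    return [v * scale for v in q]


weights = array("d", [0.5, -1.27, 0.0, 1.0, 0.333])  # float64 storage, 8 bytes each
q, scale = quantize_int8(weights)
approx = dequantize_int8(q, scale)

print(weights.itemsize, "bytes/param before,", q.itemsize, "byte/param after")
print("max abs error:", max(abs(a - b) for a, b in zip(weights, approx)))
```

The trade-off is visible directly: storage drops from 8 bytes to 1 byte per parameter, at the cost of a rounding error of at most half a quantization step per weight.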

allanmeter t1_j7ytp7i wrote

This is really good advice! Preprocessing input data for both training and inference is the best route to efficiency. Don’t feed the model a crazy-large multidimensional dataset; try to break it up, and see whether you can use old-fashioned methods like windowing and downsampling.
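
As a rough illustration of the windowing and downsampling idea, here is a sketch on a plain 1-D Python list, assuming a time-series-like input. The helper names, window size, step, and factor are hypothetical placeholders; for images or other multidimensional arrays the same idea applies per axis.

```python
def sliding_windows(signal, size, step):
    """Split a long 1-D sequence into fixed-size (possibly overlapping) windows."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, step)]


def downsample_mean(signal, factor):
    """Shrink length by `factor` by averaging non-overlapping blocks (drops a ragged tail)."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    n = len(signal) // factor
    return [sum(signal[i * factor:(i + 1) * factor]) / factor for i in range(n)]


signal = list(range(10))
print(sliding_windows(signal, size=4, step=2))
# → [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
print(downsample_mean(signal, factor=2))
# → [0.5, 2.5, 4.5, 6.5, 8.5]
```

Each window can then be fed to the model separately, so the per-batch memory footprint depends on the window size rather than on the full input length.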

The model’s parameter type matters too. If you’re running fp64, you will struggle compared with a model that’s just int8. If you have mixed-precision weights, you really need to think about looking at AWS SageMaker and getting a pipeline going.
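
The fp64 vs int8 gap is mostly simple arithmetic: 8 bytes per parameter versus 1. A back-of-the-envelope sketch, assuming only the weights are counted (activations, optimizer state, and framework overhead are ignored) and using a hypothetical 7-billion-parameter model:

```python
# Bytes per parameter for common numeric formats.
BYTES_PER_PARAM = {"fp64": 8, "fp32": 4, "fp16": 2, "bf16": 2, "int8": 1}


def weight_memory_gib(num_params, dtype):
    """Approximate memory for the weights alone, in GiB."""
    return num_params * BYTES_PER_PARAM[dtype] / 1024**3


n = 7_000_000_000  # hypothetical model size, for illustration only
for dtype in ("fp64", "fp32", "fp16", "int8"):
    print(f"{dtype}: {weight_memory_gib(n, dtype):.1f} GiB")
```

Whatever the model size, the ratio stays fixed: fp64 weights take eight times the memory (and transfer bandwidth) of int8 weights.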

To OP: maybe you can share a little context on which models you’re looking to run, or on your input data?
