Submitted by one_eyed_sphinx t3_10x50us in deeplearning
suflaj t1_j7u2qyt wrote
Reply to comment by one_eyed_sphinx in what cpu and mother board is best for Dual RTX 4090? by one_eyed_sphinx
You want eco mode to run cooler and more efficiently. As I said, the bottleneck is the GPU, specifically its memory bandwidth, not whatever the CPU can transfer. Modern CPUs can easily feed 3 high-end GPUs at the same time, not just 2.
PCIe speed has not been a bottleneck for several years, and will probably never be a bottleneck again for this form factor of GPU. GPU MEMORY is the bottleneck nowadays.
EDIT: And as someone else said, yeah, you can use fast NVMe drives as swap to avoid loading from disk. There used to be Optane for this kind of thing, but well, that's dead.
one_eyed_sphinx OP t1_j7yqh5v wrote
>NVME
yeah, the GPU memory is a horrible bottleneck. I am trying to find ways around it, but there don't seem to be many best practices for it. Is there a way to use pinned memory for faster model data transfer?
suflaj t1_j7yr906 wrote
If GPU memory bandwidth is the bottleneck, then there is nothing you can viably do about it on a single card. If your GPU can't move the data any faster, you will need more rigs and GPUs to speed up the loading in parallel.
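On the pinned memory question: it won't fix GPU memory bandwidth, but it does speed up the host-to-device copy. A minimal sketch, assuming PyTorch (the framework isn't stated in the thread) — `pin_memory=True` allocates batches in page-locked host RAM, and `non_blocking=True` lets the copy overlap with compute:

```python
# Sketch of pinned-memory host-to-device transfers in PyTorch.
# Dataset sizes here are made up for illustration.
import torch
from torch.utils.data import DataLoader, TensorDataset

device = "cuda" if torch.cuda.is_available() else "cpu"

dataset = TensorDataset(torch.randn(256, 32), torch.randint(0, 2, (256,)))
# pin_memory only matters (and only works) when copying to a CUDA device.
loader = DataLoader(dataset, batch_size=64, pin_memory=(device == "cuda"))

for x, y in loader:
    # The async copy only actually overlaps when the source is pinned.
    x = x.to(device, non_blocking=True)
    y = y.to(device, non_blocking=True)
    # ... forward/backward pass here ...
```

This helps the CPU-to-GPU leg of the pipeline; once the data is on the card, you're back to the GPU's own memory bandwidth.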
Or you could try to quantize your model into something smaller that fits in memory, but then we're talking model surgery, not hardware.
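For reference, the cheapest kind of that model surgery is post-training dynamic quantization. A hedged sketch in PyTorch (the thread doesn't specify a framework) — Linear weights get stored as int8, shrinking the model's memory footprint without retraining:

```python
# Post-training dynamic quantization sketch: weights of the listed
# layer types are stored as int8 and dequantized on the fly.
import torch
import torch.nn as nn

# Toy model stand-in; layer sizes are arbitrary.
model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 10))

quantized = torch.ao.quantization.quantize_dynamic(
    model, {nn.Linear}, dtype=torch.qint8
)

out = quantized(torch.randn(1, 128))
```

Accuracy usually drops a little, so check it on a validation set before committing.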
allanmeter t1_j7ytp7i wrote
This is really good advice! Preprocessing input data, for both training and inference, is the best route to efficiency. Don't feed it a crazy large multidimensional dataset; try to break it up, and look at whether old-fashioned windowing and downsampling methods can help.
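A toy NumPy sketch of that windowing-plus-downsampling idea (the sizes here are hypothetical): split a huge input into fixed windows and block-average each one down before it ever reaches the GPU.

```python
# Windowing + downsampling sketch: cut a long signal into windows,
# then block-average every `factor` samples within each window.
import numpy as np

signal = np.random.rand(100_000)   # stand-in for a huge raw input
window, factor = 1_000, 4          # hypothetical window size / decimation

# Drop the ragged tail, then split into (n_windows, window).
windows = signal[: len(signal) // window * window].reshape(-1, window)
# 4x less data per window via block-averaging.
downsampled = windows.reshape(windows.shape[0], -1, factor).mean(axis=2)
```

The same trick generalizes to images (striding/pooling) and time series (decimation) before batching.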
Also, model parameter type is important. If you're running fp64 you will struggle compared to a model that's just int8. If you have mixed-precision weights, then you really need to think about looking at AWS SageMaker and getting a pipeline going.
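The fp64-vs-int8 point is just arithmetic on bytes per parameter; a back-of-envelope sketch (the 10M parameter count is made up):

```python
# Parameter dtype directly scales memory footprint and bandwidth:
# fp64 is 8 bytes/param, int8 is 1 byte/param -> an 8x difference.
import numpy as np

n_params = 10_000_000  # hypothetical model size
fp64_bytes = np.zeros(n_params, dtype=np.float64).nbytes
int8_bytes = np.zeros(n_params, dtype=np.int8).nbytes
```

Since GPU memory bandwidth is the bottleneck being discussed, that 8x also roughly translates into 8x fewer bytes moved per forward pass.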
To OP, maybe you can share a little context on what models you’re looking to run? Or input data context.