one_eyed_sphinx

one_eyed_sphinx OP t1_j7tzoiq wrote

>eco

So this is the fine point that I want to understand. What I am trying to optimize with the build is the data transfer time: how long it takes to load a model from RAM to VRAM. If I have 10 models that need 16 GB of VRAM to run, they need to share resources. So I want to "memory hot swap" the models on an incoming request (I don't know if there is a proper term for it; I found "bin packing"). That makes the data transfer somewhat critical from my point of view, and as I understand it, only the PCIe speed is the bottleneck here. Correct me if I'm wrong.
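
To put rough numbers on the swap cost, here is a back-of-the-envelope sketch. It assumes each model is 16 GB, that the GPU sits on a full x16 link, and that the link reaches its theoretical peak per-direction bandwidth (about 15.75 GB/s for PCIe 3.0 and 31.5 GB/s for PCIe 4.0). The `efficiency` parameter and the function name are hypothetical knobs for illustration; real transfers usually land below the peak, so treat the result as a lower bound rather than a measurement.

```python
# Theoretical peak per-direction bandwidth of an x16 link, in GB/s.
# Assumption: ideal link, no protocol or driver overhead.
PCIE_X16_GB_PER_S = {"3.0": 15.75, "4.0": 31.5}


def transfer_seconds(model_gb, bandwidth_gb_per_s, efficiency=1.0):
    """Estimate the time to copy a model from RAM to VRAM.

    efficiency is a hypothetical fraction (0 < efficiency <= 1) of the
    peak bandwidth that the copy actually achieves.
    """
    if not 0 < efficiency <= 1:
        raise ValueError("efficiency must be in (0, 1]")
    return model_gb / (bandwidth_gb_per_s * efficiency)


if __name__ == "__main__":
    model_gb = 16  # assumed size of one model
    for gen, bandwidth in PCIE_X16_GB_PER_S.items():
        best_case = transfer_seconds(model_gb, bandwidth)
        print(f"PCIe {gen} x16: at least {best_case:.2f} s per swap")
```

Under these assumptions a single swap costs on the order of half a second to a second even in the best case, so the swap time scales directly with model size and link generation.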
