curiousshortguy t1_jad9s4t wrote

28

AnOnlineHandle t1_jaeshwf wrote

Is there a way to convert parameter count into vram requirements? Presuming that's the main bottleneck?

7

metal079 t1_jaeuymi wrote

Rule of thumb is VRAM needed ≈ 2 GB per billion parameters (fp16 weights), though I recall Pygmalion, which is 6B, says it needs 16 GB of RAM, so it depends. (Rough sketch below.)

12
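A minimal back-of-the-envelope sketch of that rule of thumb in Python. The 20% overhead factor for activations/KV cache is my own assumption, not from the comment:

```python
def estimate_vram_gb(params_billions: float,
                     bytes_per_param: int = 2,  # fp16/bf16 = 2 bytes per weight
                     overhead: float = 1.2):    # rough allowance for activations/KV cache (assumption)
    """Rough VRAM estimate for inference: weight memory plus a fudge factor."""
    return params_billions * bytes_per_param * overhead

# e.g. a 6B model like Pygmalion: 6 * 2 * 1.2 ≈ 14.4 GB, close to the quoted 16 GB
print(estimate_vram_gb(6))
```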

curiousshortguy t1_jaf3aab wrote

Yeah, about 2-3 GB per billion parameters. You can easily offload layers of the network to disk and then load even larger models that don't fit in VRAM, but disk I/O will make inference painfully slow. (See the sketch after this comment.)

10
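A minimal sketch of that layer-offloading approach using Hugging Face Accelerate's big-model inference (assumes `transformers` and `accelerate` are installed; the model name is just an example, swap in whatever you're loading):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "EleutherAI/gpt-j-6b"  # example 6B model

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    torch_dtype=torch.float16,  # halve memory vs fp32
    device_map="auto",          # fill GPU first, spill to CPU RAM, then disk
    offload_folder="offload",   # layers that fit nowhere else get paged to disk here
)

inputs = tokenizer("Hello, world", return_tensors="pt").to(model.device)
print(tokenizer.decode(model.generate(**inputs, max_new_tokens=20)[0]))
```

Anything that lands in `offload_folder` gets read back from disk on every forward pass, which is where the painfully slow inference comes from.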