ArmagedonAshhole
ArmagedonAshhole t1_j99tr0r wrote
Reply to comment by Disastrous_Elk_6375 in [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
>GPT-NeoX should fit in 24GB VRAM with 8bit, for inference.
GPT-NeoX-20B will fit in 24GB VRAM, but it will run out of memory almost immediately once the context grows beyond the first page or so of sentences.
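To put rough numbers on that, here is a back-of-the-envelope sketch of how memory grows with context. The model figures (about 20.5B parameters, 44 layers, hidden size 6144) are assumptions taken from the published GPT-NeoX-20B description, and the helper names are made up for illustration. It counts only the 8-bit weights plus an fp16 key/value cache, so real usage will be higher once activations and framework overhead are added.

```python
# Rough memory estimate for GPT-NeoX-20B inference (assumed figures, not measurements).
N_PARAMS = 20.5e9   # assumed parameter count
N_LAYERS = 44       # assumed number of transformer layers
HIDDEN = 6144       # assumed hidden size
GIB = 1024 ** 3


def weight_bytes(bytes_per_param=1):
    """Weights only; 1 byte per parameter approximates 8-bit storage."""
    return N_PARAMS * bytes_per_param


def kv_cache_bytes(context_tokens, bytes_per_value=2):
    """One key and one value vector cached per layer per token (fp16 by default)."""
    return 2 * N_LAYERS * HIDDEN * bytes_per_value * context_tokens


def estimate_gib(context_tokens):
    """Weights plus KV cache, in GiB, ignoring activations and overhead."""
    return (weight_bytes() + kv_cache_bytes(context_tokens)) / GIB


for ctx in (256, 1024, 2048):
    print(f"{ctx:>5} tokens: ~{estimate_gib(ctx):.1f} GiB before activations and overhead")
```

Under these assumptions the weights alone take about 19 GiB and the cache adds roughly 1 MiB per token, so there is not much headroom left on a 24GB card.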
ArmagedonAshhole t1_j9a1vq3 wrote
Reply to comment by Disastrous_Elk_6375 in [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
It depends mostly on settings, so no.
A small context of around 200-300 tokens could work with 24GB, but then the AI will not remember earlier text or connect the dots well, which would make the model worse than a 13B one.
People are working right now on splitting the work between GPU (VRAM) and CPU (RAM) in 8-bit mode. I think offloading around 10% to RAM would make the model work well on a 24GB VRAM card. It would be a bit slower but still usable.
If you want, you can always load the whole model into RAM and run it on the CPU, but that is very slow.
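As a rough illustration of the GPU/CPU split, here is a minimal sketch of how such a plan could be worked out. `plan_split` is a hypothetical helper, not part of any library. It assumes all layers are the same size and ignores embeddings, and the 17 GiB weight budget is just a figure picked to leave room for context on a 24GB card.

```python
# Hypothetical planner: how many layers stay on the GPU and how many spill to CPU RAM.
GIB = 1024 ** 3


def plan_split(n_layers, model_bytes, gpu_budget_bytes):
    """Return (gpu_layers, cpu_layers), assuming every layer is the same size."""
    per_layer = model_bytes / n_layers
    gpu_layers = min(n_layers, int(gpu_budget_bytes // per_layer))
    return gpu_layers, n_layers - gpu_layers


# Assumed figures: 44 layers, ~20.5 GB of 8-bit weights, and 17 GiB of VRAM
# reserved for weights so that context and activations have room to grow.
gpu_layers, cpu_layers = plan_split(44, 20.5e9, 17 * GIB)
print(f"GPU layers: {gpu_layers}, CPU layers: {cpu_layers}")
```

With these numbers 5 of 44 layers (about 11%) end up in RAM. Setting the GPU budget to 0 gives the CPU-only case, where every layer runs from RAM.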