Submitted by head_robotics t3_1172jrs in MachineLearning
avocadoughnut t1_j9a64k1 wrote
Reply to comment by gliptic in [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
Yup. I'd recommend using whichever RWKV model that can be fit with fp16/bf16. (apparently 8bit is 4x slower and lower accuracy) I've been running GPT-J on a 24GB gpu for months (longer contexts possible using accelerate) and I noticed massive speed increases when using fp16 (or bf16? don't remember) rather than 8bit.