Submitted by head_robotics t3_1172jrs in MachineLearning
avocadoughnut t1_j9a64k1 wrote
Reply to comment by gliptic in [D] Large Language Models feasible to run on 32GB RAM / 8 GB VRAM / 24GB VRAM by head_robotics
Yup. I'd recommend using whichever RWKV model that can be fit with fp16/bf16. (apparently 8bit is 4x slower and lower accuracy) I've been running GPT-J on a 24GB gpu for months (longer contexts possible using accelerate) and I noticed massive speed increases when using fp16 (or bf16? don't remember) rather than 8bit.