Submitted by head_robotics t3_1172jrs in MachineLearning
gliptic t1_j99y0cp wrote
RWKV can run on very little VRAM with rwkvstic's streaming mode plus 8-bit quantization. I haven't tested streaming, but I expect it's a lot slower. The 7B model sadly still takes about 8 GB with 8-bit quantization alone.
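As a rough sanity check on that 8 GB figure: weight memory scales with bytes per parameter, so a 7B model at int8 (1 byte/param) needs about 7 GB for the weights alone, and the remaining ~1 GB is plausible runtime overhead. A minimal sketch (the overhead is an assumption, not a measurement):

```python
def weight_vram_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate VRAM needed just for the model weights, in GB."""
    return n_params * bytes_per_param / 1e9

# 7B parameters at int8 (1 byte each): ~7 GB of weights,
# so ~8 GB total once buffers and runtime overhead are added.
print(weight_vram_gb(7e9, 1))  # 7.0
# The same model in fp16 (2 bytes/param) would need ~14 GB for weights:
print(weight_vram_gb(7e9, 2))  # 14.0
```

Streaming (keeping most weights in CPU RAM and moving layers to the GPU on demand) trades this VRAM floor for transfer latency, which is why it is expected to be much slower.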
avocadoughnut t1_j9a64k1 wrote
Yup. I'd recommend using whichever RWKV model can fit in fp16/bf16 (apparently 8-bit is 4x slower and less accurate). I've been running GPT-J on a 24 GB GPU for months (longer contexts are possible using accelerate), and I noticed massive speed increases using fp16 (or bf16, I don't remember which) rather than 8-bit.
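The 24 GB figure checks out with the same bytes-per-parameter arithmetic: GPT-J has ~6B parameters, so fp16 weights take ~12 GB, leaving roughly half the card free for activations and attention KV cache, which is what makes the longer contexts feasible. A quick sketch (the headroom estimate ignores framework overhead, which is an assumption):

```python
def headroom_gb(gpu_gb: float, n_params: float, bytes_per_param: float) -> float:
    """GPU memory left after loading the weights, usable for
    activations and KV cache (framework overhead not counted)."""
    return gpu_gb - n_params * bytes_per_param / 1e9

# GPT-J (~6B params) in fp16 on a 24 GB GPU:
# ~12 GB of weights, ~12 GB of headroom for longer contexts.
print(headroom_gb(24, 6e9, 2))  # 12.0
```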
hummingairtime t1_j9ey0bz wrote
I appreciate you