Submitted by head_robotics t3_1172jrs in MachineLearning
gliptic t1_j99y0cp wrote
RWKV can run on very little VRAM with rwkvstic's streaming mode plus 8-bit quantization. I haven't tested streaming, but I expect it's a lot slower. The 7B model sadly still takes about 8 GB with 8-bit quantization alone.
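As a rough sanity check on that 8 GB figure: weight memory scales with bytes per parameter, so a 7B model at int8 (1 byte/param) needs about 7 GB for the weights alone, and the remaining ~1 GB is plausible runtime overhead. A minimal sketch (the overhead is an assumption, not a measurement):

```python
def weight_vram_gb(n_params: float, bytes_per_param: float) -> float:
    """Approximate VRAM needed just for the model weights, in GB."""
    return n_params * bytes_per_param / 1e9

# 7B parameters at int8 (1 byte each): ~7 GB of weights,
# so ~8 GB total once buffers and runtime overhead are added.
print(weight_vram_gb(7e9, 1))  # 7.0
# The same model in fp16 (2 bytes/param) would need ~14 GB for weights:
print(weight_vram_gb(7e9, 2))  # 14.0
```

Streaming (keeping most weights in CPU RAM and moving layers to the GPU on demand) trades this VRAM floor for transfer latency, which is why it is expected to be much slower.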
avocadoughnut t1_j9a64k1 wrote
Yup. I'd recommend using whichever RWKV model can fit in fp16/bf16 (apparently 8-bit is 4x slower and less accurate). I've been running GPT-J on a 24 GB GPU for months (longer contexts are possible using accelerate), and I noticed massive speed increases using fp16 (or bf16, I don't remember which) rather than 8-bit.
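The 24 GB figure checks out with the same bytes-per-parameter arithmetic: GPT-J has ~6B parameters, so fp16 weights take ~12 GB, leaving roughly half the card free for activations and attention KV cache, which is what makes the longer contexts feasible. A quick sketch (the headroom estimate ignores framework overhead, which is an assumption):

```python
def headroom_gb(gpu_gb: float, n_params: float, bytes_per_param: float) -> float:
    """GPU memory left after loading the weights, usable for
    activations and KV cache (framework overhead not counted)."""
    return gpu_gb - n_params * bytes_per_param / 1e9

# GPT-J (~6B params) in fp16 on a 24 GB GPU:
# ~12 GB of weights, ~12 GB of headroom for longer contexts.
print(headroom_gb(24, 6e9, 2))  # 12.0
```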
hummingairtime t1_j9ey0bz wrote
I appreciate you