
Ford_O t1_iwtrw98 wrote

How much faster is the RNN at inference than GPT-J?

2

bo_peng OP t1_iwts867 wrote

RWKV-3 1.5B on A40 (tf32) = a constant 0.015 sec/token regardless of context length, tested using simple PyTorch code (no custom CUDA kernel), GPU utilization 45%, VRAM 7823M

GPT2-XL 1.3B on A40 (tf32) = 0.032 sec/token at ctxlen 1000, tested using HF Transformers, GPU utilization also 45% (interesting), VRAM 9655M

Moreover, RWKV-4 uses bf16 and is still faster than 16-bit GPT models.

Training speed: RWKV-4 1.5B bf16, ctxlen 1024 = 106K tokens/s on 8xA100 40G.
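The reason the per-token time stays flat is that RNN-mode inference carries all context in a fixed-size state, so each token costs the same amount of work, while a transformer must attend over an ever-growing context (or KV cache). A toy NumPy sketch of that constant-cost loop (the hidden size, weights, and `rnn_step` cell here are illustrative stand-ins, not RWKV's actual architecture):

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # illustrative hidden size, not RWKV's

# Toy recurrent cell: per-token cost is O(d^2), independent of how
# many tokens came before, because the state has a fixed size.
W_x = rng.standard_normal((d, d)) * 0.01
W_h = rng.standard_normal((d, d)) * 0.01

def rnn_step(state, x):
    """One token of RNN-mode inference: constant work per token."""
    return np.tanh(x @ W_x + state @ W_h)

state = np.zeros(d)
for t in range(1000):          # 1000 tokens, each costing the same
    x = rng.standard_normal(d)  # stand-in for a token embedding
    state = rnn_step(state, x)

print(state.shape)  # fixed-size state; no KV cache growing with t
```

A transformer's per-token cost, by contrast, grows with context length, which is why the GPT2-XL figure above is quoted "at ctxlen 1000" while the RWKV figure needs no such qualifier.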

8

Ford_O t1_iwtx6nb wrote

Could you also measure the performance on CPU?

3