[P] RWKV 14B Language Model & ChatRWKV: pure RNN (attention-free), scalable and parallelizable like Transformers
Submitted by bo_peng (t3_10eh2f3) on January 17, 2023 at 4:54 PM in MachineLearning · 19 comments · 110 points
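For context on the "pure RNN, attention-free" claim: below is a minimal NumPy sketch (not the project's code; the names `w`, `u`, `k`, `v` follow the RWKV paper's conventions) of the sequential WKV recurrence that replaces attention at inference time. Each channel keeps a decaying weighted average of past values, so generation needs only O(1) state per token; training uses a parallel formulation of the same operator, which is what makes it parallelizable like a Transformer.

```python
import numpy as np

def wkv_recurrent(w, u, k, v):
    """Sequential (RNN-mode) sketch of an RWKV-style WKV operator.

    w : (D,) per-channel decay rate (positive)
    u : (D,) per-channel bonus for the current token
    k : (T, D) keys
    v : (T, D) values
    Returns a (T, D) array of outputs.
    """
    T, D = k.shape
    out = np.zeros((T, D))
    # Running numerator (a) and denominator (b) of a decaying
    # exponential average, tracked with a max exponent (p) so the
    # exp() calls stay numerically stable.
    a = np.zeros(D)
    b = np.zeros(D)
    p = np.full(D, -1e38)

    for t in range(T):
        # Output for token t: blend the accumulated state with the
        # current token, which gets the extra bonus u.
        q = np.maximum(p, u + k[t])
        e1 = np.exp(p - q)
        e2 = np.exp(u + k[t] - q)
        out[t] = (e1 * a + e2 * v[t]) / (e1 * b + e2)

        # Update the state: decay past contributions by w, add token t.
        q = np.maximum(p - w, k[t])
        e1 = np.exp(p - w - q)
        e2 = np.exp(k[t] - q)
        a = e1 * a + e2 * v[t]
        b = e1 * b + e2
        p = q
    return out

# Toy usage: 8 tokens, 4 channels of hypothetical keys/values.
rng = np.random.default_rng(0)
y = wkv_recurrent(w=np.ones(4) * 0.5, u=np.zeros(4),
                  k=rng.normal(size=(8, 4)), v=rng.normal(size=(8, 4)))
print(y.shape)  # (8, 4)
```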
femboyxx98 (t1_j4vlsfj) wrote on January 18, 2023 at 3:58 PM · 5 points
Have you compared it against modern transformer implementations, e.g. with FlashAttention, which can provide a 3x-5x speedup by itself?