Submitted by bo_peng t3_1135aew in MachineLearning
farmingvillein t1_j8qj1u7 wrote
Reply to comment by bo_peng in [R] RWKV-4 14B release (and ChatRWKV) - a surprisingly strong RNN Language Model by bo_peng
> RWKV is the exception. When you look at loss against token position, it is comparable with transformers.
Can you link to what you are referring to? My apologies if I missed it in the original post.