Submitted by bo_peng in MachineLearning
farmingvillein wrote
Reply to comment by csreid in [R] RWKV-4 14B release (and ChatRWKV) - a surprisingly strong RNN Language Model by bo_peng
Neither really works for very long contexts, so it is kind of a moot point.
Empirically, both end up relying on bolt-on approaches to extend memory over very long contexts, so it isn't clear (a priori) that the RNN has a real advantage here.