rapist1
rapist1 t1_j5xmv9n wrote
Reply to comment by koolaidman123 in [D] Self-Supervised Contrastive Approaches that don’t use large batch size. by shingekichan1996
How do you implement the caching? You have to cache all the activations to do the backward pass.
rapist1 t1_j8ppons wrote
Reply to [R] RWKV-4 14B release (and ChatRWKV) - a surprisingly strong RNN Language Model by bo_peng
Could you please write up the methods of RWKV in an arXiv paper, a standalone README, or even a blog post? I have read the description in the GitHub repository, and it is very scattered and hard to read.