
gwern t1_j8s2du5 wrote

> There is no possible way that you actually read the Related Works section you dismissed, given that the papers you cited are already covered in the same references you dismissed.

Telling someone to read the Related Works section of every one of a dozen papers in the Related Works section of a paper is a ridiculous thing to suggest, and no, I did not recurse n levels deep in a breadth-first search. I read the Related Works of that paper, as I said ("I don't think the Related Works section of that paper"), and noted that its references were a bunch of memory-related papers which might or might not cite the actually relevant research I had in mind; life was too short to queue up a dozen papers just to check their RW sections when I already knew some useful ones. Giving someone a random reference and telling them to manually crawl the literature is not helpful. In contrast, the two references I provided bore directly on the question; they didn't merely maybe cite papers which might bury something relevant in a footnote, or cite papers which might someday answer the question...

> I never said this, so I'm not sure what your argument is.

I was pointing out why it was irrelevant to bring up a paper which "compares w/ and w/o memory." Mildly interesting, but such a comparison cannot show what was actually asked about: the effective memory of RNNs. Of course it is better to have (any) memory than not.

> which, among other things, is why I reference Dai et al, who (among others!) do a fairly extensive breakdown of empirical performance differences of RNNs- versus transformer-type architectures against long text sequences.

Dai would in fact have been useful, had you referenced it in your original comment. Unless by 'referenced' you mean 'vaguely gestured in the direction of a paper which has 50+ references, 35 in the RW section alone, any of which could have been relevant, where the relevant benchmarking from Dai was not highlighted in the paper to begin with, and where the relative-context-length work is not mentioned in Dai's abstract but buried at the end of the paper (with the RNN results hidden inside a table), so you just have to know it's already there', and then claimed you 'reference it'. Then sure, yeah, that was a useful reference. Thanks for the input.

> If your claim is that the papers indicated that RNNs have a small window (sure) and that Transformers have a longer one, you're arguing (as you seem to be in your entire post) again against a strawman.

It's not a strawman. It's not obvious a priori that Transformers would work so much better, or that RNN histories fade out so fast; that is why it had to be empirically established that the history fades out completely, as opposed to any of the other reasons RNNs could underperform (maybe they have history but can't learn a good algorithm exploiting their memory, say, or they could but are poorly optimized - there are so many ways for NNs to break), and people were surprised by how well Transformers work. It is completely understandable that OP would expect RNN history to work better than it does, and would want some hard, citeable evidence that it works so badly that Transformers, with their apparently brutal hard cutoff, wind up having something much closer to 'infinite context' than RNNs themselves.

Thus, it's useful to provide references showing that. (Not references to unspecified references which may or may not show that - gl.)
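
(For a concrete sense of what "the history fades out" means: one standard trick is to measure how sensitive the network's final state is to inputs far in the past. Below is a minimal sketch using an untrained GRU and gradient norms, purely for illustration; this is not the methodology of the papers cited, just the flavor of the measurement.)

```python
# Probe how much an RNN's final hidden state depends on inputs t steps back,
# via the gradient norm ||d h_T / d x_t||.  Untrained GRU with random inputs,
# so this only illustrates the shape of the decay, not any trained model.
import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, d_in, d_hid = 200, 16, 64
rnn = nn.GRU(d_in, d_hid, batch_first=True)

x = torch.randn(1, seq_len, d_in, requires_grad=True)
_, h_T = rnn(x)                 # h_T: final hidden state, shape (1, 1, d_hid)

h_T.norm().backward()           # gradient of the final state w.r.t. every input
grad_per_step = x.grad.norm(dim=-1).squeeze(0)   # (seq_len,)

for t in [0, 50, 100, 150, 190, 199]:
    print(f"||d h_T / d x_{t}|| = {grad_per_step[t].item():.2e}")
```

On a trained model the same probe, or a task-based one like recall-at-distance, is the kind of evidence the decay claims rest on.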


farmingvillein t1_j8s7ygo wrote

This...is pretty astounding. Just have the grace to admit you were wrong, and move on.

> Telling someone to read the Related Works section of every one of a dozen papers in the Related Works section of a paper is a ridiculous thing to suggest

Then how can you possibly say:

> I don't think the Related Works section of that paper provides any useful references.

?

This is hardcore trolling. You can, and frequently do, do better than this.

You are literally pushing posts that are factually incorrect, and that you either know are factually incorrect, or are too lazy to validate either way.

This is the type of thing which blows up post quality in this sub.

> Giving someone a random reference and telling them to manually crawl the literature is not helpful.

This...is ridiculous. This is--traditionally--a very academic-friendly sub. This is how research works. "Here is where you can start a literature review on a bundle of related papers" is an extremely classic response, and is generally considered a helpful answer to complex and nuanced questions.

And the underlying issue is actually very complex, as evidenced in part by the fact that your references do not actually answer the question. "Go read related works" can be obnoxious when there are just one or two papers that do answer the question--but that is not the case here.

> In contrast, the two references I provided directly bore on the question

No they did not. They did not touch at all upon Transformers versus RNNs, which was the question. You've chosen to cherry-pick one slice of the problem and declare victory.

> It's not a strawman.

You don't seem to understand what a strawman is. Strawman:

> an intentionally misrepresented proposition that is set up because it is easier to defeat than an opponent's real argument.

I was not making this argument; you were. QED, this is a strawman.
