Screye t1_izauw5w wrote
Reply to comment by Tejas_Garhewal in [D] If you had to pick 10-20 significant papers that summarize the research trajectory of AI from the past 100 years what would they be by versaceblues
He is the UIUC of deep learning's Mount Rushmore.

Just as people think of Stanford, MIT, CMU, and Berkeley as the big CS universities and forget that UIUC is almost just as good… people name Hinton, LeCun, and Bengio and forget that Schmidhuber('s lab) did a lot of important foundational work in deep learning.
Sadly, he is a curmudgeon who complains a lot and claims even more than he has actually achieved… so people have kind of soured on him lately.
undefdev t1_izbjvcl wrote
> Sadly, he is a curmudgeon who complains a lot and claims even more than he has actually achieved… so people have kind of soured on him lately.
What did he claim that he didn't achieve? I didn't dig too deeply into it, but it always seemed to me that his complaints haven't been addressed, but nobody has an incentive to support him.
JustOneAvailableName t1_izbnfki wrote
> What did he claim that he didn't achieve?
Connections to his work are often vague. Yes, his lab tried something in the same, extremely general, direction. No, his lab did not show that it actually worked, or which part of that broad direction actually worked. So I am not gonna cite Fast Weight Programmers when I want to write about transformers. Yes, Fast Weight Programmers also argued that there are more ways to handle variable-sized input than RNNs. No, I don't think that idea is special at all. The main point of "Attention Is All You Need" was that removing a component of the then-mainstream architecture (the recurrence) made models faster to train (or allowed larger ones) while keeping the quality. It was the timing that made it special: they successfully went against the mainstream and made it work. The idea itself was not what mattered.
undefdev t1_izbui6y wrote
> So I am not gonna cite Fast Weight Programmers when I want to write about transformers.
I think you are probably referring to this paper: "Linear Transformers Are Secretly Fast Weight Programmers".
It seems like they showed that linear transformers are equivalent to fast weight programmers. If linear transformers are relevant to your research, why not cite fast weight programmers? Credit is cheap, right? We can still call them linear transformers.
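The claimed equivalence is easy to sketch in code. A minimal illustration, assuming single-head, unnormalized, causal linear attention (the paper also treats the normalization and kernel feature maps, which are omitted here): computing attention the usual way gives the same outputs as accumulating outer-product "fast weights" and querying them.

```python
import numpy as np

rng = np.random.default_rng(0)
T, d = 5, 4  # sequence length, head dimension
Q, K, V = (rng.standard_normal((T, d)) for _ in range(3))

# Causal linear attention, computed directly:
#   out_t = sum_{s <= t} (q_t . k_s) * v_s
out_attn = np.zeros((T, d))
for t in range(T):
    for s in range(t + 1):
        out_attn[t] += (Q[t] @ K[s]) * V[s]

# Fast weight programmer view: maintain a weight matrix W,
# "program" it with outer-product updates v_s k_s^T, then
# "query" it with q_t. Same sum, just reordered.
W = np.zeros((d, d))
out_fwp = np.zeros((T, d))
for t in range(T):
    W += np.outer(V[t], K[t])   # write: fast weight update
    out_fwp[t] = W @ Q[t]       # read: query the fast weights

assert np.allclose(out_attn, out_fwp)
```

Both loops compute the same sum, since W at step t is exactly Σ_{s≤t} v_s k_sᵀ; the fast-weight form just makes the recurrent, constant-memory view explicit.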
JustOneAvailableName t1_izbzbaq wrote
Because Schmidhuber claiming that transformers are based on his work was a meme for 3-4 years before that paper actually appeared. Like here.

There are hundreds of more relevant papers to cite and read on (linearly scaling) transformers.
undefdev t1_izc3tr1 wrote
> Because Schmidhuber claiming that transformers are based on his work was a meme for 3-4 years before he actually did that. Like here.
But why should memes be relevant in science? Not citing someone because there are memes around their person seems kind of arbitrary. If it's just memes, maybe we shouldn't take them too seriously.