
suflaj t1_iqt971l wrote

Ah sorry, based on your responses I was convinced you were reading papers, so my response might have been overly aggressive due to the incredibly negative experience I've had reading relevant DL papers. It truly feels like the only difference between a SOTA paper and a garbage paper is that the SOTA one somehow got to work on a specific machine, specific setup and specific training run. And this spills over into the whole of DL.

Hopefully you will not have the misfortune of trying to replicate some of the papers that either don't have a repo linked or whose repos are not maintained by a large corporation; if you do, you'll understand better what I meant.


029187 OP t1_iqtcofx wrote

>Ah sorry, based on your responses I was convinced you were reading papers, so my response might have been overly aggressive due to the incredibly negative experience I've had reading relevant DL papers.

It's all good. I'm happy to hear your thoughts.

I've read some papers but I'm by no means an expert. Ironically, I've actually used ML in a professional setting, but most of my work is very much "let's run some models and use the most accurate one" (roughly the kind of loop sketched below). Generally, squeezing out an extra percent of accuracy with a SOTA model isn't worth it, so I don't deal with them much.
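A minimal sketch of that kind of model shootout, assuming scikit-learn; the dataset, candidate models, and metric here are purely illustrative, not a recommendation:

```python
# Rough sketch of "run a few models, keep the most accurate one".
# Dataset, candidates, and metric are placeholders for illustration.
from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import GradientBoostingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_breast_cancer(return_X_y=True)

candidates = {
    "logreg": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
    "random_forest": RandomForestClassifier(n_estimators=200, random_state=0),
    "gbm": GradientBoostingClassifier(random_state=0),
}

# Score each candidate with 5-fold cross-validation and keep the best one.
scores = {name: cross_val_score(model, X, y, cv=5).mean()
          for name, model in candidates.items()}
best = max(scores, key=scores.get)
print(scores, "-> picking", best)
```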

I do try to keep up to date with the latest models, but it all seems so trial-and-error, which I think is what you were getting at.

In addition, there is a lot of incorrect theory out there, which makes it even harder for amateurs or semi-pros like me. I still see YouTube videos to this day claiming DNNs are effective because they are universal approximators, which is clearly not the reason: there are plenty of universal approximators besides DNNs that cannot be trained nearly as effectively on problems like image recognition or NLP. Universal approximation is likely necessary but almost certainly not sufficient.
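For reference, here is a rough statement of the classical one-hidden-layer universal approximation result (Cybenko/Hornik style); note it is purely an existence statement and says nothing about whether gradient descent can actually find the approximating weights:

```latex
% Universal approximation theorem, informal one-hidden-layer form.
% Existence only: it says nothing about how to find the weights by training.
\[
\forall f \in C(K),\ \forall \varepsilon > 0,\ \exists N,\
\{v_i, b_i \in \mathbb{R},\ w_i \in \mathbb{R}^n\}_{i=1}^{N}
\ \text{such that}\
\sup_{x \in K} \Big| f(x) - \sum_{i=1}^{N} v_i\, \sigma(w_i^{\top} x + b_i) \Big| < \varepsilon,
\]
where $K \subset \mathbb{R}^n$ is compact and $\sigma$ is a fixed
non-polynomial (e.g.\ sigmoidal or ReLU) activation.
```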

I've been reading papers like the lottery ticket hypothesis, which seem like they are trying to give some insight into why DNNs are a useful architecture, as well as Google's follow-up paper about rigging the lottery.
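For anyone unfamiliar, the core procedure in the lottery ticket paper is iterative magnitude pruning with weight rewinding. A rough sketch of one round, assuming PyTorch; the model, prune fraction, and training loop are placeholders, not the authors' exact setup:

```python
# One round of lottery-ticket-style magnitude pruning with rewinding (sketch).
import copy
import torch
import torch.nn as nn

def magnitude_prune_masks(model, fraction=0.2, masks=None):
    """Zero out the smallest-magnitude fraction of surviving weights per layer."""
    new_masks = {}
    for name, param in model.named_parameters():
        if param.dim() < 2:          # skip biases / norm parameters
            continue
        mask = masks[name] if masks else torch.ones_like(param)
        surviving = param[mask.bool()].abs()
        k = int(fraction * surviving.numel())
        if k > 0:
            threshold = surviving.kthvalue(k).values
            mask = mask * (param.abs() > threshold).float()
        new_masks[name] = mask
    return new_masks

def apply_masks(model, masks):
    """Zero out pruned weights in place."""
    with torch.no_grad():
        for name, param in model.named_parameters():
            if name in masks:
                param.mul_(masks[name])

# Usage (schematic): save the init, train, prune, rewind, retrain.
model = nn.Sequential(nn.Linear(784, 300), nn.ReLU(), nn.Linear(300, 10))
initial_state = copy.deepcopy(model.state_dict())
# ... train(model) ...
masks = magnitude_prune_masks(model, fraction=0.2)
model.load_state_dict(initial_state)   # rewind to the original initialization
apply_masks(model, masks)              # the "winning ticket" subnetwork
# ... train(model) again, re-applying the masks after each optimizer step ...
```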

Those papers have gotten me pretty interested in reading up on why these models work so well, but when you look into it, the results are as you've said: a lot of trial and error without much of a theoretical underpinning. Of course, I'm no expert, so I don't want to pooh-pooh the work that a lot of very smart and experienced folks are doing.
