SnooPears7079 t1_jcaxru5 wrote

Would you say “coming up with architectures that randomly work / don’t work” is a shortcoming of your understanding or of the field in general?

I’m asking because I’m considering the opposite switch right now: ML interests me deeply, and I’m currently in standard cloud development.

0

pwsiegel t1_jcbjf82 wrote

It's a property of the field in general - there is very little theory to guide neural architecture design, just some heuristics backed by trial-and-error experimentation. Deep learning models are fun, but in practice you spend a lot of your time trying to trick gradient descent into converging faster.
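To make the "tricking gradient descent" point concrete, here is a minimal sketch of the kinds of knobs practitioners spend their time on: normalization, gradient clipping, and a warmup/decay learning-rate schedule. PyTorch is assumed, and the tiny model and random data are placeholders for illustration, not anyone's actual setup.

```python
import torch
import torch.nn as nn

# Toy model: normalization layers often stabilize training.
model = nn.Sequential(
    nn.Linear(32, 64),
    nn.LayerNorm(64),
    nn.ReLU(),
    nn.Linear(64, 1),
)

opt = torch.optim.AdamW(model.parameters(), lr=3e-4, weight_decay=1e-2)
# OneCycleLR does a warmup then cosine decay of the learning rate.
sched = torch.optim.lr_scheduler.OneCycleLR(opt, max_lr=3e-4, total_steps=1000)

x, y = torch.randn(256, 32), torch.randn(256, 1)  # placeholder data
loss_fn = nn.MSELoss()

for step in range(1000):
    opt.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    # Keep gradient norms bounded to avoid unstable updates.
    torch.nn.utils.clip_grad_norm_(model.parameters(), 1.0)
    opt.step()
    sched.step()
```

None of these choices follow from theory; they are the kind of heuristics that get kept because they empirically make training converge faster or more reliably.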

22

currentscurrents t1_jcclq02 wrote

The whole thing seems very bitter-lesson-y, and I suspect that in the future we'll have a very general architecture that learns to reconfigure itself for the data.

5

nopainnogain5 OP t1_jccd7vm wrote

I was wondering whether this has something to do with a lack of experience. From what I've heard, the more you experiment with models, the better you understand what helps when, at least to some extent.

The thing is, a neural network largely remains a black box: the number of parameters is too large to fully understand what is happening inside. The work is mostly empirical. You choose an architecture, test, change the hyperparameters, test, change the architecture, test, change some other parameters, test, and so on. You can't be sure your model will work properly right away, and it may take many iterations. With larger models that take a long time to train, this can get overwhelming.
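A minimal sketch of that "change something, retrain, compare" loop, assuming scikit-learn and synthetic data purely for illustration:

```python
import random
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

# Synthetic stand-in data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.25, random_state=0)

# The "knobs": architecture (layer sizes) and a couple of hyperparameters.
search_space = {
    "hidden_layer_sizes": [(32,), (64,), (64, 32), (128, 64)],
    "learning_rate_init": [1e-2, 1e-3, 1e-4],
    "alpha": [1e-4, 1e-3],  # L2 regularization strength
}

best_score, best_cfg = 0.0, None
for trial in range(10):
    # Each trial: pick a configuration, train, evaluate, keep the best.
    cfg = {k: random.choice(v) for k, v in search_space.items()}
    clf = MLPClassifier(max_iter=300, random_state=trial, **cfg).fit(X_tr, y_tr)
    score = clf.score(X_val, y_val)
    if score > best_score:
        best_score, best_cfg = score, cfg

print(best_cfg, best_score)
```

In practice each "trial" can mean hours or days of training rather than seconds, which is what makes the iteration loop so painful with large models.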

Of course, it might be different in your case. You can start with some toy examples, and if you still like it, I'd recommend playing with larger networks.

1

loadage t1_jccdzk2 wrote

That was my first thought too. I'm about to finish my master's program, and I spent the first half thinking it was just hyperparameter tuning, until I sat down and learned the math and theory. Now it's so much more interesting and explainable; what felt like random tuning is now much better calibrated by experience and an understanding of the theory. As of now, I could easily make a career out of this, because it isn't just random tuning and simple optimization. Plus, the field is so hot right now that it's unreasonable to assume that what data scientists do today is what they'll be doing in 5, 10, or 20 years.

0