
pyepyepie t1_iz0wj9m wrote

Super interesting. I like the story about the metrics; it's very useful for people who are new to data science. Even when the trade-off isn't solvable (I assume in your case it was, but in MARL, for example, if you just aim for Pareto optimality you sometimes get a weird division of "goods"), most of the time you would rather have two models at x-5% accuracy than one model at x+15% and another at x-15%. We get paid to know the systems we build :)

BTW, what you're describing seems related to deep double descent (https://openai.com/blog/deep-double-descent/). That phenomenon is clearly magic :D I heard some explanations involving weight initialization at a conference, but to be honest I really don't have anything intelligent to say about it. It would be interesting to see if it's the standard type of network in 20 years.


trnka t1_iz1nk60 wrote

Oh, interesting paper - I hadn't seen it before.

For what it's worth, I haven't observed double descent personally, though I suppose the only place I'd have noticed it for sure is along the training-time axis. We almost always had typical learning curves over epochs: training loss decreases smoothly as expected, and testing loss hits a bottom and then starts climbing unless there's a TON of regularization.
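A minimal sketch of checking for that typical curve shape, assuming the per-epoch test losses are already collected in a plain list. The function names and the example numbers are hypothetical, made up purely for illustration.

```python
def best_epoch(test_losses):
    """Return the 0-based index of the epoch with the lowest test loss."""
    return min(range(len(test_losses)), key=lambda i: test_losses[i])


def climbs_after_best(test_losses):
    """True if the test loss at the final epoch is above its minimum,
    i.e. the curve bottomed out and then started climbing."""
    return test_losses[-1] > test_losses[best_epoch(test_losses)]


# Hypothetical losses: the test loss bottoms out at epoch index 2, then climbs.
example_losses = [1.00, 0.60, 0.50, 0.55, 0.70]
print(best_epoch(example_losses), climbs_after_best(example_losses))
```

With epoch-wise double descent, the test loss would come back down again at some later epoch, so a curve like this only rules it out for the epochs that were actually trained.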

We probably would've seen it with the number of model parameters, because we ran random searches over those periodically and graphed the correlations. I only remember seeing one peak in those graphs, though we generally didn't evaluate beyond 2x the parameter count of our most recent best model.
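As a rough sketch of what looking for model-wise double descent in random-search results could involve: sort the runs by parameter count and look for an interior bump in validation error (error falls, rises, then falls again). The `(num_params, val_error)` tuple format, the function name, and the numbers below are assumptions for illustration, not the actual setup described here.

```python
def error_bumps(results):
    """Given (num_params, val_error) pairs from a random search, return the
    parameter counts where validation error has a strict interior local
    maximum. A classical U-shaped curve has none; a double-descent curve
    has at least one. Noise-sensitive: real runs would need smoothing or
    repeated seeds before trusting a bump."""
    ordered = sorted(results)
    errors = [err for _, err in ordered]
    return [
        ordered[i][0]
        for i in range(1, len(errors) - 1)
        if errors[i] > errors[i - 1] and errors[i] > errors[i + 1]
    ]


# Hypothetical results: a classical U shape vs. a curve with a second descent.
u_shaped = [(1_000, 0.50), (2_000, 0.30), (4_000, 0.20), (8_000, 0.25), (16_000, 0.30)]
double_descent = [(1_000, 0.50), (2_000, 0.30), (4_000, 0.40), (8_000, 0.20)]
print(error_bumps(u_shaped), error_bumps(double_descent))
```

A search capped at about 2x the best model's parameter count can only show a bump that falls inside that range, so an empty result says nothing about larger models.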

I probably wouldn't have observed the effect with more data, because our distribution shifted over the years. For instance, in 2020 we got a lot more respiratory infections coming in due to COVID, which temporarily decreased our numbers and then increased them, because those cases are easier to guess than other things.
