VenerableSpace_
VenerableSpace_ t1_irzdgb9 wrote
Reply to [D] Looking for some critiques on recent development of machine learning by fromnighttilldawn
RemindMe! 1 month
VenerableSpace_ t1_iqorbnu wrote
Reply to comment by Lugi in [D] Focal loss - why it scales down the loss of minority class? by Lugi
Ahh I see now, it's been a while since I read that paper. So they chalk it up to the interaction between alpha and the focal term. You can see how they need to use a non-intuitive value for alpha once they introduce the focal term in Table 1b, especially when gamma > 0.5.
VenerableSpace_ t1_iqocr2s wrote
Reply to comment by Lugi in [D] Focal loss - why it scales down the loss of minority class? by Lugi
The alpha term uses inverse class frequency to downweight the loss of the majority class. So if there is a 3:1 ratio of majority:minority, alpha_majority = 0.25 and alpha_minority = 0.75.
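A minimal sketch of that inverse-frequency weighting (PyTorch; the 3:1 counts and the toy labels are just the example above, not anything from the paper):

```python
import torch

# Illustrative binary setting with a 3:1 majority:minority ratio.
counts = torch.tensor([3.0, 1.0])        # [majority, minority] class counts
alpha = 1.0 - counts / counts.sum()      # inverse-frequency weights -> [0.25, 0.75]

# Per-example weight: each example picks the alpha of its own class.
labels = torch.tensor([0, 0, 0, 1])      # mostly majority-class examples
per_example_alpha = alpha[labels]        # -> [0.25, 0.25, 0.25, 0.75]
print(per_example_alpha)
```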
VenerableSpace_ t1_iqo5opu wrote
Focal loss downweights "well-classified" examples. It happens that the minority class typically is not well classified because in a given mini-batch the average gradient will be dominated by the majority class.
Technically, focal loss downweights the loss for all examples; it just happens to downweight the loss of well-classified examples significantly more than that of poorly classified ones (I'm drawing a hard distinction between the two, but it's really a smooth downweighting).
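A rough PyTorch sketch of the standard binary focal loss, just to make that downweighting concrete (the function name, the logit values, and the alpha/gamma defaults are illustrative):

```python
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    """Binary focal loss sketch; alpha weights the positive class."""
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p = torch.sigmoid(logits)
    p_t = p * targets + (1 - p) * (1 - targets)              # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    # (1 - p_t)^gamma < 1 for every example, so every loss is scaled down,
    # but well-classified examples (p_t near 1) are scaled down far more.
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()

# Well-classified positive (logit 4.0, p_t ~ 0.98) vs. hard positive (logit -1.0, p_t ~ 0.27).
logits = torch.tensor([4.0, -1.0])
targets = torch.tensor([1.0, 1.0])
print(focal_loss(logits, targets))
```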
VenerableSpace_ t1_ixxy2dy wrote
Reply to comment by anonymousTestPoster in [P] Metric learning: theory, practice, code examples by Zestyclose-Check-751
Metric learning isn't really a new buzzword; it's been in use for these types of approaches for several years now. It's a good framework for thinking about these approaches collectively, but there is some overlap: e.g., self-attention as a stand-alone layer can be viewed as a form of metric learning. In a ViT, for instance: how do we relate the input patch embeddings to one another such that we can discriminate between the classes?
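A toy sketch of that view, assuming a single-head scaled dot-product layer (all names and shapes are illustrative): the attention matrix is just a learned, row-normalized similarity between the patch embeddings.

```python
import torch

def attention_similarity(patch_embeddings, w_q, w_k):
    # Learned projections, then pairwise dot-product similarity between patches.
    q = patch_embeddings @ w_q
    k = patch_embeddings @ w_k
    scores = q @ k.transpose(-2, -1) / k.shape[-1] ** 0.5
    return torch.softmax(scores, dim=-1)   # row i: how much patch i "relates" to each patch

patches = torch.randn(16, 64)              # e.g. 16 ViT patch embeddings of dim 64
w_q, w_k = torch.randn(64, 64), torch.randn(64, 64)
sim = attention_similarity(patches, w_q, w_k)   # 16 x 16 similarity matrix
```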