Viewing a single comment thread.

VenerableSpace_ t1_iqo5opu wrote

Focal loss downweights "well-classified" examples. It happens that the minority class is typically not well classified, because in a given mini-batch the average gradient is dominated by the majority class.

Technically, focal loss downweights the losses of all examples; it just downweights the loss of well-classified examples significantly more than that of poorly classified examples (I'm drawing a hard distinction between the two, but in reality it's a smooth downweighting).
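
To make that concrete, here is a minimal sketch of the (1 - p_t)^gamma modulating factor (PyTorch and the function name are my own choices, not taken from the paper):

```python
import torch

def focal_loss(logits, targets, gamma=2.0):
    """Binary focal loss without the alpha term: FL(p_t) = -(1 - p_t)^gamma * log(p_t)."""
    p = torch.sigmoid(logits)
    # p_t is the probability the model assigns to the ground-truth class
    p_t = torch.where(targets == 1, p, 1 - p)
    ce = -torch.log(p_t.clamp(min=1e-8))   # standard cross-entropy term
    modulating = (1 - p_t) ** gamma        # smoothly -> 0 as p_t -> 1 (well-classified)
    return (modulating * ce).mean()
```

With gamma = 0 this reduces to plain cross-entropy; larger gamma shrinks the contribution of confident examples faster.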

3

Lugi OP t1_iqoahos wrote

Yes, but I'm specifically using the alpha-balanced version, which they used in a counterproductive way.

2

VenerableSpace_ t1_iqocr2s wrote

The alpha term uses the inverse class frequency to downweight the loss. So with a 3:1 majority:minority ratio, alpha_majority = 0.25 and alpha_minority = 0.75.
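
As a quick sanity check of that arithmetic (variable names are mine):

```python
# Inverse-class-frequency alpha for a 3:1 majority:minority split
n_majority, n_minority = 300, 100
total = n_majority + n_minority
alpha_majority = 1 - n_majority / total  # = 0.25, downweights the majority class
alpha_minority = 1 - n_minority / total  # = 0.75, upweights the minority class
print(alpha_majority, alpha_minority)    # 0.25 0.75
```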

1

Lugi OP t1_iqodhe1 wrote

Yes, but the problem is that while they mention that in the paper, they ultimately use alpha = 0.25, which downweights the minority (foreground) class, while the background (majority) class gets a scaling of 0.75. This is what I'm concerned about.
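
Spelled out as code, this is roughly what the alpha-balanced loss with their alpha = 0.25 does (a sketch under my own naming, with foreground labeled 1):

```python
import torch

def alpha_balanced_focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    p = torch.sigmoid(logits)
    p_t = torch.where(targets == 1, p, 1 - p)
    # With alpha = 0.25 the foreground (minority) term is scaled by 0.25
    # and the background (majority) term by 0.75 -- the point raised above.
    alpha_t = torch.where(targets == 1,
                          torch.full_like(p, alpha),
                          torch.full_like(p, 1 - alpha))
    ce = -torch.log(p_t.clamp(min=1e-8))
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```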

2

VenerableSpace_ t1_iqorbnu wrote

Ahh, I see now; it's been a while since I read that paper. They chalk it up to the interaction between alpha and the focal term. You can see how they need to use a non-intuitive value for alpha once they introduce the focal term in Table 1b, especially when gamma > 0.5.

2