Dartagnjan OP t1_j103e5a wrote

Yes, I already have batch_size=1. I am now looking into sharding the model across multiple GPUs. In my case, not being able to predict on the 1% of super hard examples means that those examples have features that the model has not learned to understand yet. The labeling is very close to perfect with mathematically proven error bounds...

> focal loss, hard-example mining

I think these are exactly the keywords that I was missing in my search.
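
For anyone landing here from the same search, a minimal sketch of what the two terms mean, assuming a classifier that outputs a probability `p_t` for the true class. The function names and the `gamma=2.0` default are illustrative choices, not anything from the setup discussed in this thread. Focal loss scales cross-entropy by `(1 - p_t)**gamma` so well-classified examples contribute less, and hard-example mining here is simply picking the highest-loss examples to train on again.

```python
import math

def cross_entropy(p_t):
    """Plain cross-entropy for one example; p_t is the predicted
    probability of the true class, in (0, 1]."""
    return -math.log(p_t)

def focal_loss(p_t, gamma=2.0):
    """Focal loss: cross-entropy scaled by (1 - p_t)**gamma.
    gamma=0 gives plain cross-entropy; larger gamma shrinks the
    loss of easy (high p_t) examples more strongly."""
    return (1.0 - p_t) ** gamma * cross_entropy(p_t)

def hardest_k(losses, k):
    """Hypothetical helper for hard-example mining: return the indices
    of the k largest per-example losses, hardest first."""
    order = sorted(range(len(losses)), key=lambda i: losses[i], reverse=True)
    return order[:k]

# Per-example true-class probabilities (made-up numbers for illustration).
probs = [0.95, 0.60, 0.05]
losses = [focal_loss(p) for p in probs]
hard_indices = hardest_k(losses, k=1)  # index of the hardest example
```

With a batch size of 1 the mining step would have to work on losses recorded over an epoch rather than within a batch, which is an assumption about how one would wire it in.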

5

dumbmachines t1_j133fcs wrote

If focal loss is interesting, check out PolyLoss, which is a generalization of the focal loss idea.
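
A rough sketch of the simplest variant, Poly-1, as I understand it: cross-entropy can be expanded as a series in `(1 - p_t)`, and Poly-1 adds an extra weight `epsilon` to the first-order term. The function name and the `epsilon=1.0` default below are illustrative assumptions; in practice `epsilon` is a hyperparameter to tune.

```python
import math

def poly1_cross_entropy(p_t, epsilon=1.0):
    """Poly-1 loss for one example: cross-entropy plus
    epsilon * (1 - p_t), where p_t is the predicted probability
    of the true class. epsilon=0 gives plain cross-entropy."""
    return -math.log(p_t) + epsilon * (1.0 - p_t)

# A hard example (low p_t) gets a larger extra term than an easy one.
easy = poly1_cross_entropy(0.95)
hard = poly1_cross_entropy(0.05)
```

A positive `epsilon` adds more loss to hard examples than to easy ones, and a negative `epsilon` does the opposite.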

2