
Ulfgardleo t1_irutjd4 wrote

A hugely underappreciated fact is the computational difficulty behind learning with weak labels. E.g., if only coarse/group labels are available, multi-class linear classification immediately becomes NP-hard.

3

gradientrun t1_iruw042 wrote

Is this a result from some theory paper ?

1

Ulfgardleo t1_irv9w00 wrote

Quite easy to prove.

Take a multi-class classification problem. Now pick one class and assign it label 0, assign all other classes the same coarse label 1, and try to find the maximum-margin classifier. This problem is equivalent to finding a convex polytope that separates class 0 from class 1 with maximum margin, which is NP-hard. Logistic regression is not much better, but more difficult to prove.
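The intuition can be illustrated with a toy 1D example (the data and thresholds here are hypothetical, chosen only to show why merging classes breaks linear separability):

```python
import numpy as np

rng = np.random.default_rng(0)

# Fine class A (coarse label 0) sits in the middle; the coarse label 1
# merges two fine classes B and C on either side of it.
x0 = rng.normal(0.0, 0.5, 100)                    # class A -> coarse 0
x1 = np.concatenate([rng.normal(-5.0, 0.5, 50),   # class B -> coarse 1
                     rng.normal(+5.0, 0.5, 50)])  # class C -> coarse 1

xs = np.concatenate([x0, x1])
ys = np.concatenate([np.zeros(100), np.ones(100)])

# Best single halfspace (in 1D: a threshold, either orientation).
best_acc = 0.0
for t in np.linspace(xs.min(), xs.max(), 1000):
    for sign in (1, -1):
        acc = np.mean((sign * (xs - t) > 0) == ys)
        best_acc = max(best_acc, acc)

# A convex polytope (in 1D: an interval, i.e. the intersection of
# two halfspaces) separates coarse label 0 from coarse label 1.
interval_pred = ~((xs > -2.5) & (xs < 2.5))  # outside interval -> 1
interval_acc = np.mean(interval_pred == ys)

print(f"best single halfspace accuracy: {best_acc:.2f}")    # ~0.75
print(f"interval (polytope) accuracy:   {interval_acc:.2f}")
```

No single linear separator can put the middle class on one side and both outer clusters on the other; the polytope (here just two halfspaces) can. The hardness result is about finding such a polytope with maximum margin in general dimension.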

This is already NP-complete when the coarse label encompasses two classes: https://proceedings.neurips.cc/paper/2018/file/22b1f2e0983160db6f7bb9f62f4dbb39-Paper.pdf

3

ratatouille_artist OP t1_irvdebw wrote

Very interesting perspective on the difficulty of learning from weak labels. If I have time, it would be good to do a longer-form write-up on how effective skweak, with its hidden Markov model approach, is for span extraction.

0