Submitted by ratatouille_artist t3_y0qra7 in MachineLearning

I had the pleasure of running a workshop on weak supervision for NLP recently. I'd like to hear more about your experiences with using weak supervision for NLP.

I am personally a huge fan of weak supervision, and I think skweak is a great tool for span-based weak supervision.

With simple and efficient out-of-the-box machine learning APIs, fine-tuning and deploying machine learning models has never been easier. The lack of labelled data, however, is a real bottleneck for most projects. Weak supervision can help by:

  • labelling data more efficiently
  • generating noisy labelled data to fine-tune your model on

Benefits of weak supervision

Here's an example skweak labelling function to generate noisy labelled data:

from skweak.base import SpanAnnotator

class MoneyDetector(SpanAnnotator):
    def __init__(self):
        super().__init__("money_detector")

    def find_spans(self, doc):
        # scan each token together with its predecessor
        for tok in doc[1:]:
            # e.g. "$" followed by "500" -> span "$ 500" labelled MONEY
            if tok.text[0].isdigit() and tok.nbor(-1).is_currency:
                yield tok.i - 1, tok.i + 1, "MONEY"

money_detector = MoneyDetector()

This labelling function extracts any token starting with a digit that is preceded by a currency symbol.

Example of labelling function in action

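For concreteness, here is a dependency-free sketch of the same heuristic over a plain list of token strings. Spans are (start, end, label) token-index triples as in skweak; the currency set and example sentence are made up for illustration:

```python
CURRENCY_SYMBOLS = {"$", "€", "£", "¥"}

def money_spans(tokens):
    """Yield MONEY spans where a currency symbol directly precedes
    a token that starts with a digit."""
    for i in range(1, len(tokens)):
        if tokens[i][0].isdigit() and tokens[i - 1] in CURRENCY_SYMBOLS:
            yield (i - 1, i + 1, "MONEY")

tokens = ["The", "deal", "was", "worth", "$", "500", "million", "."]
print(list(money_spans(tokens)))  # → [(4, 6, 'MONEY')]
```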

skweak allows you to combine multiple labelling functions using spaCy attributes or other methods.
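To give an intuition for what "combining" looks like, here is a simplified majority-vote aggregator in plain Python. Note this is only an illustration: skweak itself aggregates labelling functions with a generative (HMM) model rather than voting, and the function name here is made up.

```python
from collections import Counter

def aggregate_by_vote(span_sets, min_votes=2):
    """Combine noisy span predictions from several labelling functions,
    keeping only spans proposed by at least `min_votes` of them."""
    counts = Counter(span for spans in span_sets for span in set(spans))
    return sorted(span for span, n in counts.items() if n >= min_votes)

lf_outputs = [
    [(4, 6, "MONEY")],                  # labelling function 1
    [(4, 6, "MONEY"), (0, 1, "ORG")],   # labelling function 2
    [(4, 6, "MONEY")],                  # labelling function 3
]
print(aggregate_by_vote(lf_outputs))  # → [(4, 6, 'MONEY')]
```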

Using labelling functions has a number of advantages:

  1. 💪 larger coverage: a single labelling function can cover many samples
  2. 🤓 involving experts: domain-expert annotation is expensive, while domain-expert labelling functions are more economical thanks to that coverage
  3. 🌬️ adapting to changing domains: labelling functions and data assets can be adapted as the domain shifts

What are your experiences with weak supervision in NLP? I really recommend trying out skweak, in particular if you work with span extraction.

0

Comments


Ulfgardleo t1_irtdtoj wrote

This feels and sounds like an ad, but I could not find out for what. Maybe you should make it clear which product I should definitely use.

12

ratatouille_artist OP t1_irtg348 wrote

Point taken about the advert-style writing, thanks for the feedback. My goal with the post is to see what others do for weak supervision in NLP. I also think it's an underappreciated topic and would like to see more discussion around it.

−1

Empty-Painter-3868 t1_irtdww2 wrote

Great question. In practice, I spend a week crafting a 'good' weak dataset. The result is a modest performance gain, and the model becomes a lot more unpredictable (spans off by a token or so).

The correct answer nobody wants to hear is: "I should have spent a week labelling data"

Forget Snorkel and all that crap. It's harder to make good labelling functions than it is to label data, IMO

7

Seankala t1_iru9kdm wrote

I second forgetting about Snorkel and the like. I found it better for me to just label the datapoints myself and continuously refine pseudo labels generated by models.

4

yldedly t1_irvmn2v wrote

>The correct answer nobody wants to hear is: "I should have spent a week labelling data"

... with active learning?
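(For readers unfamiliar with the term: active learning lets the model choose which examples get labelled next. A minimal uncertainty-sampling sketch, where the pool, the toy model, and the `select_batch` helper are all hypothetical:)

```python
import heapq

def select_batch(pool, predict_proba, k=2):
    """Pick the k pool examples the model is least confident about
    (uncertainty sampling, the simplest active-learning strategy).
    Confidence = probability of the most likely class."""
    return heapq.nsmallest(k, pool, key=lambda x: max(predict_proba(x)))

# Toy "model" over toy 1-D examples: two-class probabilities (x, 1 - x),
# so confidence is lowest for examples near 0.5.
pool = [0.1, 0.45, 0.8, 0.52, 0.98]
predict_proba = lambda x: (x, 1 - x)
print(select_batch(pool, predict_proba))  # → [0.52, 0.45]
```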

3

ratatouille_artist OP t1_irtgyba wrote

I think the devil is in the details. You can use weak supervision to sample from a particular distribution and make your labelling more efficient.

It also works really well in pharma, where you can build and apply ontologies for your weak supervision. In this case annotation would still be hard and required, but your annotations would also be structured and adapted for later use in the ontology, at the cost of slower annotation.

0

Deep_Airport_NYC t1_irtu95a wrote

In which contexts would weak supervision be practically applied? It's my sense that if you are going to the effort of labelling data you may as well label the data properly? I have no experience with weak supervision so looking to learn more.

3

ratatouille_artist OP t1_irvd29t wrote

Yeah, but what does 'label the data properly' mean? If your high-value samples are very sparse, you will usually use some form of sampling for 'proper' labelling. Fundamentally, weak supervision can be a sampling strategy.

I have used weak supervision with semi-supervised topic models for sampling where it worked very well.

The other big impact area is using ontologies to extract ontology entities at scale and looking at the distribution of these entities for the problem you are working on. For example, in pharma, if you are trying to find a DRUG-treats-DISEASE relationship, you might use an ontology to find all DRUG and DISEASE entities in PubMed abstracts and pull every case where they co-occur with the verb 'treats'.

For my current work I apply weak supervision for information extraction for sales transcripts. Hopefully will be able to share some of the impact of this at the end of the quarter!
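The ontology co-occurrence idea above can be sketched in a few lines of plain Python. The lexicons, the sentence, and the `treats_candidates` helper are made up for illustration; a real pipeline would use proper entity linking rather than exact string matches.

```python
def treats_candidates(sentence_tokens, drug_lexicon, disease_lexicon):
    """Return (drug, disease) pairs when both appear in a sentence
    that also contains the verb 'treats' — a crude ontology-driven
    labelling heuristic."""
    toks = [t.lower() for t in sentence_tokens]
    if "treats" not in toks:
        return []
    drugs = [t for t in toks if t in drug_lexicon]
    diseases = [t for t in toks if t in disease_lexicon]
    return [(d, z) for d in drugs for z in diseases]

sent = "Metformin treats diabetes in most patients".split()
print(treats_candidates(sent, {"metformin"}, {"diabetes"}))
# → [('metformin', 'diabetes')]
```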

2

Ulfgardleo t1_irutjd4 wrote

A hugely underappreciated fact is the computational difficulty behind learning with weak labels. E.g., if only coarse/group labels are available, multi-class linear classification immediately becomes NP-hard.

3

gradientrun t1_iruw042 wrote

Is this a result from some theory paper ?

1

Ulfgardleo t1_irv9w00 wrote

Quite easy to prove.

Take a multi-class classification problem. Now pick one class and assign it label 0, assign all other classes the same coarse label 1, and try to find the maximum-margin classifier. This problem is equivalent to finding a convex polytope that separates class 0 from class 1 with maximum margin, which is NP-hard. Logistic regression is not much better, but more difficult to prove.

This is already NP-complete when the coarse label encompasses two classes: https://proceedings.neurips.cc/paper/2018/file/22b1f2e0983160db6f7bb9f62f4dbb39-Paper.pdf

3

ratatouille_artist OP t1_irvdebw wrote

Very interesting perspective on the difficulty of learning with weak labels. If I have time, it would be good to do a longer-form write-up on how effective skweak's hidden Markov model approach is for span extraction.

0