Submitted by ratatouille_artist t3_y0qra7 in MachineLearning

I had the pleasure of running a workshop on weak supervision for NLP recently, and I would like to hear more about your experiences using weak supervision for NLP.

I am a huge fan of weak supervision personally; I think skweak is a great tool for span-based weak supervision.

With simple and efficient out-of-the-box machine learning APIs, fine-tuning and deploying machine learning models has never been easier. The lack of labelled data, however, is a real bottleneck for most projects. Weak supervision can help by:

  • labelling data more efficiently
  • generating noisy labelled data to fine-tune your model on

[Image: benefits of weak supervision]

Here's an example skweak labelling function to generate noisy labelled data:

from skweak.base import SpanAnnotator

class MoneyDetector(SpanAnnotator):
    """Labelling function that marks amounts of money such as '$ 750'."""

    def __init__(self):
        super().__init__("money_detector")

    def find_spans(self, doc):
        # yield (start, end, label) triples over the spaCy doc
        for tok in doc[1:]:
            if tok.text[0].isdigit() and tok.nbor(-1).is_currency:
                yield tok.i - 1, tok.i + 1, "MONEY"

money_detector = MoneyDetector()

This labelling function extracts any number that is preceded by a currency symbol and labels the span as MONEY.

[Image: example of the labelling function in action]

skweak allows you to combine multiple labelling functions built on spaCy attributes or other heuristics, and then aggregate their noisy outputs into a single set of labels.
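Here is a minimal sketch of how the MoneyDetector above could be combined with a second labelling function and aggregated. The second heuristic and the example sentence are made up for illustration, and the module names (heuristics, generative, utils) follow my reading of the skweak README, so they may differ slightly between skweak versions:

import spacy
from skweak import heuristics, generative, utils

# A second, hypothetical labelling function written as a plain generator
# and wrapped with skweak's FunctionAnnotator.
def company_suffix_detector(doc):
    # mark "<name> Inc/Ltd/Corp" token pairs as COMPANY
    for tok in doc:
        if tok.text in {"Inc", "Ltd", "Corp"} and tok.i > 0:
            yield tok.i - 1, tok.i + 1, "COMPANY"

lf_money = money_detector  # the class-based labelling function defined above
lf_company = heuristics.FunctionAnnotator("company_suffix", company_suffix_detector)

nlp = spacy.load("en_core_web_sm")
doc = nlp("Acme Inc paid $750 in legal fees.")

# Apply both labelling functions; each one stores its spans on the doc
# under its own name.
doc = lf_company(lf_money(doc))

# Aggregate the noisy, possibly conflicting annotations with skweak's
# hidden Markov model (exposed under skweak.aggregation in older releases).
hmm = generative.HMM("hmm", ["MONEY", "COMPANY"])
hmm.fit_and_aggregate([doc])

# The aggregated spans can then be inspected or exported as training data.
utils.display_entities(doc, "hmm")

The aggregation step is what turns several overlapping, noisy labelling functions into one consistent layer of spans you can fine-tune a model on.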

Using labelling functions has a number of advantages:

  1. 💪 larger coverage: a single labelling function can cover many samples
  2. 🤓 involving experts: domain-expert annotation is expensive, while domain-expert labelling functions are more economical thanks to their coverage
  3. 🌬️ adapting to changing domains: labelling functions and data assets can be adapted as the domain changes

What are your experiences with weak supervision in NLP? I really recommend trying out skweak in particular if you work with span extraction.

0

Comments


Ulfgardleo t1_irtdtoj wrote

This feels and sounds like an ad, but I could not figure out for what. Maybe you should make it clear which product I should definitely use.

12

Empty-Painter-3868 t1_irtdww2 wrote

Great question. In practice, I spend a week crafting a 'good' weak dataset. The result is a modest performance gain, and the model becomes a lot more unpredictable (spans off by a token or so).

The correct answer nobody wants to hear is: "I should have spent a week labelling data"

Forget Snorkel and all that crap. It's harder to make good labelling functions than it is to label data, IMO

7

ratatouille_artist OP t1_irtg348 wrote

Point taken about the advert-style writing, thanks for the feedback. My goal with the post is to see what others do for weak supervision in NLP. I also think it's an underappreciated topic and would like to see more discussion around it.

−1

ratatouille_artist OP t1_irtgyba wrote

I think the devil is in the details. You can use weak supervision to sample from a particular distribution and make your labelling more efficient.

It also works really well in pharma, where you can build and apply ontologies for your weak supervision. In this case annotation would still be hard and still required, but your annotations would also be structured and adapted for later use in the ontology, at the cost of slower annotation.

0

Deep_Airport_NYC t1_irtu95a wrote

In which contexts would weak supervision be practically applied? My sense is that if you are going to the effort of labelling data, you may as well label it properly. I have no experience with weak supervision, so I'm looking to learn more.

3

Ulfgardleo t1_irutjd4 wrote

A hugely underappreciated fact is the computational difficulty behind learning with weak labels. E.g., if only coarse/group labels are available, multi-class linear classification immediately becomes NP-hard.

3

Ulfgardleo t1_irv9w00 wrote

Quite easy to prove.

Take a multi-class classification problem. Now pick one class and assign it label 0, give all other classes the same coarse label 1, and try to find the maximum-margin classifier. This is equivalent to finding a convex polytope that separates class 0 from class 1 with maximum margin, which is an NP-hard problem. Logistic regression is not much better, but more difficult to prove.

This is already NP-complete when the coarse label encompasses two classes: https://proceedings.neurips.cc/paper/2018/file/22b1f2e0983160db6f7bb9f62f4dbb39-Paper.pdf
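To make the argument a bit more concrete, here is a sketch in my own notation (not taken from the linked paper). Write the multi-class linear classifier as $\hat{y}(x) = \arg\max_k w_k^\top x$ and set $v_k = w_0 - w_k$ for every hidden class $k \ge 1$ behind the coarse label. Then

$$x \text{ is assigned class } 0 \iff v_k^\top x > 0 \text{ for all } k \ge 1,$$
$$x \text{ is assigned the coarse label } 1 \iff v_k^\top x < 0 \text{ for at least one } k \ge 1.$$

Class 0 must therefore lie inside the convex polytope $\bigcap_{k \ge 1} \{x : v_k^\top x > 0\}$, while every coarsely labelled point must violate at least one of its faces. That "at least one" disjunction is the combinatorial part: with fine labels every point contributes ordinary convex constraints, but with coarse labels you also have to decide which face each point should violate, which is what makes the problem NP-hard.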

3

ratatouille_artist OP t1_irvd29t wrote

Yeah, but what does labelling the data 'properly' mean? If your high-value samples are very sparse, you will usually use some form of sampling even for 'proper' labelling. Fundamentally, weak supervision can be a sampling strategy.

I have used weak supervision with semi-supervised topic models for sampling where it worked very well.

The other big impact area is using ontologies to extract ontology entities at scale and looking at the distribution of these entities for the problem you are working on. For example, in pharma, if you are trying to find a DRUG treats DISEASE relationship, you might use an ontology to find all DRUG and DISEASE entities in PubMed abstracts and pull every case where they co-occur with the verb "treats".
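As a rough illustration of that ontology-lookup idea in plain spaCy (the term lists, example sentence, and function name here are hypothetical placeholders; a real setup would load the terms from an actual ontology):

import spacy
from spacy.matcher import PhraseMatcher

# Hypothetical term lists; in practice these would be pulled from an ontology.
DRUGS = ["metformin", "aspirin", "ibuprofen"]
DISEASES = ["type 2 diabetes", "headache", "inflammation"]

nlp = spacy.load("en_core_web_sm")
matcher = PhraseMatcher(nlp.vocab, attr="LOWER")
matcher.add("DRUG", [nlp.make_doc(t) for t in DRUGS])
matcher.add("DISEASE", [nlp.make_doc(t) for t in DISEASES])

def weak_treats_candidates(doc):
    # Very rough heuristic: a (DRUG, DISEASE) pair is a candidate for the
    # "treats" relation whenever both occur in a sentence that also
    # contains the verb "treat".
    spans = matcher(doc, as_spans=True)
    for sent in doc.sents:
        if not any(tok.lemma_ == "treat" for tok in sent):
            continue
        in_sent = [s for s in spans if s.start >= sent.start and s.end <= sent.end]
        drugs = [s for s in in_sent if s.label_ == "DRUG"]
        diseases = [s for s in in_sent if s.label_ == "DISEASE"]
        for drug in drugs:
            for disease in diseases:
                yield sent.text, drug.text, disease.text

doc = nlp("Metformin treats type 2 diabetes in many adult patients.")
for sent, drug, disease in weak_treats_candidates(doc):
    print(drug, "treats", disease, "|", sent)

Candidate pairs mined this way are of course noisy, which is exactly why they are treated as weak labels rather than gold annotations.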

For my current work I apply weak supervision to information extraction from sales transcripts. Hopefully I will be able to share some of the impact of this at the end of the quarter!

2

ratatouille_artist OP t1_irvdebw wrote

Very interesting perspective on the difficulty of learning with weak labels. If I have time, it would be good to do a longer-form write-up on how effective skweak's hidden Markov model approach is for span extraction.

0