Submitted by ichiichisan t3_ys974h in MachineLearning

I am trying to research methods that work well for regularization in small-data NLP finetuning scenarios, specifically for regression.

Coming from a computer vision background, it appears to me that no established method has emerged that works well across tasks, and it is really hard to combat the stark overfitting on small-data tasks.

I am specifically looking for methods that are specific to NLP finetuning and go beyond classical DL regularization techniques like dropout or weight decay.

Happy for any pointers!

6

Comments


mediocregradstudent t1_ivyynyr wrote

Recent work has shown that generating a paraphrase of the original sentence can improve robustness for sentence-level NLP tasks. What specific task are you working on in the low-data setting?
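
For context, one simple way to generate such paraphrases is round-trip (back-)translation; a minimal sketch using the public Helsinki-NLP MarianMT checkpoints from Hugging Face (the checkpoint choice and the `paraphrase` helper are just illustrative, not from any particular paper):

```python
# Paraphrase augmentation via round-trip (back-)translation.
# Assumes the `transformers` and `torch` packages; purely illustrative.
from transformers import MarianMTModel, MarianTokenizer

EN_FR = "Helsinki-NLP/opus-mt-en-fr"
FR_EN = "Helsinki-NLP/opus-mt-fr-en"

tok_ef = MarianTokenizer.from_pretrained(EN_FR)
mt_ef = MarianMTModel.from_pretrained(EN_FR)
tok_fe = MarianTokenizer.from_pretrained(FR_EN)
mt_fe = MarianMTModel.from_pretrained(FR_EN)

def translate(texts, tok, model):
    batch = tok(texts, return_tensors="pt", padding=True, truncation=True)
    out = model.generate(**batch, num_beams=4, max_length=512)
    return tok.batch_decode(out, skip_special_tokens=True)

def paraphrase(texts):
    # en -> fr -> en; the round trip usually rewords the sentence
    # while keeping the label intact
    return translate(translate(texts, tok_ef, mt_ef), tok_fe, mt_fe)

print(paraphrase(["The movie was surprisingly good despite the slow start."]))
```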

2

ichiichisan OP t1_ivza5fi wrote

My current specific task is simple multilabel regression, but I also regularly work on other multilabel/multiclass classification and regression tasks.

And by low data I mean ranges of about 1k+ samples, but I mostly work on longer texts, not short sentences.

0

spurious_waffles t1_ivzflun wrote

You could try very small character-level perturbations of your input, such as deletions, repetitions, and character swaps. You just need to be careful not to change the semantic meaning of your input text.
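
A minimal sketch of what I mean (the `char_noise` helper and the rates are just illustrative):

```python
import random

def char_noise(text, p=0.02, seed=None):
    # Randomly delete, duplicate, or swap characters with probability p
    # per character. Keep p small so the semantics stay intact.
    rng = random.Random(seed)
    chars = list(text)
    out, i = [], 0
    while i < len(chars):
        c = chars[i]
        if rng.random() < p:
            op = rng.choice(["delete", "repeat", "swap"])
            if op == "delete":
                pass  # drop this character
            elif op == "repeat":
                out.extend([c, c])  # duplicate it
            elif op == "swap" and i + 1 < len(chars):
                out.extend([chars[i + 1], c])  # swap with the next character
                i += 1
            else:
                out.append(c)  # swap at end of string: no-op
        else:
            out.append(c)
        i += 1
    return "".join(out)

print(char_noise("regularization for small-data finetuning", p=0.05, seed=0))
```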

There's some research out there showing that BERT-like models break down on standard benchmarks when the benchmark text contains a small amount of character-level noise.

−1

ichiichisan OP t1_ivznfay wrote

Thanks, but I am not looking for ad-hoc suggestions, rather for something that has been proven to work, ideally with published research behind it.

It is fairly common knowledge that randomly altering the input text does not help with finetuning on NLP tasks.

1

Nameless1995 t1_iw05d36 wrote

There isn't an established standard AFAIK.

EDA is a simple baseline for augmentation: https://arxiv.org/abs/1901.11196

(See citations on Google Scholar for recent ones.)
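
For a rough idea, here is a sketch of two of EDA's four operations, random deletion and random swap (the full recipe also includes synonym replacement and random insertion via WordNet; the function names are just illustrative):

```python
import random

def random_deletion(words, p=0.1, rng=random):
    # Drop each word with probability p, keeping at least one word.
    kept = [w for w in words if rng.random() > p]
    return kept if kept else [rng.choice(words)]

def random_swap(words, n=1, rng=random):
    # Swap two random positions, n times.
    words = list(words)
    for _ in range(n):
        i, j = rng.randrange(len(words)), rng.randrange(len(words))
        words[i], words[j] = words[j], words[i]
    return words

sentence = "small data finetuning overfits without strong regularization".split()
print(" ".join(random_deletion(sentence, p=0.2)))
print(" ".join(random_swap(sentence, n=2)))
```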

(Recent ones are playing around with counterfactual augmentation and such, but I'm not sure any standard, stable technique has arisen.)

This one had nice low resource performance: https://arxiv.org/pdf/2106.05469.pdf

Also this: https://aclanthology.org/2021.emnlp-main.749.pdf (you can find some newer work via citations on Google Scholar / Semantic Scholar).

I think prompt tuning and contrastive learning (https://openreview.net/pdf?id=cu7IUiOhujH) also showed better very-low-resource performance, but the benefit tapers off as you increase the data.

If you are seeking adversarial robustness, there are other techniques for that as well. I think FreeLB was popular a while ago. There's also SAM for flatter minima.
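
To give a flavour of the adversarial-training idea, here is a heavily simplified single-step (FGM-style) perturbation of the word embeddings; the real FreeLB recipe takes several ascent steps and accumulates gradients, so treat this only as a sketch. `model` is assumed to be a Hugging Face-style model whose output exposes `.logits`, and all names are placeholders:

```python
import torch

def adversarial_step(model, inputs, labels, loss_fn, epsilon=1e-2):
    # One simplified adversarial update on the input embedding matrix.
    emb = model.get_input_embeddings().weight  # shared input embeddings

    # clean forward/backward to obtain gradients on the embedding matrix
    loss = loss_fn(model(**inputs).logits, labels)
    loss.backward()

    # perturb the embeddings along the (L2-normalized) gradient direction
    grad = emb.grad.detach()
    delta = epsilon * grad / (grad.norm() + 1e-12)
    with torch.no_grad():
        emb.add_(delta)

    # adversarial forward/backward; gradients accumulate with the clean ones
    adv_loss = loss_fn(model(**inputs).logits, labels)
    adv_loss.backward()

    # restore the original embeddings before optimizer.step()
    with torch.no_grad():
        emb.sub_(delta)

    return loss.detach(), adv_loss.detach()
```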

5