
spurious_waffles t1_ivzflun wrote

You could try very small character level perturbations of your input such as deletions, repetitions, and character swaps. You just need to be careful to not change the semantic meaning of your input text.
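A minimal sketch of what such perturbations might look like; the function name, per-character probability, and the equal split between deletion, repetition, and swap are all illustrative choices, not from the original comment:

```python
import random

def perturb(text, p=0.05, seed=None):
    """Apply random character-level noise to text: deletions, repetitions,
    and adjacent-character swaps, each with probability p per character."""
    rng = random.Random(seed)
    chars = list(text)
    out = []
    i = 0
    while i < len(chars):
        r = rng.random()
        if r < p:                                # deletion: drop this character
            i += 1
        elif r < 2 * p:                          # repetition: emit it twice
            out.append(chars[i])
            out.append(chars[i])
            i += 1
        elif r < 3 * p and i + 1 < len(chars):   # swap with the next character
            out.append(chars[i + 1])
            out.append(chars[i])
            i += 2
        else:                                    # keep unchanged
            out.append(chars[i])
            i += 1
    return "".join(out)

print(perturb("the quick brown fox", p=0.1, seed=0))
```

Keeping `p` small is how you stay on the safe side of the "don't change the semantic meaning" caveat: at 5% noise most words remain recognizable.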

There's some research out there showing that BERT-like models break down on standard benchmarks when the benchmark text contains a small amount of character-level noise.

−1

ichiichisan OP t1_ivznfay wrote

Thanks, but I am not looking for suggestions; I am looking for something that has been proven to work, ideally with published research behind it.

It is fairly common knowledge that randomly altering input text does not help when fine-tuning on NLP tasks.

1

spurious_waffles t1_ivztdom wrote

There is a ton of research on denoising objectives in NLP.

Best of luck!

−1