
KBM_KBM t1_j4j10y6 wrote

You can pretrain and fine-tune energy-efficient language models such as ELECTRA or ConvBERT on this GPU. The batch size can't be very large, though, so gradient descent will be a bit noisy; also keep the corpus size as small as possible.
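
For reference, here's a minimal sketch of fine-tuning a small ELECTRA checkpoint with the Hugging Face transformers library on a memory-limited GPU; gradient accumulation is one common way to offset the small per-device batch size. The model checkpoint and the IMDB dataset are just illustrative placeholders, not taken from the BioELECTRA notebook:

```python
from datasets import load_dataset
from transformers import (
    ElectraTokenizerFast,
    ElectraForSequenceClassification,
    TrainingArguments,
    Trainer,
)

# A small ELECTRA checkpoint keeps memory requirements modest.
model_name = "google/electra-small-discriminator"
tokenizer = ElectraTokenizerFast.from_pretrained(model_name)
model = ElectraForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Any small text-classification corpus works; IMDB is only an example.
dataset = load_dataset("imdb")

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=128)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="electra-finetune",
    per_device_train_batch_size=8,   # small batch to fit in limited GPU memory
    gradient_accumulation_steps=4,   # effective batch of 32 to reduce gradient noise
    num_train_epochs=3,
    learning_rate=2e-5,
    fp16=True,                       # mixed precision saves memory on recent GPUs
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized["train"],
    eval_dataset=tokenized["test"],
    tokenizer=tokenizer,
)

trainer.train()
```

The same idea (small per-device batch plus accumulation) applies to pretraining, it just takes much longer on a single GPU, which is why keeping the corpus small matters.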

Look into the BioELECTRA paper, which also comes with a notebook showing how the author trained it.

2