OutrageousSundae8270 t1_iyc9bnw wrote

Transformers generally need to be pre-trained on a large corpus before they perform well on downstream tasks; fine-tuning alone on a small task-specific dataset usually isn't enough. A sketch of that workflow is below.
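A minimal sketch of the pre-train-then-fine-tune workflow, assuming the Hugging Face `transformers` library; the model name and the toy labeled example are placeholders:

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

# Start from weights already pre-trained on a large corpus (BERT was
# pre-trained on BooksCorpus + English Wikipedia), then adapt them to
# the downstream task instead of training from scratch.
tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2  # e.g. binary sentiment classification
)

# One fine-tuning step on a toy example; in practice you would loop
# over a labeled downstream dataset with an optimizer and scheduler.
inputs = tokenizer("This movie was great!", return_tensors="pt")
outputs = model(**inputs, labels=torch.tensor([1]))
outputs.loss.backward()  # gradients flow into the pre-trained weights
```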