
suflaj t1_iycm2mj wrote

Depends on the transformer, but generally yes. Pretraining BERT from scratch costs on the order of $10k in compute, maybe less now. By contrast, you can train a BiLSTM from scratch for a similar task on a single consumer card in a day or so.
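For scale, a minimal sketch of the kind of BiLSTM classifier the comment has in mind, in PyTorch. All hyperparameters (vocab size, dimensions, class count) are illustrative assumptions, not from the comment:

```python
import torch
import torch.nn as nn

# Illustrative hyperparameters -- assumptions, not from the comment.
VOCAB_SIZE = 10_000
EMBED_DIM = 128
HIDDEN_DIM = 256
NUM_CLASSES = 2

class BiLSTMClassifier(nn.Module):
    """Small bidirectional LSTM text classifier: the sort of model
    that trains from scratch on one consumer GPU in about a day."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(VOCAB_SIZE, EMBED_DIM)
        self.lstm = nn.LSTM(EMBED_DIM, HIDDEN_DIM,
                            batch_first=True, bidirectional=True)
        self.fc = nn.Linear(2 * HIDDEN_DIM, NUM_CLASSES)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) integer tensor
        x = self.embed(token_ids)
        _, (h_n, _) = self.lstm(x)
        # h_n: (2, batch, hidden) -- final forward and backward states
        h = torch.cat([h_n[0], h_n[1]], dim=-1)
        return self.fc(h)

model = BiLSTMClassifier()
logits = model(torch.randint(0, VOCAB_SIZE, (4, 32)))
print(logits.shape)  # torch.Size([4, 2])
```

At roughly a few million parameters, this is orders of magnitude smaller than BERT-base (~110M), which is where the cost gap comes from.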
