
gamerx88 t1_j3m0drc wrote

Yes, we used DistilBERT (and even logistic regression) heavily at my previous startup, where the data volume was web-scale.

Depending on the exact problem, large transformer models can be overkill. For some straightforward text classification tasks, even logistic regression with some feature engineering can get within 3 percentage points of a transformer's accuracy, at a negligible fraction of the cost.
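To give a rough idea of how small that kind of baseline can be, here is a minimal sketch of a bag-of-words logistic regression text classifier in plain Python. The training sentences, the function names, and the hyperparameters (`epochs`, `lr`) are all made up for illustration and are not what we ran in production; in practice you would probably reach for a library such as scikit-learn and add real feature engineering (n-grams, TF-IDF weighting, and so on).

```python
import math
from collections import Counter

# Toy, made-up training data (hypothetical, for illustration only).
# Label 1 = positive, label 0 = negative.
TRAIN = [
    ("great product works well", 1),
    ("love it excellent quality", 1),
    ("really great and excellent", 1),
    ("terrible waste of money", 0),
    ("awful quality broke fast", 0),
    ("really terrible and awful", 0),
]


def featurize(text):
    """Bag-of-words counts over lowercase, whitespace-separated tokens."""
    return Counter(text.lower().split())


def sigmoid(z):
    # Clamp z so math.exp cannot overflow on extreme scores.
    z = max(-30.0, min(30.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(weights, bias, text):
    """Probability that `text` belongs to class 1."""
    score = bias
    for token, count in featurize(text).items():
        score += weights.get(token, 0.0) * count
    return sigmoid(score)


def train(data, epochs=200, lr=0.5):
    """Fit logistic regression with plain stochastic gradient descent."""
    weights, bias = {}, 0.0
    for _ in range(epochs):
        for text, label in data:
            # Gradient of the log loss w.r.t. the score is (p - y).
            error = predict_proba(weights, bias, text) - label
            for token, count in featurize(text).items():
                weights[token] = weights.get(token, 0.0) - lr * error * count
            bias -= lr * error
    return weights, bias


weights, bias = train(TRAIN)
```

Scoring a new text is then a single call, e.g. `predict_proba(weights, bias, "great excellent quality")`. The whole model is one dictionary of per-token weights plus a bias, which is why training and inference cost next to nothing compared with a transformer.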
