suflaj t1_j2w69wh wrote

I would ask myself why one would consider transformers useful for any task. They seem to transfer knowledge really well. If that is the only thing that makes them viable for a given task, e.g. time series forecasting, then it becomes obvious how simpler models can outperform them.

But then the question becomes: are transformers the easiest models to transfer knowledge with for a given task? For time series forecasting, I do believe that is the case. For CV, for example, I am still not convinced.

If you're then bothered by their overhead, distill them to a simpler model. I don't think there's a better alternative architecture family for finetuning on tasks. Remember that transformers do not necessarily need to appear in the final product, but they can be a really good intermediate proxy for getting to that final product.
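
To make the distillation step a bit more concrete, here is a minimal sketch of the usual soft-target loss, assuming you already have logits from the large (teacher) model and the simpler (student) model for the same input. The function names and the temperature value are illustrative choices of mine, not part of any particular library; in practice you would use your framework's KL divergence and usually mix this with the regular task loss.

```python
import math


def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax over a list of logits."""
    scaled = [z / temperature for z in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor is the common convention for keeping gradient
    magnitudes comparable across temperatures. The default
    temperature here is an arbitrary illustrative value.
    """
    p = softmax(teacher_logits, temperature)  # soft targets from the teacher
    q = softmax(student_logits, temperature)  # student predictions
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2


# Hypothetical logits for a single example with three classes.
teacher = [4.0, 1.0, 0.5]
student = [2.5, 1.5, 0.2]
loss = distillation_loss(student, teacher)
```

The student is trained to minimize this loss, so it learns the teacher's full output distribution rather than just the hard labels, and the loss is zero only when the two softened distributions match.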
