Submitted by AttentionImaginary54 t3_102mf6v in MachineLearning

I recently came across the paper *Are Transformers Effective for Time Series Forecasting?*, which seems to cast doubt on the recent trend of using transformers for time series forecasting, suggesting that a simple model can outperform complex transformers.
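For context, the "simple model" in that paper is essentially a single linear layer mapping the lookback window to the forecast horizon, applied per channel. A minimal PyTorch sketch of that idea (class name and shapes are mine, not the authors' code):

```python
import torch
import torch.nn as nn

class LinearForecaster(nn.Module):
    """One linear map from lookback window to forecast horizon,
    applied independently per channel."""
    def __init__(self, lookback: int, horizon: int):
        super().__init__()
        self.proj = nn.Linear(lookback, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, lookback, channels) -> (batch, horizon, channels)
        return self.proj(x.transpose(1, 2)).transpose(1, 2)

# usage sketch
model = LinearForecaster(lookback=96, horizon=24)
y = model(torch.randn(8, 96, 3))  # -> (8, 24, 3)
```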

Personally, in many of my experiments applying transformers to temporal data beyond the commonly tested benchmarks (ETT, Exchange, etc.), they perform poorly compared to other, simpler models like GRUs or DA-RNN (see the sketch below). Yet we are still seeing an explosion of papers about them in the research community. Are there other recent deep learning-based alternatives?
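By "simpler models" I mean things like a plain GRU encoder with a linear head, roughly as follows (a sketch, with arbitrary hyperparameters):

```python
import torch
import torch.nn as nn

class GRUForecaster(nn.Module):
    """GRU encoder + linear head: predict the next `horizon` steps
    of each channel from the final hidden state."""
    def __init__(self, channels: int, horizon: int, hidden: int = 64):
        super().__init__()
        self.gru = nn.GRU(channels, hidden, batch_first=True)
        self.head = nn.Linear(hidden, horizon * channels)
        self.horizon, self.channels = horizon, channels

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, lookback, channels)
        _, h = self.gru(x)          # h: (num_layers, batch, hidden)
        out = self.head(h[-1])      # (batch, horizon * channels)
        return out.view(-1, self.horizon, self.channels)

model = GRUForecaster(channels=3, horizon=24)
y = model(torch.randn(8, 96, 3))  # -> (8, 24, 3)
```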

32

Comments

suflaj t1_j2w69wh wrote

I would ask myself why one would consider transformers useful for any task. They seem to transfer knowledge really well. If that is the only thing that makes them viable for a given task, e.g. time series forecasting, then it becomes obvious how simpler models can outperform them.

But then the question becomes: are transformers the easiest models to transfer knowledge with for a given task? For time series forecasting, I do believe that is the case. For CV, for example, I am still not convinced.

If you're then bothered by their overhead, distill them into a simpler model (roughly as sketched below). I don't think there's a better alternative architecture family for fine-tuning on tasks. Remember that transformers do not necessarily need to appear in the final product, but they can be a really good intermediate proxy for getting to that final product.
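A rough sketch of what that distillation step could look like for forecasting, where the student fits a mix of the teacher's forecasts and the ground truth (all names here are illustrative, not a specific library API):

```python
import torch
import torch.nn.functional as F

def distill_step(teacher, student, optimizer, x, y, alpha=0.5):
    """One distillation update. `teacher` (e.g. a transformer) and
    `student` (e.g. a GRU) are any modules with the same output shape;
    the teacher should be frozen / in eval mode."""
    with torch.no_grad():
        y_teacher = teacher(x)      # soft targets from the big model
    y_student = student(x)
    loss = alpha * F.mse_loss(y_student, y_teacher) \
         + (1 - alpha) * F.mse_loss(y_student, y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```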

5

Dc_May t1_j2xgihh wrote

I did some research a few years ago into time series forecasting, specifically day-ahead forecasting of photovoltaics from historical data at frequency x (15 min) and general weather forecasts (frequency 1 h), and we did notice that attention made our LSTM S2S model jump past the (then) state of the art. We published a paper, and then I started looking into transformers instead of the LSTM-based S2S model; they did perform better, although this never made it into a paper due to other circumstances.
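For anyone unfamiliar, by attention on an LSTM S2S model I mean something like Bahdanau-style additive attention over the encoder states at each decoding step. A rough sketch, simplified from what we actually used:

```python
import torch
import torch.nn as nn

class AdditiveAttention(nn.Module):
    """Bahdanau-style attention: score each encoder state against
    the current decoder state, return the weighted context vector."""
    def __init__(self, enc_dim: int, dec_dim: int, attn_dim: int = 64):
        super().__init__()
        self.w_enc = nn.Linear(enc_dim, attn_dim)
        self.w_dec = nn.Linear(dec_dim, attn_dim)
        self.v = nn.Linear(attn_dim, 1)

    def forward(self, enc_out, dec_h):
        # enc_out: (batch, src_len, enc_dim), dec_h: (batch, dec_dim)
        scores = self.v(torch.tanh(
            self.w_enc(enc_out) + self.w_dec(dec_h).unsqueeze(1)
        ))                                      # (batch, src_len, 1)
        weights = torch.softmax(scores, dim=1)
        return (weights * enc_out).sum(dim=1)   # (batch, enc_dim)
```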

I think that with the better understanding of transformers we have now, I would expect the results to be even clearer, assuming sufficient data and the right setup.

The thing is, a lot of forecasting tasks don't have much data, and the feature-distilling nature of a transformer might not be the best choice there. And training transformers is still a little tricky for any non-vanilla application. Floating-point regression is somewhat different from a multi-label type of output, after all.

3

marcus_hk t1_j2xqe50 wrote

>Are there other recent deep learning-based alternatives?

Structured State Space Models

Transformers seem best suited to forming associations among discrete elements; that's what self-attention is, after all. Where transformers perform well over very long ranges (in audio generation, for example), there is typically heavy use of Fourier transforms and CNNs as "feature extractors", and the transformer does not process raw data directly.

The S4 model linked above treats time-series data not as discrete samples but as a continuous signal, and consequently it works much better.
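Concretely, the core of S4 is a continuous-time linear state space model, x'(t) = Ax(t) + Bu(t), y(t) = Cx(t), discretized to a recurrence at whatever sampling rate the data has. A toy NumPy sketch of that discretization and scan (random matrices here, not S4's structured HiPPO initialization):

```python
import numpy as np

def discretize(A, B, dt):
    """Bilinear (Tustin) transform: continuous (A, B) -> discrete (Ad, Bd)."""
    n = A.shape[0]
    I = np.eye(n)
    inv = np.linalg.inv(I - (dt / 2) * A)
    return inv @ (I + (dt / 2) * A), inv @ (dt * B)

def ssm_scan(Ad, Bd, C, u):
    """Run the recurrence x_{k+1} = Ad x_k + Bd u_k, y_k = C x_k."""
    x = np.zeros(Ad.shape[0])
    ys = []
    for u_k in u:                   # u: (seq_len,) scalar input signal
        x = Ad @ x + (Bd * u_k).ravel()
        ys.append(float(C @ x))
    return np.array(ys)

# toy usage: a 4-state SSM filtering a random signal sampled every dt
rng = np.random.default_rng(0)
A = rng.normal(size=(4, 4)) - 2 * np.eye(4)   # roughly stable
B, C = rng.normal(size=(4, 1)), rng.normal(size=(1, 4))
Ad, Bd = discretize(A, B, dt=0.1)
y = ssm_scan(Ad, Bd, C, rng.normal(size=100))
```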

2

SwiftLynx t1_j2uqvgi wrote

Neural ODEs might be what you're looking for.
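A minimal sketch using the torchdiffeq package, where `odeint(func, y0, t)` integrates dy/dt = func(t, y) (everything besides that call is made up for illustration):

```python
import torch
import torch.nn as nn
from torchdiffeq import odeint  # pip install torchdiffeq

class ODEFunc(nn.Module):
    """Learned vector field dy/dt = f(t, y), parameterized by an MLP."""
    def __init__(self, dim: int, hidden: int = 32):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, dim))

    def forward(self, t, y):
        return self.net(y)

# integrate the state forward from the last observation, then read
# forecasts off the trajectory at the requested future times
func = ODEFunc(dim=3)
y0 = torch.randn(8, 3)                   # state at the last observed step
t = torch.linspace(0.0, 1.0, steps=24)   # forecast horizon in "ODE time"
traj = odeint(func, y0, t)               # (24, 8, 3)
```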

1

diegocgfr t1_j31gfg1 wrote

Incredibly slick job!

1