Why Transformers fail at Time Series: Why do simple models beat Transformers at TSF?


Devansh: Chocolate Milk Cult Leader


When Transformers were first gaining attention, the world lost its mind over their potential in Time Series Forecasting (TSF). Unfortunately, they never quite lived up to the hype. So, what went wrong?
To quote the authors of "TSMixer: An All-MLP Architecture for Time Series Forecasting": "The natural intuition is that multivariate models, such as those based on Transformer architectures, should be more effective than univariate models due to their ability to leverage cross-variate information. However, Zeng et al. (2023) revealed that this is not always the case - Transformer-based models can indeed be significantly worse than simple univariate temporal linear models on many commonly used forecasting benchmarks. The multivariate models seem to suffer from overfitting especially when the target time series is not correlated with other covariates."
The problems for Transformers don't end there. The authors of "Are Transformers Effective for Time Series Forecasting?" demonstrated that Transformer models could be beaten by a very simple linear model. When analyzing why Transformers failed, they pointed to multi-head self-attention as a potential culprit.
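To make "very simple linear model" concrete, here is a minimal sketch (mine, not code from the paper) of the kind of one-layer baseline Zeng et al. (2023) pitted against Transformers: a single weight matrix mapping the lookback window directly to the forecast horizon, applied independently to each variate. All shapes and hyperparameters below are illustrative.

```python
import torch
import torch.nn as nn

class LinearForecaster(nn.Module):
    """A one-layer linear baseline in the spirit of Zeng et al. (2023):
    each forecast step is a learned weighted sum of the lookback window."""

    def __init__(self, lookback: int = 96, horizon: int = 24):
        super().__init__()
        # One weight matrix from lookback to horizon, shared across variates.
        self.proj = nn.Linear(lookback, horizon)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, lookback, n_variates)
        x = x.transpose(1, 2)        # (batch, n_variates, lookback)
        y = self.proj(x)             # (batch, n_variates, horizon)
        return y.transpose(1, 2)     # (batch, horizon, n_variates)

model = LinearForecaster(lookback=96, horizon=24)
dummy = torch.randn(8, 96, 7)        # e.g. 7 variates, as in the ETT benchmarks
print(model(dummy).shape)            # torch.Size([8, 24, 7])
```

A model this small has no attention and almost no capacity to overfit cross-variate noise, which is exactly why its strong benchmark results were such an indictment of the heavier architectures.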
"More importantly, the main working power of the Transformer architecture is from its multi-head self-attention mechanism, which has a remarkable capability of extracting semantic correlations between paired elements in a long sequence (e.g., words in texts or 2D patches in images), and this procedure is permutation-invariant, i.e., regardless of the order. However, for time series analysis, we are mainly interested in modeling the temporal dynamics among a continuous set of points, wherein the order itself often plays the most crucial role."
To learn more about their research and Transformers in TSF tasks, I would suggest reading the article below.
Reach out to me
Use the links below to check out my other content, learn more about tutoring, reach out to me about projects, or just to say hi.
Small Snippets about Tech, AI and Machine Learning over here
AI Newsletter- artificialinte...
My grandma’s favorite Tech Newsletter- codingintervie...
Check out my other articles on Medium: rb.gy/zn1aiu
My YouTube: rb.gy/88iwdd
Reach out to me on LinkedIn. Let’s connect: rb.gy/m5ok2y
My Instagram: rb.gy/gmvuy9
My Twitter: / machine01776819
Are Transformers effective for TSF- artificialinte...
For more details, sign up for my free AI newsletter, AI Made Simple- artificialinte...
If you want to take your career to the next level, use the 20% discount (good for 1 year) on my premium tech publication, Tech Made Simple.
Using this discount drops the prices:
800 INR (10 USD) → 640 INR (8 USD) per month
8,000 INR (100 USD) → 6,400 INR (80 USD) per year (533 INR/month)
Get 20% off for 1 year- codingintervie...
Catch y'all soon. Stay Woke and Go Kill all

Comments: 2
@andreszapata4972 (10 months ago)
I'm trying to develop a simple Transformer-based neural network to predict a 'line', but it doesn't perform well on data outside the training set. I'm not sure what to do. Sometimes the network fits the training data well, but it's unable to generalize. I also tried using an LSTM, but the same issue occurs. What can I do? Keep in mind that I want to train the network with different data later, so the 'line' is my starting point.
@LightMouradYagami (1 month ago)
I guess a Transformer-based architecture is overkill for such a task. You either have to use small hyperparameters (for example, d_model=2, dff=4, etc.) or strong regularization.
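A minimal sketch of what that suggestion could look like in PyTorch (the tiny sizes match the ones mentioned above; the dropout and weight-decay values are illustrative guesses, not tested settings):

```python
import torch
import torch.nn as nn

# Deliberately tiny Transformer encoder: d_model=2, dff=4 as suggested above.
layer = nn.TransformerEncoderLayer(
    d_model=2, nhead=1, dim_feedforward=4,
    dropout=0.3,               # unusually high dropout as one form of regularization
    batch_first=True,
)
encoder = nn.TransformerEncoder(layer, num_layers=1)
head = nn.Linear(2, 1)         # project each step's features to a scalar prediction

# Weight decay (AdamW) adds a second, independent source of regularization.
params = list(encoder.parameters()) + list(head.parameters())
optimizer = torch.optim.AdamW(params, lr=1e-3, weight_decay=1e-2)

x = torch.randn(8, 32, 2)      # (batch, sequence length, d_model)
y_hat = head(encoder(x))       # (8, 32, 1)
```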