xLSTM Explained in Detail!!!

5,184 views

1littlecoder

1 day ago

Comments: 24
@ilianos
@ilianos 15 days ago
🎯 Key points for quick navigation:

00:00 *📝 Introduction to xLSTM and Max Beck*
- Introduction to Max Beck and the xLSTM paper,
- xLSTM as an alternative to Transformers; overview of the discussion structure.

00:41 *🔍 Historical Context of LSTM and Transformers*
- Review of LSTM's performance before 2017,
- Introduction of Transformers in 2017 and their advantages,
- Developments in language models like GPT-2 and GPT-3.

03:08 *🚀 Limitations of Transformers*
- Drawbacks of the self-attention mechanism in Transformers,
- Issues with scaling in sequence length and GPU memory requirements,
- Efforts to create more efficient architectures.

04:15 *⚙️ Revisiting LSTM with Modern Techniques*
- Combining old LSTM ideas with modern techniques,
- Overview of the original LSTM's memory cell updates and gate functions,
- Introduction to the limitations of LSTMs in tasks like nearest-neighbor search.

07:39 *📈 Overcoming LSTM Limitations*
- Demonstrating how xLSTM overcomes the inability to revise storage decisions,
- Introduction of exponential gating to improve LSTM performance,
- Comparison of LSTM, xLSTM, and Transformer on specific tasks.

11:00 *🧠 Enhancing Memory Capacity and Efficiency*
- Addressing LSTM's limited storage capacity and parallelization issues,
- Introduction of a large matrix memory in xLSTM,
- Methods to improve training efficiency through new variants.

12:25 *🔑 Core of xLSTM: Exponential Gating*
- Detailed explanation of the exponential gating mechanism,
- Introduction of new memory cell states and stabilization techniques,
- Comparison with the original LSTM gating mechanisms.

16:00 *🧮 New xLSTM Variants: sLSTM and mLSTM*
- Description of sLSTM with scalar cell states and new memory mixing,
- Introduction of mLSTM with a matrix memory cell state and covariance update rule,
- Differences between the two variants in memory mechanisms and parallel training.

20:57 *🔍 Performance Comparison and Evaluation*
- Evaluation of xLSTM on language experiments,
- Comparison with other models across datasets and parameter sizes,
- Demonstration of xLSTM's superior performance in length extrapolation and perplexity metrics.

25:59 *📊 Scaling xLSTM and Future Plans*
- Scaling xLSTM models shows favorable performance,
- Plans to build larger models (7 billion parameters) and write efficient kernels,
- Potential applications and further exploration of xLSTM capabilities.

27:37 *🤔 Motivation for LSTM over Transformers*
- Explanation of inefficiencies in Transformer models for text generation,
- Benefits of LSTM's fixed state size for more efficient generation on edge devices,
- Encouragement to explore recurrent alternatives to Transformers.

29:05 *🎓 Research Directions and Advice*
- Discussion of the potential of recurrent alternatives for language modeling,
- Advice for aspiring researchers to focus on making language models more efficient,
- Mention of Yann LeCun's advice to explore beyond Transformers.

29:59 *🏢 Industry Adoption and Future Trends*
- Observations on the adoption of models like Mamba in industry,
- Expectations for similar trends with xLSTM,
- Mention of a company working on scaling xLSTM for practical applications.

30:50 *🌐 Convincing the Industry to Switch from Transformers*
- Challenges in shifting industry focus from Transformers to alternative architectures,
- Need to demonstrate xLSTM's efficiency and performance to gain industry acceptance,
- Importance of open-sourcing efficient kernels to facilitate adoption.

Made with HARPA AI
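As a rough illustration of the exponential gating and the mLSTM covariance update rule mentioned at 12:25 and 16:00, here is a minimal NumPy sketch of a single mLSTM recurrent step based on my reading of the xLSTM paper. The function name mlstm_step, the scalar gate pre-activations, and the toy usage loop are illustrative assumptions, not the authors' reference implementation.

import numpy as np

def mlstm_step(C, n, m, q, k, v, i_tilde, f_tilde, o):
    # One recurrent step of an mLSTM-style matrix memory, sketched from the paper:
    # exponential gating with log-space stabilization and the covariance update
    # rule C_t = f_t * C_{t-1} + i_t * v_t k_t^T.
    d = k.shape[0]
    k = k / np.sqrt(d)                                # scale keys, as in attention
    m_new = max(f_tilde + m, i_tilde)                 # stabilizer state (kept in log space)
    i = np.exp(i_tilde - m_new)                       # stabilized exponential input gate
    f = np.exp(f_tilde + m - m_new)                   # stabilized exponential forget gate
    C_new = f * C + i * np.outer(v, k)                # matrix memory: covariance update
    n_new = f * n + i * k                             # normalizer state
    h_tilde = C_new @ q / max(abs(n_new @ q), 1.0)    # retrieve with the query
    return C_new, n_new, m_new, o * h_tilde           # output-gated hidden state

# Toy usage; random vectors stand in for the learned q/k/v and gate projections.
d = 4
C, n, m = np.zeros((d, d)), np.zeros(d), 0.0
for x in np.random.randn(10, d):
    C, n, m, h = mlstm_step(C, n, m, q=x, k=x, v=x,
                            i_tilde=0.5, f_tilde=1.0, o=np.full(d, 0.5))

In the full architecture the gate pre-activations and the query/key/value vectors come from learned linear projections of the input, and the per-step math is fused into parallel kernels; the loop above is only meant to make the update rule concrete.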
@optimalpkg
@optimalpkg 15 days ago
People are not using Transformers just because they are very good for LLMs; more broadly, the concept has been a big leap forward in the field of ML. I liked the new xLSTM paper and also Mamba when it came out, but I think Transformers have been revolutionary also because of how well they align with the hardware. Karpathy had a nice discussion on this at Stanford along with Dzmitry (who introduced the attention mechanism in 2014), though that was before Mamba was fully open-sourced.
@fontende
@fontende 14 days ago
With the new ASIC accelerator cards you will be locked into using only Transformers, but there's no choice; GPUs for universal use are much more expensive and slower.
@optimalpkg
@optimalpkg 14 days ago
@fontende Traditional GPUs are best for training because of their internal architecture, gate logic, and how gradients and weights need to propagate across layers. Recently "Etched" launched an ASIC, "Sohu", with the Transformer logic built into the hardware itself. This is golden for inference, until something new replaces Transformers. The point I was getting at is that Transformers, although initially introduced as a good idea for language translation, turned out to be such a novel architecture that they became the base/best solution for a multitude of ML problems (with variations, of course). Therefore, it makes sense to build ASICs for Transformers given their current use cases and popularity (it's a money-grab gold mine at the moment). I would love to see a new architecture that is novel and can be the base for multiple fields of ML like Transformers, and I like that the community is super active in trying to get there. It's exciting to see these new papers (including xLSTM, Mamba/state spaces, etc.).
@klauszinser
@klauszinser 3 days ago
It would be interesting to take a small Transformer model and build an xLSTM in the same hardware environment to compare how the two (Transformer vs. xLSTM) behave.
@volkerlorrmann1713
@volkerlorrmann1713 10 days ago
Wow Max 🔥
@knutjagersberg381
@knutjagersberg381 13 days ago
Great pokemon catch!
@1littlecoder
@1littlecoder 13 days ago
Thank you, I was glad to get Max's time!
@test2109-wk1zq
@test2109-wk1zq 15 days ago
why the re-up?
@1littlecoder
@1littlecoder 15 days ago
Tried something!
@Kutsushita_yukino
@Kutsushita_yukino 15 days ago
Thanks, my left ear is satisfied.
@1littlecoder
@1littlecoder 15 days ago
Does it not have sound in both ears?
@KevinKreger
@KevinKreger 15 days ago
@1littlecoder Just Max, not you.
@MaJetiGizzle
@MaJetiGizzle 15 days ago
@1littlecoder It's only Max talking in my left ear when wearing headphones.
@1littlecoder
@1littlecoder 15 days ago
I'm so sorry, it's probably my mistake! I didn't listen with both sides of the headphones, otherwise I could've avoided this!
@PankajDoharey
@PankajDoharey 15 days ago
@1littlecoder Can you reupload with stereo audio?
@Macorelppa
@Macorelppa 15 days ago
😋
@1littlecoder
@1littlecoder 15 days ago
🙏🏾
@PankajDoharey
@PankajDoharey 15 days ago
Mono Audio.
@1littlecoder
@1littlecoder 15 days ago
Yeah, my bad. I didn't listen with headphones, only computer speakers, so I couldn't tell it was mono.