New xLSTM explained: Better than Transformer LLMs?

  4,927 views

code_your_own_AI

1 month ago

Just days ago, a new alternative to transformer LLMs was published: xLSTM, and in particular its mLSTM variant. The matrix LSTM (mLSTM) is an extension of the traditional Long Short-Term Memory (LSTM) model. The core idea of mLSTM is a matrix memory built by "accumulated covariance" (a running sum of key-value outer products) combined with exponential gating functions. I explain it in detail in this video and compare it to the classical attention mechanism.
The actual performance can't be independently evaluated at the moment, since the research paper was just published. I will keep you informed.
mLSTM differentiates itself through a matrix-valued memory: instead of the scalar cell state of a classical LSTM cell, each mLSTM cell keeps a matrix that is updated with the outer product of a value vector and a key vector, and is read out with a query vector. The input and forget gates scale the new update and the decay of the old memory, while the output gate scales the readout. In the paper, mLSTM also drops the hidden-to-hidden recurrent connections, which is what makes it parallelizable over the sequence.
One of the most significant innovations of mLSTM is its larger storage capacity. Because the memory is a matrix rather than a single value per cell, it can hold many key-value associations at once and retrieve them by query, which increases the network's representational power for tasks involving high-dimensional data such as natural language processing and multivariate time series analysis. The matrix memory also lets the network model interactions between different features of the data.
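The covariance-style update with exponential gating described above can be sketched in a few lines of NumPy. This is a minimal single-head illustration under stated assumptions, not the authors' implementation: the names `mlstm_step` and `run_demo` are hypothetical, the query, key, and value vectors are random stand-ins for learned projections of the input, the output gate is held fully open, and the numerical stabilizer that the paper applies to the exponential gate is omitted for brevity.

```python
import numpy as np


def mlstm_step(C, n, q, k, v, i_gate, f_gate, o_gate):
    """One recurrent step of a simplified single-head mLSTM cell.

    C: (d, d) matrix memory, n: (d,) normalizer state,
    q, k, v: (d,) query, key, and value vectors,
    i_gate, f_gate: scalar input and forget gates, o_gate: (d,) output gate.
    """
    # Decay the old memory and add the outer product of value and key.
    C_new = f_gate * C + i_gate * np.outer(v, k)
    # The normalizer accumulates the keys with the same gating.
    n_new = f_gate * n + i_gate * k
    # Read out with the query; the denominator is bounded below by 1.
    denom = max(abs(float(n_new @ q)), 1.0)
    h = o_gate * (C_new @ q) / denom
    return C_new, n_new, h


def run_demo(T=5, d=4, seed=0):
    """Run the cell over a random sequence and return all hidden states."""
    rng = np.random.default_rng(seed)
    # Assumption: random vectors stand in for learned projections of the input.
    Q = rng.standard_normal((T, d))
    K = rng.standard_normal((T, d)) / np.sqrt(d)
    V = rng.standard_normal((T, d))
    i_pre = rng.standard_normal(T)  # input-gate pre-activations
    f_pre = rng.standard_normal(T)  # forget-gate pre-activations

    C, n = np.zeros((d, d)), np.zeros(d)
    hidden = []
    for t in range(T):
        i_gate = np.exp(i_pre[t])                 # exponential input gate
        f_gate = 1.0 / (1.0 + np.exp(-f_pre[t]))  # sigmoid forget gate
        o_gate = np.ones(d)                       # assumption: output gate fully open
        C, n, h = mlstm_step(C, n, Q[t], K[t], V[t], i_gate, f_gate, o_gate)
        hidden.append(h)
    return np.stack(hidden)


if __name__ == "__main__":
    # Prints the shape of the hidden-state sequence, (T, d).
    print(run_demo().shape)
```

With both gates fixed to 1, the readout `C @ q` reduces to a sum of past values weighted by their key-query dot products, which is unnormalized causal linear attention. That is the connection to the attention mechanism mentioned above.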
All rights w/ authors:
xLSTM: Extended Long Short-Term Memory
arxiv.org/pdf/2405.04517
#airesearch
#ai
#newtechnology

Comments: 13
@first-thoughtgiver-of-will2456 6 days ago
this just makes me want to innovate off mamba
@propeacemindfortress 1 month ago
nice, my favorite time series staple gets an upgrade 😄 awesome find, and big big thanks for sharing
@wiktorm9858 1 month ago
Is there a ready-made PyTorch implementation of this?
@davidhauser7537 21 days ago
very cool
@denishclarke4470 18 hours ago
Hey, please provide the slides
@timothywcrane 1 month ago
I hope this resets the audio industry as well. LSTMs are great for melody prediction, etc. I wonder how applicable this new modeling will be and how far its scope can be expanded.
@Dom-zy1qy 1 month ago
I haven't had much luck creating a good model to predict melodies. Any resources you recommend?
@timothywcrane 1 month ago
@@Dom-zy1qy check out @ValerioVelardoTheSoundofAI
@thedoctor5478 1 month ago
Whoa, whoa. Did you forget to say a little something at the beginning of the video?
@thomasmitchell2514 1 month ago
Hahaha my wife rolls her eyes when I say it along with him after gleefully clicking on a new upload 😅 Also I can’t help echoing “beautiful” out loud even with headphones on 😂
@JonathanYankovich 1 month ago
He said it :)
@user-wd8wx5md5z 1 month ago
@thomasmitchell2514 What are you all talking about? What is the funny part? All I see is machine learning stuff...
@SergiiNechuiviter 7 days ago
Overcomplicated explanation. Too many formal definitions, which really don't add to comprehensibility.