MedAI #41: Efficiently Modeling Long Sequences with Structured State Spaces | Albert Gu

  32,269 views

Stanford MedAI

Comments: 22
@EobardUchihaThawne 10 months ago
He is one of the lead authors of the new Mamba architecture
@thefourthbrotherkaramazov245 10 months ago
And S4, and the SSM paper before that lol
@MrHampelmann123 1 year ago
Amazing talk, and impressive research. Thanks.
@user-lf4tu9fq8j 1 year ago
Excellent presentation. Thank you
@黃啟恩-y9s 10 months ago
Excellent presentation and impressive research. I only wonder why SSMs are recurrently efficient (video timestamp: 32:27). Suppose k is the token length of the input history. A general sequence model such as a Transformer takes O(k^2) time. On the other hand, SSMs still need to encode the entire stateful history recurrently. The S4 paper also has to deal with this issue (multiplying A a total of k-1 times to build the Kbar kernel also ends up near O(k^2)), which it does by diagonalizing the matrix. So it seems SSM recurrence isn't "naturally" efficient but requires some linear algebra techniques. Any suggestions would be appreciated!
@Q0YO0Q 3 months ago
Excellent presentation, very understandable
@ranwang9505 9 months ago
Impressive presentation. Thank you
@salehgholamzadeh3368 2 years ago
Thanks for a very nice presentation. At 44:17 (Algorithm 1) you mentioned, "we've been developing simplifications of the model that allow you to bypass all of this and do things much more simply." Has this been done by now?
@albertgu4131 2 years ago
There were two follow-ups on simpler diagonal state space models: DSS (arxiv.org/abs/2203.14343) and S4D (arxiv.org/abs/2206.11893). The code for these is also available from the main repository.
@yuktikaura 1 year ago
Excellent presentation
@salehgholamzadeh3368 2 years ago
Regarding the speech classification example (53:53): theoretically, I am not convinced why the model should work when tested at a sampling rate different from the one it was trained at. As we know, A_bar and B_bar are calculated from delta_t (as well as from A and B). So the sampling rate affects A_bar and B_bar, and therefore we are training A_bar and B_bar specifically for that sampling rate. Can you please clarify what I am missing here? Thank you in advance
@albertgu4131 2 years ago
Instead of training Abar and Bbar, the parameters that are trained are A, B, and Delta. At test time on a different sampling rate, Delta can simply be multiplied by the relative change in rate (for the given experiment, Delta would be doubled at test time without retraining any parameters).
@temesgenmehari3749 2 years ago
Why do you need to learn Delta? For the ECG example, for instance, you already know the sampling rate of the data, right?
@p0w3rFloW 2 years ago
Thanks for the amazing talk and work! Maybe it's trivial, but I wonder how you actually reconstruct the signal from the hidden state, i.e., what does C look like? (at 23:50)
@albertgu4131 2 years ago
Just as A and B have specific formulas, there is a corresponding formula for C (related to evaluations of Legendre polynomials) that can be used for reconstruction. Notebooks for reproducing plots in this talk are available in the official repository.
@JamesTJoseph 1 year ago
Will subspace identification help to initialize A, B, C, and D?
@mohdil123 2 years ago
Awesome!
@theskydebreuil 11 months ago
Super interesting! Thanks for the presentation. I work in game development for now, but it's cool to see how things are going in the ML world 😊
@YUNBOWANG-tx4ju 7 months ago
so good
@马辉-r5l 6 months ago
I hope there will be Chinese subtitles; my English listening isn't good.
@PeinQein 4 months ago
Isn't there auto-translation?