He is one of the lead authors of this new Mamba architecture
@thefourthbrotherkaramazov245 10 months ago
And S4, and the SSM paper before that lol
@MrHampelmann123 1 year ago
Amazing talk, and impressive research. Thanks.
@user-lf4tu9fq8j 1 year ago
Excellent presentation. Thank you.
@黃啟恩-y9s 10 months ago
Excellent presentation and impressive research. I only wonder why SSMs are efficient when run recurrently (video timestamp: 32:27). Suppose k is the token length of the input history. A general sequence model (e.g. a Transformer) takes O(k^2) time. SSMs, on the other hand, still need to encode all of the stateful history recurrently. The S4 paper also aims to deal with this issue by diagonalizing the matrix (multiplying by A k-1 times to create the Kbar matrix also ends up at nearly O(k^2)). So it seems SSM recurrence isn't "naturally" efficient, but requires some linear algebra techniques. Any suggestions will be appreciated!!
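A minimal sketch of the recurrent view may help here (NumPy, with illustrative names, not the paper's code). Once the discretized matrices Abar and Bbar are fixed, each recurrent step costs a constant amount of work in the sequence length, so processing k tokens is O(k); the linear algebra in S4 (diagonalization, etc.) is about making the discretization and the convolution-kernel computation cheap in the state size N, not about the dependence on k.

```python
import numpy as np

def ssm_recurrence(A_bar, B_bar, C, u):
    """Run the discretized SSM  x_t = A_bar x_{t-1} + B_bar u_t,  y_t = C x_t.

    Each step is a fixed amount of work independent of sequence length
    (O(N^2) for a dense N x N A_bar, O(N) if it is diagonal), so processing
    k tokens costs O(k) in the sequence length.
    """
    x = np.zeros(A_bar.shape[0])
    ys = []
    for u_t in u:                       # k steps, constant work per step
        x = A_bar @ x + B_bar * u_t     # state update; B_bar has shape (N,)
        ys.append(C @ x)                # scalar readout; C has shape (N,)
    return np.array(ys)

rng = np.random.default_rng(0)
N = 16
A_bar = 0.9 * np.eye(N) + 0.01 * rng.standard_normal((N, N))  # toy stable matrix
y = ssm_recurrence(A_bar, rng.standard_normal(N), rng.standard_normal(N),
                   rng.standard_normal(1000))                 # k = 1000 tokens
```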
@Q0YO0Q 3 months ago
Excellent presentation, very easy to understand
@ranwang9505 9 months ago
Impressive presentation. Thank you.
@salehgholamzadeh3368 2 years ago
Thanks for a very nice presentation. At 44:17 (Algorithm 1) you mentioned that "we've been developing simplifications of the model that allow you to bypass all of this and do things much more simply". Has this been done by now?
@albertgu4131 2 years ago
There were two follow-ups on simpler diagonal state space models: DSS (arxiv.org/abs/2203.14343) and S4D (arxiv.org/abs/2206.11893). The code for these is also available from the main repository.
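For a rough sense of what the diagonal simplification buys, here is a minimal sketch in the spirit of DSS/S4D, assuming a diagonal state matrix A (the initialization and names below are illustrative, not the papers' exact parameterization). With diagonal A, the discretization reduces to elementwise scalar formulas and the convolution kernel has a simple closed form.

```python
import numpy as np

# Sketch of a diagonal SSM in the spirit of DSS/S4D -- not their exact code.
N, L = 16, 256
rng = np.random.default_rng(0)
A = -0.5 + 1j * np.pi * np.arange(N)        # diagonal entries (toy init)
B = np.ones(N, dtype=complex)
C = rng.standard_normal(N) + 1j * rng.standard_normal(N)
dt = 0.01                                   # step size Delta

# Zero-order hold discretization: with diagonal A these are scalar formulas.
A_bar = np.exp(dt * A)
B_bar = (A_bar - 1.0) / A * B

# Convolution kernel K[l] = sum_n C_n * B_bar_n * A_bar_n**l,  l = 0..L-1.
powers = A_bar[None, :] ** np.arange(L)[:, None]    # shape (L, N)
K = (powers * (C * B_bar)[None, :]).sum(axis=-1).real
```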
@yuktikaura 1 year ago
Excellent presentation
@salehgholamzadeh3368 2 years ago
Regarding the speech classification example (53:53): theoretically, I am not convinced why the model should work perfectly if it is tested at a different sampling rate than it was trained on. As we know, A_bar and B_bar are calculated based on delta_t (as well as A and B). So the sampling rate affects A_bar and B_bar, and therefore we are training A_bar and B_bar specifically for that sampling rate. Can you please clarify what I am missing here? Thank you in advance.
@albertgu4131 2 years ago
Instead of training Abar and Bbar, the parameters that are trained are A, B, and Delta. At test time on a different sampling rate, Delta can simply be multiplied by the relative change in rate (for the given experiment, Delta would be doubled at test time without retraining any parameters).
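Concretely, a small sketch of that rescaling under the zero-order hold discretization (function and variable names are illustrative):

```python
import numpy as np
from scipy.linalg import expm, solve

def discretize_zoh(A, B, delta):
    """Zero-order hold: A_bar = exp(delta A), B_bar = A^{-1} (A_bar - I) B."""
    A_bar = expm(delta * A)
    B_bar = solve(A, (A_bar - np.eye(A.shape[0])) @ B)
    return A_bar, B_bar

rng = np.random.default_rng(0)
N = 4
A = -np.eye(N) + 0.1 * rng.standard_normal((N, N))   # toy continuous-time A
B = rng.standard_normal(N)
delta = 0.01                                         # learned step size

A_bar, B_bar = discretize_zoh(A, B, delta)           # train-time discretization
# Test data sampled at half the rate => samples twice as far apart,
# so simply double delta; A and B themselves are unchanged.
A_bar_test, B_bar_test = discretize_zoh(A, B, 2.0 * delta)
```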
@temesgenmehari3749 2 years ago
Why do you need to learn the delta? For example, for the ECG example, you already know the sampling rate of the data, right?
@p0w3rFloW 2 years ago
Thanks for the amazing talk and work! Maybe it's trivial, but I wonder how you actually reconstruct the signal from the hidden state, i.e., what does C look like? (at 23:50)
@albertgu4131 2 years ago
Just as A and B have specific formulas, there is a corresponding formula for C (related to evaluations of Legendre polynomials) that can be used for reconstruction. Notebooks for reproducing the plots in this talk are available in the official repository.
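For intuition, here is a hedged sketch of such a reconstruction, assuming the state holds coefficients in an orthonormal Legendre basis over a fixed window of length theta (LegT-style; the exact normalization used in the talk's notebooks may differ):

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def reconstruct_window(x, theta, num_pts=200):
    """Evaluate f(s) ~= sum_n x[n] * sqrt(2n+1) * P_n(2s/theta - 1) on [0, theta].

    x[n] is taken to be the coefficient of the degree-n Legendre polynomial,
    orthonormalized on the window -- a sketch of the idea, not the exact
    formula from the talk.
    """
    s = np.linspace(0.0, theta, num_pts)      # positions within the window
    z = 2.0 * s / theta - 1.0                 # map the window onto [-1, 1]
    f = np.zeros_like(s)
    for n, xn in enumerate(x):
        f += xn * np.sqrt(2 * n + 1) * Legendre.basis(n)(z)
    return s, f

# e.g. with a state of N = 8 coefficients over a window of length 1:
s, f = reconstruct_window(np.random.default_rng(0).standard_normal(8), theta=1.0)
```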
@JamesTJoseph 1 year ago
Will subspace identification help to initialize A, B, C, and D?
@mohdil123 2 years ago
Awesome!
@theskydebreuil 11 months ago
Super interesting! Thanks for the presentation. I work in game development for now, but it's cool to see how things are going in the ML world 😊