He is one of the lead authors of this new Mamba architecture
@thefourthbrotherkaramazov245 10 months ago
And S4, and the SSM paper before that lol
@MrHampelmann123 1 year ago
Amazing talk, and impressive research. Thanks.
@user-lf4tu9fq8j 1 year ago
Excellent presentation. Thank you.
@黃啟恩-y9s 10 months ago
Excellent presentation and impressive research. I only wonder why SSMs are efficient when run recurrently (video timestamp: 32:27). Suppose k is the token length of the input history. A general sequence model (e.g. a Transformer) takes O(k^2) time. SSMs, on the other hand, still need to encode all of the stateful history recurrently. The S4 paper also aims to deal with this issue by diagonalizing the matrix (multiplying by A k-1 times to create the Kbar matrix also ends up at nearly O(k^2)). So it seems SSM recurrence isn't "naturally" efficient, but requires some linear algebra techniques. Any suggestions will be appreciated!!
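A minimal sketch of the recurrent view may help here (NumPy, with illustrative names, not the paper's code). Once the discretized matrices Abar and Bbar are fixed, each recurrent step costs a constant amount of work in the sequence length, so processing k tokens is O(k); the linear algebra in S4 (diagonalization, etc.) is about making the discretization and the convolution-kernel computation cheap in the state size N, not about the dependence on k.

```python
import numpy as np

def ssm_recurrence(A_bar, B_bar, C, u):
    """Run the discretized SSM  x_t = A_bar x_{t-1} + B_bar u_t,  y_t = C x_t.

    Each step is a fixed amount of work independent of sequence length
    (O(N^2) for a dense N x N A_bar, O(N) if it is diagonal), so processing
    k tokens costs O(k) in the sequence length.
    """
    x = np.zeros(A_bar.shape[0])
    ys = []
    for u_t in u:                       # k steps, constant work per step
        x = A_bar @ x + B_bar * u_t     # state update; B_bar has shape (N,)
        ys.append(C @ x)                # scalar readout; C has shape (N,)
    return np.array(ys)

rng = np.random.default_rng(0)
N = 16
A_bar = 0.9 * np.eye(N) + 0.01 * rng.standard_normal((N, N))  # toy stable matrix
y = ssm_recurrence(A_bar, rng.standard_normal(N), rng.standard_normal(N),
                   rng.standard_normal(1000))                 # k = 1000 tokens
```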
@Q0YO0Q 3 months ago
Excellent presentation, very easy to understand
@ranwang9505 9 months ago
Impressive presentation. Thank you.
@salehgholamzadeh3368 2 years ago
Thanks for a very nice presentation. At 44:17 (Algorithm 1) you mentioned that "we've been developing simplifications of the model that allow you to bypass all of this and do things much more simply". Has this been done by now?
@albertgu4131 2 years ago
There were two follow-ups on simpler diagonal state space models: DSS (arxiv.org/abs/2203.14343) and S4D (arxiv.org/abs/2206.11893). The code for these is also available from the main repository.
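For a rough sense of what the diagonal simplification buys, here is a minimal sketch in the spirit of DSS/S4D, assuming a diagonal state matrix A (the initialization and names below are illustrative, not the papers' exact parameterization). With diagonal A, the discretization reduces to elementwise scalar formulas and the convolution kernel has a simple closed form.

```python
import numpy as np

# Sketch of a diagonal SSM in the spirit of DSS/S4D -- not their exact code.
N, L = 16, 256
rng = np.random.default_rng(0)
A = -0.5 + 1j * np.pi * np.arange(N)        # diagonal entries (toy init)
B = np.ones(N, dtype=complex)
C = rng.standard_normal(N) + 1j * rng.standard_normal(N)
dt = 0.01                                   # step size Delta

# Zero-order hold discretization: with diagonal A these are scalar formulas.
A_bar = np.exp(dt * A)
B_bar = (A_bar - 1.0) / A * B

# Convolution kernel K[l] = sum_n C_n * B_bar_n * A_bar_n**l,  l = 0..L-1.
powers = A_bar[None, :] ** np.arange(L)[:, None]    # shape (L, N)
K = (powers * (C * B_bar)[None, :]).sum(axis=-1).real
```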
@yuktikaura 1 year ago
Excellent presentation
@salehgholamzadeh3368 2 years ago
Regarding the speech classification example (53:53): theoretically, I am not convinced why the model should work perfectly if it is tested at a different sampling rate than it was trained on. As we know, A_bar and B_bar are calculated based on delta_t (as well as A and B). So the sampling rate affects A_bar and B_bar, and therefore we are training A_bar and B_bar specifically for that sampling rate. Can you please clarify what I am missing here? Thank you in advance.
@albertgu4131 2 years ago
Instead of training Abar and Bbar, the parameters that are trained are A, B, and Delta. At test time on a different sampling rate, Delta can simply be multiplied by the relative change in rate (for the given experiment, Delta would be doubled at test time without retraining any parameters).
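Concretely, a small sketch of that rescaling under the zero-order hold discretization (function and variable names are illustrative):

```python
import numpy as np
from scipy.linalg import expm, solve

def discretize_zoh(A, B, delta):
    """Zero-order hold: A_bar = exp(delta A), B_bar = A^{-1} (A_bar - I) B."""
    A_bar = expm(delta * A)
    B_bar = solve(A, (A_bar - np.eye(A.shape[0])) @ B)
    return A_bar, B_bar

rng = np.random.default_rng(0)
N = 4
A = -np.eye(N) + 0.1 * rng.standard_normal((N, N))   # toy continuous-time A
B = rng.standard_normal(N)
delta = 0.01                                         # learned step size

A_bar, B_bar = discretize_zoh(A, B, delta)           # train-time discretization
# Test data sampled at half the rate => samples twice as far apart,
# so simply double delta; A and B themselves are unchanged.
A_bar_test, B_bar_test = discretize_zoh(A, B, 2.0 * delta)
```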
@temesgenmehari3749 2 years ago
Why do you need to learn the delta? For example, for the ECG example, you already know the sampling rate of the data, right?
@p0w3rFloW 2 years ago
Thanks for the amazing talk and work! Maybe it's trivial, but I wonder how you actually reconstruct the signal from the hidden state, i.e., what does C look like? (at 23:50)
@albertgu4131 2 years ago
Just as A and B have specific formulas, there is a corresponding formula for C (related to evaluations of Legendre polynomials) that can be used for reconstruction. Notebooks for reproducing the plots in this talk are available in the official repository.
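For intuition, here is a hedged sketch of such a reconstruction, assuming the state holds coefficients in an orthonormal Legendre basis over a fixed window of length theta (LegT-style; the exact normalization used in the talk's notebooks may differ):

```python
import numpy as np
from numpy.polynomial.legendre import Legendre

def reconstruct_window(x, theta, num_pts=200):
    """Evaluate f(s) ~= sum_n x[n] * sqrt(2n+1) * P_n(2s/theta - 1) on [0, theta].

    x[n] is taken to be the coefficient of the degree-n Legendre polynomial,
    orthonormalized on the window -- a sketch of the idea, not the exact
    formula from the talk.
    """
    s = np.linspace(0.0, theta, num_pts)      # positions within the window
    z = 2.0 * s / theta - 1.0                 # map the window onto [-1, 1]
    f = np.zeros_like(s)
    for n, xn in enumerate(x):
        f += xn * np.sqrt(2 * n + 1) * Legendre.basis(n)(z)
    return s, f

# e.g. with a state of N = 8 coefficients over a window of length 1:
s, f = reconstruct_window(np.random.default_rng(0).standard_normal(8), theta=1.0)
```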
@JamesTJoseph 1 year ago
Will subspace identification help to initialize A, B, C, and D?
@mohdil123 2 years ago
Awesome!
@theskydebreuil 11 months ago
Super interesting! Thanks for the presentation. I work in game development for now, but it's cool to see how things are going in the ML world 😊