I really like the way you explain the paper. A lot of the concepts I was confused about have been touched on, but I wish the block parts had been explained in more detail, like why the modules are used the way they are in those blocks. Anyway, thank you so much for the video, +1 subscriber, and I hope to see more from you in the future.
@AIBites • 3 months ago
Sure. So, are you more interested in papers and theory, or would you like more hands-on content on LLMs, RAG, etc.? Just trying to understand the audience better. :)
@ariisaac5111 • 2 months ago
@AIBites I'm more interested in the research papers and theories, and any insightful implications you can contribute along the way. What you did here is a nice baseline. Thanks!
@thesimplicitylifestyle • 6 months ago
It's so much fun looking under the hood. Thanks for explaining it so well! 😎🤖
@AIBites • 3 months ago
my pleasure :)
@yuanyuan4985 • 5 months ago
Thank you so much for providing this video!!!!!
@AIBites • 3 months ago
my pleasure Yuan! 🙂
@newbie8051 • 3 months ago
Well, the graphs at 2:18 are incorrect; sigmoid and tanh have different ranges, so the output shown with tanh should have range -1 to 1.
@AIBites • 3 months ago
That's a great spot. A copy-pasting oversight, I guess 🙂 I'll pay more attention while making the videos on attention. Thank you 😀
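[Editor's note: for reference, a minimal numeric check of the ranges discussed above, assuming the standard LSTM formulation, where the input/forget/output gates use sigmoid and the cell candidate and output squashing use tanh.]

```python
# Quick sanity check of the two activation ranges, assuming the
# standard LSTM formulation (not tied to the video's exact slides).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

x = np.linspace(-10, 10, 5)
print("sigmoid:", sigmoid(x))   # squashes into (0, 1): input/forget/output gates
print("tanh:   ", np.tanh(x))   # squashes into (-1, 1): cell candidate, and h_t = o_t * tanh(c_t)
```

Even at extreme inputs like ±10, sigmoid stays strictly inside (0, 1) while tanh saturates toward -1 and 1, which is why plots of the two should never share the same vertical range.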
@newbie8051 • 6 months ago
I could only grasp the sLSTM on the first read. So the exponential activation pushes everything up, and we use the log to bring every activation back into a smaller range? Damn, pretty interesting.
@AIBites • 3 months ago
Thank you. Yes, whenever I don't understand equations, I plug in numbers that push the values to the extremes. That way, it paints a clearer picture! :)
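[Editor's note: a rough sketch of the log-domain stabilization being discussed, following one reading of the sLSTM equations in the xLSTM paper; the variable names (i_pre, f_pre, m_prev) are illustrative, not the paper's notation.]

```python
# Exponential gates exp(i_pre), exp(f_pre) can overflow for large
# pre-activations, so a running max is tracked in log space and
# subtracted before exponentiating, keeping everything finite.
import numpy as np

def stabilized_gates(i_pre, f_pre, m_prev):
    m = max(f_pre + m_prev, i_pre)        # stabilizer state (log domain)
    i_gate = np.exp(i_pre - m)            # stabilized input gate, always <= 1
    f_gate = np.exp(f_pre + m_prev - m)   # stabilized forget gate, always <= 1
    return i_gate, f_gate, m

# Plugging in extreme numbers, as suggested in the reply above:
print(np.exp(1000.0))                       # naive exp overflows to inf
print(stabilized_gates(1000.0, 500.0, 0.0)) # stabilized values stay finite, in (0, 1]
```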