xLSTM: The Sequel To The Legendary LSTM

Views: 53,200

Days ago

Comments: 89

@bycloudAI 7 months ago

Join the waitlist now for exclusive access to the OnDemand platform: on-demand.io/contact. Let me know if you like these kinds of research breakdowns too! My newsletter: mail.bycloud.ai/

@khla.mp4 7 months ago

I'm glad we have someone to translate research language into memes and references for us

@animalnt 7 months ago

Just need the soundboard and temple run in the corner and we skibidi

@Wagner-uv6yp 7 months ago

@animalnt Yeah, like what is this guy's background lol, he sounds like a college kid, not a PhD person.

@animalnt 7 months ago

@Wagner-uv6yp "not a PhD person" 💀

@thebrownfrog 7 months ago

Good thing I'm 15. Perfect channel for me

@revimfadli4666 7 months ago

Other than bycloud and fireship (and maybe 2 minute papers and yannic kilcher), is there another?

@SmallLanguageModel 7 months ago

I heard people in the RWKV and EleutherAI discord complain that they used wrong hyperparameters for some other architectures, while they used the most optimal hyperparameters for xLSTM. So the results are not entirely honest and they try to hype up their own architecture, but what else is new...

@bycloudAI 7 months ago

I was a tiny bit suspicious when xLSTM didn't publish code, but damn okay

@w花b 7 months ago

When money ruins research... But hey, it's been a thing since research exists.

@adamrak7560 7 months ago

The context extension comparison is practically wrong too, and highly misleading. If you train a transformer on a small context length without any extra extension tricks, it will perform really badly on large contexts, by design. Comparing to that is simply not useful, and it misleads most readers/viewers. There are context extension tricks that avoid the perplexity explosion for transformers, and those would change the graph very significantly.

@nnnik3595 7 months ago

Also when you actually look at the graph at 3:54 in detail you can see how the presented combination of both [1, 1] is worse than either of the previously existing methods in almost all metrics.

@DrW1ne 7 months ago

Every video of yours is like my own Christmas! Keep it up.

@user-cg7gd5pw5b 7 months ago

So many memes per second, my brain can't process everything.

@benjamineidam 7 months ago

(> complex topic = > memes / s) = AWESOME!

@bobspianosbffl 7 months ago

This video was so good, what the hell. Thank you for delivering this complex info to me in an easily consumable way. Subscribed.

@ij9375 7 months ago

I am a simple man, I see King Baldwin, I read AI, I click thumbnail😂

@sanderbos4243 7 months ago

Kingdom of Heaven, my love

@whoami6107 7 months ago

Really like the way you make these videos, i couldn't have understood these things otherwise

@Meleeman011 7 months ago

I have first-hand experience with LSTMs and they are amazing. I can't wait to try this or architect a solution with it.

@literailly 7 months ago

Editing is amazing lol. Nice job, bycloud

@KW-jj9uy 7 months ago

I wonder if this is good for real time robotics, where you get fast real time data in tiny chunks, and you need a fast model with memory

@OnDemandAI 7 months ago

This OnDemand looks like quite the hoot!

@Lokmanne 1 month ago

Impressive, xLSTM might actually be worth switching from Mamba to it. Can you please make a sequel video to this comparing xLSTM, Mamba, Transformers, Jamba and Mamba-2? I am so desperate to know.

@initialsjd5867 7 months ago

I find this all extremely interesting, but am having a hard time finding the right way into understanding these topics, does anyone have a suggestion on where to start?

@Detril2000 7 months ago

Start with an introductory Neural Networks course. Beyond that, your only options right now are to study this on your own or to enter a Master's program in Computer Science, as all "courses" on LLMs are currently extremely dumbed down and mostly just go over how to type into ChatGPT.

@NicitoStaAna 3 months ago

3b1b is a great start. You're going to need to understand matrix multiplication (okay, enough: it's just SUMPRODUCT in Excel). StatQuest is great for an overview/deep dive and for clearing up any false assumptions.
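
To make the "it's just sumproduct" remark concrete, here is a minimal sketch in plain Python. The function name `matmul` is only illustrative; the point is that each output cell is the sum of products of one row of the left matrix with one column of the right matrix.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    inner = len(b)
    # Every row of `a` must be as long as `b` has rows.
    assert all(len(row) == inner for row in a), "inner dimensions must match"
    cols = len(b[0])
    return [
        # Cell (i, j) is the "sumproduct" of row i of `a` and column j of `b`.
        [sum(a[i][k] * b[k][j] for k in range(inner)) for j in range(cols)]
        for i in range(len(a))
    ]


a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
print(matmul(a, b))  # [[19, 22], [43, 50]]
```

In practice a library such as NumPy would do this for you; the loop version is only there to show what the operation computes.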

@aykutakguen3498 7 months ago

Interesting, thanks for the le vid

@alkeryn1700 7 months ago

All these cool architectures are never actually used in open models.

@pallharaldsson9015 7 months ago

3:33 The first line there is slightly wrong: the former sLSTM should apparently be mLSTM (no harm done, since it's 0). More importantly, I guess the ratio could be e.g. [2:3], similar to hybrid Mamba/transformer models that mix different numbers of Mamba layers and transformer/attention layers. It would be interesting to know the optimal [x:y] ratio, and even whether it makes sense to also mix more, e.g. Mamba, into this... As seen at 3:28, none of the ratios seem optimal. mLSTM has a d x d matrix, and I suppose d could also be tuned.
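
For anyone unfamiliar with the [x:y] notation in this comment: the xLSTM paper writes xLSTM[a:b] for the ratio of mLSTM-based to sLSTM-based blocks. Below is a rough sketch of turning such a ratio into a stack of block types. The function name `block_layout` and the even-spacing rule are assumptions made for illustration, not the placement the paper actually uses.

```python
def block_layout(m_blocks, s_blocks):
    """Return a list of block names with s_blocks sLSTM entries spread
    roughly evenly among m_blocks mLSTM entries (hypothetical placement)."""
    total = m_blocks + s_blocks
    layout = []
    placed_s = 0
    for i in range(total):
        # Add an sLSTM block whenever the running count falls behind
        # the target share s_blocks / total; otherwise add an mLSTM block.
        if (i + 1) * s_blocks // total > placed_s:
            layout.append("sLSTM")
            placed_s += 1
        else:
            layout.append("mLSTM")
    return layout


# The [2:3] ratio suggested in the comment: 2 mLSTM blocks, 3 sLSTM blocks.
print(block_layout(2, 3))
```

Sweeping `m_blocks` and `s_blocks` over a small grid and training each layout would be one way to search for the "optimal [x:y] ratio" the comment asks about, though that is a compute-heavy experiment and nothing here says how it would turn out.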

@dominiksvestka1587 3 months ago

So could using only sLSTM blocks be superior to using LSTM blocks for predicting time series?

@Ludifant 7 months ago

We have seen time and again that modeling stuff on what we see in humans helps. Now we are venturing into macroscopic structures like memory, where we actually have working theories of how people do it, which can give clues to smart architecture. GPT is brute forcing, LSTM is social engineering. It's not like either is guaranteed to get you in, but it is good to have both tools in the toolbelt, or both areas of expertise and viewpoints in your team. As the landscape keeps changing, the winds of fashion favour one or the other. But when motion pictures came out, they didn't stop painting or sculpting. But when talkies came out, silent movies were left behind. We need to get past this obsession with 'better' and see that as long as they differ enough, these are all valid strategies and the choice of tool depends on the nature of the problem. It helps if you understand your tools and your problem.

@ccash3290 7 months ago

Put some clouds in the background or something so people aren't confused by the similar thumbnails to other creators

@gualcasas528 7 months ago

that is the goal, for people to get confused and click on the video

@jonatan01i 7 months ago

outperforms mamba AND TRANSFORMERS?!?!?!

@monad_tcp 7 months ago

nothing outperforms Decepticons thou

@AMA14700 7 months ago

Well, at this point companies should stop training their AIs for a while in order to wait for the best architecture 😂

@w花b 7 months ago

Reminds me of web development where there's always a new best framework and everything becomes "obsolete" every 2 months

@mfpears 7 months ago

I wish I had time to research this stuff myself. This is exactly what I would be researching. Transformers seem limited in some fundamental way to me. I want to see something more recursive and dynamic. But these are impressions only since I haven't had money/time to really dive in yet.

@Alpha_GameDev-wq5cc 7 months ago

LSTMs were all the hype before the Transformers dropped in 2017… Love to see the prodigal son return

@AntoshaPushkin 7 months ago

"xLSTM" sounds so lame. Should have been "2 LS 2 TM" or maybe "LSTM: Tokyo Drift"

@nathanpotter1334 7 months ago

Back from 2017

@tom-et-jerry 7 months ago

Very nice music !

@beyse101 7 months ago

I believe I just got Schmidhubered

@mawungeteye6609 7 months ago

Up next xlstm-mamba hybrid

@olegpetrov2624 7 months ago

Superior meme taste. Thanks bycloud.

@cdkw2 7 months ago

With enough copium we sure can 🔥🔥🔥

@catoleg 7 months ago

This amount of memes per second reminds me of the Bad Gear channel

@SamArmstrong-DrSammyD 7 months ago

Hell yeah, sigmoid FTW!

@TheGiovany82 7 months ago

My brain went out of Ram with this one 😂😂😂😂

@ZenchantLive 7 months ago

Love your B Rolls lol

@anshulsingh8326 7 months ago

Ok, so how to get started with AI development

@BoHorror 7 months ago

Try and predict the stock market

@mrrespected5948 7 months ago

Nice

@Kenneth_James 7 months ago

What happened to the videos that showed examples of the best AI-generated images and video and gave a dummy like me an idea of what could be done with the current 'best'?

@JayDee-b5u 7 months ago

A person really can't 'shill' their own thing. That's not the right word.

@andyizawsome 7 months ago

i know some of these words

@renanmonteirobarbosa8129 7 months ago

Hopfield NN is the real GOAT

@cdkw2 7 months ago

The newsletter is fire though bro, thanks for that

@fmmmtmm 7 months ago

What's an "architecture"?

@drdca8263 7 months ago

Structure of the network

@crackwitz 7 months ago

Hoch-rei-ter Hoooch-reeeiii-teeer

@agenticmark 7 months ago

funny. subbed

@butterbee2384 7 months ago

02:40 "beated" lmao

@JhonSabowsla 7 months ago

bro the memes got me crying 💀😆

@MartinDxt 7 months ago

To whoever thought or is thinking that the AI revolution is dead: this is just the tip of the iceberg :D

@prenomnom2686 7 months ago

Pleeeeeease talk about YOCO 😢

@Ludifant 7 months ago

Hoch-reiter means High Rider... Sooo he might be high on copium..

@newbie8051 7 months ago

Fucking hate Language Models, they have single-handedly shitted on the entire community, making everyone focus on chatbots. Gone are the days when ppl used to showcase their work on some image data. Meh, ig I just don't like Language Models that much. Transformers as an idea are really amazing; images are more intuitive for me, and I can't grasp much from "embeddings for words". I'll try xLSTMs as regressors, that will definitely make a good project. Thanks for the video buddy 💖💖