LLM Attention That Expands At Inference? Test Time Training Explained

Views: 47,490

Days ago

Take your personal data back with Incogni! Use code bycloud at the link below and get 60% off an annual plan: incogni.com/by...
RNN's hidden states be like: "You know, I am something of an ML model myself"
check out my newsletter:
mail.bycloud.ai/
Learning to (Learn at Test Time): RNNs with Expressive Hidden States
[Paper] arxiv.org/abs/...
[Code PyTorch] github.com/tes...
[Code JAX] github.com/tes...
This video is supported by the kind Patrons & KZbin Members:
🙏Andrew Lescelius, alex j, Chris LeDoux, Alex Maurice, Miguilim, Deagan, FiFaŁ, Robert Zawiasa, Daddy Wen, Tony Jimenez, Panther Modern, Jake Disco, Demilson Quintao, Shuhong Chen, Hongbo Men, happi nyuu nyaa, Carol Lo, Mose Sakashita, Miguel, Bandera, Gennaro Schiano, gunwoo, Ravid Freedman, Mert Seftali, Mrityunjay, Richárd Nagyfi, Timo Steiner, Henrik G Sundt, projectAnthony, Brigham Hall, Kyle Hudson, Kalila, Jef Come, Jvari Williams, Tien Tien, BIll Mangrum, owned, Janne Kytölä, SO, Richárd Nagyfi, Hector, Drexon, Claxvii 177th, Inferencer, Michael Brenner, Akkusativ, Oleg Wock, FantomBloth
[Discord] / discord
[Twitter] / bycloudai
[Patreon] / bycloud
[Music] Massobeats - Noon
[Profile & Banner Art] / pygm7
[Video Editor] Silas

Comments: 156

@bycloudAI 6 months ago

Take your personal data back with Incogni! Use code bycloud at the link below and get 60% off an annual plan: incogni.com/bycloud
maybe we are all bots and the dead internet theory is true

@mine.moment 6 months ago

Please create your own style of thumbnails and stop trying to mimic Fireship lol... I'm being honest rn but the content in your videos doesn't feel as interesting/funny/easy-to-understand as his. Hope you take that as constructive criticism because you do cover lots of cool topics that Fireship doesn't.

@Miaumiau3333 6 months ago

I disagree with the comments complaining that the video is too technical. I really like that you provide enough detail to roughly understand the technique, awesome video!

@FireOfGott 6 months ago

Agreed, this is very approachable to someone who knows some architecture fundamentals!

@seriousbusiness2293 6 months ago

I found the pacing a bit off. In general, very well edited and summarized information. But it's hard to keep track of all the vocabulary; personally I'd either need to linger longer on these details or get an even shorter overview of those aspects. I really like the style of Yannic Kilcher's paper reviews, but his videos are also 3 times as long, so in any case it's a tradeoff what one prefers.

@xClairy 6 months ago

@seriousbusiness2293 Honestly, I feel like it's because his target audience was different, and now it's more technical, so he'd need more time to explain those concepts instead of expecting a baseline understanding. But going more in detail would scale logarithmically with video length, which would also hurt his YT channel, considering we all expect at best 5~15 minute videos from this channel. So, yea, it's a trade-off.

@w花b 6 months ago

Yeah they might as well just watch Fireship because that's what they're asking.

@mine.moment 6 months ago

@w花b But the problem is that bycloud tries to mimic Fireship's thumbnail style to lure in Fireship viewers, who prefer ~5 mins of mixed interesting, meme-y, simplified content, and then throws them off with 10+ minute videos of overly technical stuff.

@MrJaggy123 6 months ago

Turn all the hidden states into ML models? That scream of pain you all just heard was from the interpretability researchers ;)

@QuantumConundrum 6 months ago

OK, but then their employment is secured forever LOL

@CurioCity-Curiosity 6 months ago

We need black boxes for the black boxes!

@koktszfung 6 months ago

But imagine if those ML models are CNNs and you can see how the kernels adapt to the context of the input in real time, wouldn't that actually be easier to interpret?

@naumbtothepaine0 6 months ago

@koktszfung CNNs are more like DL, ML models are simpler

@revimfadli4666 5 months ago

@naumbtothepaine0 Which simpler ML models? XGBoost? SVM? Because CNNs are ML models too

@Eianex 6 months ago

In conclusion, Trouble in Terrorist Town is cooler than some transformers and some snakes.

@manuelburghartz5263 6 months ago

This channel explaining AI and using anime references in the visuals is exactly what I needed. Great video!

@heavenrvne888 6 months ago

holy shit this method is so interesting. and the way they encapsulated the entire idea into the title LOL!

@karlkastor 6 months ago

I love that you tell us how the method in the paper roughly works. A lot of KZbin channels just say this new technique is better without any explanation and just show results, so I have to skim the paper to get the gist of it.

@krollo8953 5 months ago

Yup makes you feel like you're learning something rather than information without enough context

@FunBotan 6 months ago

I would never suspect that this video would help me write my PhD, but the "compression heuristic" is exactly the term I needed but didn't know to express my idea. Thank you!

@papakamirneron2514 6 months ago

Please make a video explaining all of these terms, apart from that, keep the technical videos coming!

@flamakespark 6 months ago

Another day, another attempt to re-invent LSTMs

@babyjvadakkan5300 6 months ago

What's that now?

@zyansheep 6 months ago

@babyjvadakkan5300 A type of RNN that Google used to use (or still does?) for language translation before we got transformers

@Bencurlis 6 months ago

It is more of a generalization of both LSTMs and Attention, it is theoretically much more powerful IMO

@keypey8256 6 months ago

It's definitely an interesting idea

@mikairu2944 6 months ago

"too technical for this video" man you lost me at the thumbnail

@cdkw2 6 months ago

me too bro, yet I still watched the entire video 💀

@Dannydrinkbottom 6 months ago

My brother speaking greek

@OumarDicko-c5i 6 months ago

As an IA, thank you for teaching me this, I will use it to train myself

@ginqus 6 months ago

intelligently artificial

@IN-pr3lw 6 months ago

@ginqus inteligencia artificial

@truongao5425 6 months ago

intelligent anti-africa

@TheRealUsername 5 months ago

@truongao5425 😂 troll

@divandrey-u3q 6 months ago

As always, thank you for the video! I really appreciate the amount of technical details here. Don't know why other people complain but I love it!

@FaultyTwo 6 months ago

"Mom! They are adding more weights to the models again!"

@OperationDarkside 6 months ago

Let's put transformers into transformers. Maybe we end up with baby transformers.

@revimfadli4666 5 months ago

Ah yes, hot transformers in transformers action

@TheNewton 6 months ago

Good short dense overview of an even denser subject. Still waiting for the paper that modularizes all these component processes and flows, then runs training against all the permutations to bootstrap itself.

@DarrenReidAu 6 months ago

It’s trainable models all the way down! Great video, thanks!

@cdkw2 6 months ago

2:32 Waiting for bycloud to be on that page like others!

@athul_c1375 6 months ago

It's some mamba jamba

@marshallodom1388 6 months ago

I got up to 6 minutes and loved the ride! Gonna have a snack and p and dive right back in!

@QuantumConundrum 6 months ago

More videos like this, please.

@heavenrvne888 6 months ago

that intro was amazing

@ismailnejjar 6 months ago

Love the video!!

@XenoCrimson-uv8uz 6 months ago

How do we know the ones complaining about the bots in youtube chat aren't bots themselves?

@Alice_Fumo 6 months ago

I have definitely seen bots complain about bots before. In fact, you could also be a bot. Who knows at this point?

@picmotion442 6 months ago

I might be a bot

@leftybot7846 6 months ago

I'm definitely not a bot, what a stupid idea.

@turgor127 6 months ago

Ban both then. Spamming is annoying either way.

@Cloudruler_ 6 months ago

The interesting thing is it's probably cheaper for a bot to spam "bot" than create LLM comments.

@HarperChisari 6 months ago

TTT is literally short term memory. Wild.

@SimGunther 6 months ago

Audience: Less reading, more technical content! Also audience: AAAAAAHH, MY EYES! TOO TECHNICAL FOR MY EYES AND EARS! 😢

@sh4ny1 5 months ago

4:11 Why not use wavelet transform for this? I think it would be useful here since

@fnytnqsladcgqlefzcqxlzlcgj9220 6 months ago

Perfect amount of complexity, please do not make your longer videos like this more simple. I'm not involved in any form of computer science, but I've kept up with AI since TensorFlow was brand new and I understood almost everything first try.

@JorgetePanete 6 months ago

6:08 it resembles Trouble in Terrorist Town

@spoonikle 6 months ago

Earth shattering.

@guilhermecastro3671 6 months ago

Cool video. For a beginner all these terms together seem very technical; can someone suggest a playlist to learn more in depth about these topics?

@jasonhemphill8525 6 months ago

What part are you struggling with?

@StefanReich 6 months ago

Super well explained. And full of memes

@Ryu-ix8qs 6 months ago

Good video, thanks

@registered_dodo 6 months ago

I love words.

@fra4897 6 months ago

great video, but transformers in practice do not have quadratic complexity, only if you implement them the vanilla way

@jondo7680 6 months ago

I want a TTT-Linear (T) with TTT-MLP (M) as its inner loop.

@quocanhnguyen7275 6 months ago

I tried to read this when you wrote about it in your newsletter. But it was not an easy paper

@samarthpatel8377 6 months ago

Sooooo many bot comments!

@bolon667 6 months ago

Putting innocent comments to change it into ads later

@samarthpatel8377 6 months ago

@bolon667 I think you are right. The comments which I noticed earlier have gone?

@dhillaz 6 months ago

I know some of these words!

@flinkstiff 6 months ago

Bumblebee is my favorite

@koktszfung 6 months ago

Wouldn't this model be slow in operation if it has to train on the context?

@Vagabundo96 6 months ago

This is crazy

@David-lp3qy 6 months ago

MAMBA IF YOU CAN HEAR ME PLEASE SAVE US

@bloomp7999 6 months ago

did i understand this

@CraftMine1000 6 months ago

Training on test data... unless I severely misunderstand this, I'm just going to say: "yikes, nope, get out, and don't come back"

@noctarin1516 6 months ago

Nahh they actually cooking with this architecture though

@narpwa 6 months ago

my brain is exploding send help

@BooleanDisorder 6 months ago

Next up is cisformers

@amafuji 6 months ago

detransformers

@ginqus 6 months ago

biformers

@anywallsocket 6 months ago

formers

@BooleanDisorder 6 months ago

@anywallsocket forms

@sarveshpadav2881 6 months ago

performers?

@krollo8953 5 months ago

Lol that's an intense amount of memeage

@LukasNitzsche 6 months ago

Does this relate in any way to liquid time-constant neural networks?

@simonesborrinpz 6 months ago

good videos👍

@scientificaly_restful_one 6 months ago

Well, some year ago or so I had thoughts about going into ML, but you have lost me on this one. 👍 I guess it's only gonna get more complicated from now on.

@kamilbxl6 6 months ago

Nowadays it's easier to learn ML than ever. You should start with something simple enough that you understand around 80% of it and only 20% is actually new. There are lots of free shared classes from MIT, Stanford, etc., plus lots of tutorials, examples, and code documentation. First get a general yet simple view of NNs, then choose what you'd like to specialize in: image recognition, text, or something else

@ricardocosta9336 6 months ago

Dude no kidding, I came up with something similar a month ago. In concept. I'm afraid I have a limited number of insights in my lifetime, and without time to pursue them I will never make any difference in the world. 😢 But hey, that also proves, to me at least, that my math intuition is on point. 😅

@ccash3290 6 months ago

A lot of people have zero insights. It's important to work on your ideas to test them in reality

@anywallsocket 6 months ago

If you thought of it other people thought of it or will, so don’t worry about not being the one who gets credit, what matters is that the idea is in the memosphere

@someonetrustme161 6 months ago

so nobody gonna talk about how we just got rickrolled? at 3:43

@Acceleratedpayloads 6 months ago

This looks like Block-Recurrent Transformers by DL Hutchins

@Dom-zy1qy 6 months ago

Whenever a new architecture takes over, the tech companies heavily invested into developing hardware specifically optimized for the transformer architecture are gonna be sad.

@DanielJoyce 6 months ago

A single brain neuron needs something like 5 layers or so to encode its behavior. So this kinda maps each node now to something like a neuron. I know biological features map poorly to neural nets, but neurons in the brain change how and when they fire as the brain learns.

@bobsoup2319 6 months ago

Bro this model is too complicated to be simplified more. Keep up the complexity, it's what makes it interesting

@anywallsocket 6 months ago

Wouldn’t that take forever to train??

@Wobbothe3rd 6 months ago

The human brain is a recurrent neural network, not a transformer. Eventually, recurrent will win.

@athul_c1375 6 months ago

But who said the human brain is better than the transformer

@-mwolf 6 months ago

tell me the current paradigm is hitting a dead end without telling me

@ONDANOTA 6 months ago

why is every llm's OUTPUT context window fixed to 4096?

@geli95us 6 months ago

AFAIK, output context windows are not a thing for the models themselves. The model is just called once for every token it has to generate, and you can repeat that process a million times if you want. However, it's not useful for the LLM to keep outputting text past the point where its prompt falls out of its context window, so in the early days the "output window" was just set to whatever the model's context window was. Nowadays it's probably capped for economic reasons: LLMs get more expensive the longer the input is, so by limiting the output window, they force you to pay for tokens several times, once as the model's output and then again as input to each subsequent call
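
The billing argument in this comment can be made concrete with a small sketch. It assumes a hypothetical pricing scheme in which every call bills its input tokens (the prompt plus everything generated so far) and its output tokens. The names `billed_tokens`, `prompt_len`, `total_output`, and `output_cap` are illustrative only and do not correspond to any provider's API.

```python
def billed_tokens(prompt_len, total_output, output_cap):
    """Count billed input and output tokens for one long answer.

    Assumption (not from any real pricing page): the answer is produced over
    several calls, each limited to `output_cap` generated tokens, and every
    call re-sends the prompt plus all text generated so far as its input.
    """
    if output_cap <= 0:
        raise ValueError("output_cap must be positive")

    billed_input = 0
    billed_output = 0
    generated = 0
    while generated < total_output:
        # Tokens produced by this call, limited by the cap.
        chunk = min(output_cap, total_output - generated)
        # Earlier output is paid for again, this time as input.
        billed_input += prompt_len + generated
        billed_output += chunk
        generated += chunk
    return billed_input, billed_output
```

Under these assumptions, a 100-token prompt with an 8,192-token answer bills 4,296 input tokens when the cap is 4,096 (two calls), versus 100 input tokens for a single uncapped call. Output tokens are 8,192 either way.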

@spoonikle 6 months ago

To stop it. While still giving enough space to make “satisfying” answers.

@4.0.4 6 months ago

It seems very convoluted, but I guess it should learn with less data? That could be good for startups that don't have big datasets.

@pladselsker8340 6 months ago

Imagine giving money to a service for a sense of security because it is now the status quo to let every substantial company out there infringe on your privacy rights. Just a thought. What parallel universe is this?

@FenrirRobu 6 months ago

Tho didn't they warn us against meta-optimizers due to the alignment becoming impossible?

@jymcaballero5748 5 months ago

just give them more memory!

@PhilsArtVibes 6 months ago

No, no, no, I do not want to add neural networks to recursion, I JUST BEGAN TO UNDERSTAND RECURSION DON'T DO THIS TO ME!!!

@Originalimoc 2 months ago

That's unexpected

@algorithmblessedboy4831 6 months ago

guys I'm in high school and I'm trying to choose a career path. My no. 1 choice, considering the things I like and that I'm good at, is becoming an AI researcher. Can anyone in the academic world tell me if it would be a fun job or not?

@user-vg2ui3wg8n 6 months ago

It definitely is. But the field is getting increasingly complex, fast-paced, and hyper-competitive. I'd recommend studying computer science and mathematics, since you will not be able to compete in this field without a very strong mathematical background. Except for that, go for it. I'm a researcher in parallel processing and numerical high-performance computing. It is definitely fun and rewarding, but be prepared for a painful journey.

@pmosg9649 6 months ago

Great 😀

@notnotandrew 6 months ago

Yo dawg, I heard you like ML models...

@donson3326 6 months ago

Short answer: no

@pedrogorilla483 6 months ago

I watched half of the video and this is too technical for me. I’m skipping this one. Congrats to everyone who understands this video!

@bycloudAI 6 months ago

it's like RNN's hidden states are just ML models
thanks for watching till half way tho
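
A rough sketch of what "hidden states are just ML models" can mean in code, under simplifying assumptions: the hidden state is the weight matrix `W` of a tiny linear model, and each incoming token triggers one gradient step on a reconstruction loss before the updated model produces the output. The views `k`, `v`, and `q` are passed in by hand here and are hypothetical stand-ins. How the paper actually derives them and batches the updates is not covered by this toy version.

```python
def matvec(W, x):
    """Multiply a matrix (list of rows) by a vector (list of floats)."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def ttt_step(W, k, v, q, lr=0.1):
    """Process one token with a linear model as the hidden state.

    1. Take one gradient step on the loss 0.5 * ||W k - v||^2,
       whose gradient with respect to W is outer(W k - v, k).
    2. Read out the prediction of the updated model on q.

    Returns the new hidden state (updated weights) and the output vector.
    """
    err = [pred - target for pred, target in zip(matvec(W, k), v)]
    W_new = [
        [w - lr * e * kj for w, kj in zip(row, k)]
        for row, e in zip(W, err)
    ]
    return W_new, matvec(W_new, q)


def run_sequence(W, tokens, lr=0.1):
    """Scan over (k, v, q) triples, carrying the weights as the state."""
    outputs = []
    for k, v, q in tokens:
        W, out = ttt_step(W, k, v, q, lr)
        outputs.append(out)
    return W, outputs
```

In this toy setup the "memory" of the sequence lives entirely in `W`, so its size stays fixed no matter how many tokens are scanned, which is the RNN-like property discussed in the video.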

@MuhammadakbarAK47 6 months ago

Just watch it 3 times

@sashank224 6 months ago

@bycloudAI ili bro I explain hold up, I'm getting what he's saying. You need to break it down in simple terms that relate to real world apps. Visualize.

@homeyworkey 6 months ago

@bycloudAI btw this was posted on r/singularity, where there are more normies. Obviously you need normies if you want growth, but any technical video is automatically going to have a very niche audience, understandably so, so you probably don't mind that as well. I mean, I watch your stuff and most of it goes over my head but it's interesting regardless. Just letting you know the feedback here is kind of skewed.

@Guedez1 6 months ago

Yeah, if you made up everything you said in the video I wouldn't be able to tell at all. Stuff is getting harder and harder to understand.

@falsechord 6 months ago

fractal ai models

@GodbornNoven 6 months ago

Nice explanations but go easy on the vocabulary. I don't reckon every Joe out there will understand all the terms. The pacing is too quick too.

@pauljones9150 6 months ago

I'm here for the waifu memes
Good video tho

@multipurposepaperbox 6 months ago

damn yeah that's AI stuff right hahaaa? tbh I understand a quarter of this, but I really enjoy a lot of your videos

@themultiverse5447 6 months ago

what?

@boricuaxflow9669 6 months ago

Are we all botted comments?

@kingki1953 6 months ago

You should consider banning bots in your channel.

@kingki1953 6 months ago

You just uploaded and 3 bots have already commented, dark internet is scary 😢

@StefanReich 6 months ago

@kingki1953 Actually dark internet is really lame right now. You can spot these comments from a mile away:
Your videos are always so informative and interesting! Thank you for that! Thank you for your work! Your videos are always top notch! Always a pleasure to watch your videos! I will be looking forward to new episodes!

@punk3900 6 months ago

I hate such advertisement shockers that are not separated adequately from the main material. Not gonna subscribe to a channel that does that.😢

@muscifede 6 months ago

look at the amount of bots lol

@StefanReich 6 months ago

This is nothing. Check out any popular video about trading

@09jake12 5 months ago

leenear

@AlphaProto 6 months ago

This video was too much for me.

@mariusj.2192 6 months ago

The quadratic complexity is not the main problem of current LLMs. It's that they are dog sh*t at reasoning (and tasks that depend on it) and a better scaling with context length won't solve that.