Linear Transformers Are Secretly Fast Weight Memory Systems (Machine Learning Paper Explained)

  19,117 views

Yannic Kilcher

A day ago

Comments: 47
@st33lbird 3 years ago
"because love is....I'm becoming like Lex Fridman" -- genius!
@marouanemaachou7875 3 years ago
😂😂😂
@nju415 3 years ago
Hahahaha I was about to comment on the same thing
@kimchi_taco 3 years ago
What is the meaning of life?
@G12GilbertProduction 3 years ago
Fridman is a really subjectivist, reductionist guy.
@konghong3885 3 years ago
Schmidhuber: I invented the transformer
@mgostIH 3 years ago
Yannic Kilcher Is (not so) Secretly a Fast Paper Review System
@yimingqu2403 3 years ago
Me a few months ago: I'm not interested in this work, so I won't watch it. Me now: Yannic made this, so I'll watch it.
@morkovija 3 years ago
Oh my, first time I recognized the author in the paper =0 Yes, it's that Jürgen
@scottmiller2591 3 years ago
The key, value, and query terminology always felt like someone who knew databases trying to explain something that is not a database. It's like trying to explain welding in terms of nails - it's really more confusing than explanatory.
@АлексейТучак-м4ч 3 years ago
The principle of storing a single piece of data across several entries of a matrix resembles holography: holograms store information about light sources in a distributed fashion.
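A minimal NumPy sketch of that analogy (the names here are illustrative, not from the paper): a key-value pair written as an outer product is spread across every entry of the weight matrix, yet a single matrix-vector product reads it back.
```python
import numpy as np

k = np.array([0.6, 0.48, 0.64])   # a unit-norm key (0.36 + 0.2304 + 0.4096 = 1)
v = np.array([0.2, -0.7, 1.5])    # the value to store
W = np.outer(v, k)                # every entry of W holds a piece of the pair
print(W @ k)                      # reads back v, since k . k == 1
```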
@TechyBen 3 years ago
I love it when mathematicians, logicians, programmers/computer scientists, and physicists find out one thing is the same as another. :)
@saurabhkulshreshtha8953 3 years ago
I know, discovering symmetries! :)
@whosthisguythinkheis 3 years ago
Love your videos! On the pixelation, you might have some luck turning the PDF into high-res images, importing the pages into something like Krita or GIMP (which are free), and doing your writing on top there. (Krita has better pen support, though, I think.)
@morkovija 3 years ago
16:00 Interesting, I was not expecting the concept of the arrow of time to appear in ML.
@wqchen3535 3 years ago
Thanks for your tutorial! It's very good! But I have a different opinion about what you said at 44:00-46:00 about d_dot = 2·d_key·ν. I think the paper is right. You can simply compare φ_{i,1}(k) and φ_{i,3}(k) when d_key = 2. Although φ_{1,1} = φ_{2,3}, φ_{2,1} = φ_{3,3}, φ_{3,1} = φ_{1,3}, and φ_{4,1} = φ_{4,3}, which is your main concern, they differ in the ordering over i in [1, 4]. Different i corresponds to a different dimension of φ(k). So ν can be chosen from {1, 2, …, 2·d_key − 1}; shift j and shift 2·d_key − j are not the same because of the different ordering over i.
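For readers following this thread, here is a hedged sketch of the kind of feature map being discussed, assuming a roll-based DPFP-style construction; the function and argument names are illustrative, not the paper's reference code.
```python
import numpy as np

def dpfp(x, nu=1):
    # Concatenate relu(x) and relu(-x), then multiply element-wise with
    # cyclically shifted copies for shifts 1..nu; output size is 2*d_key*nu.
    x_bar = np.concatenate([np.maximum(x, 0.0), np.maximum(-x, 0.0)])
    return np.concatenate([x_bar * np.roll(x_bar, s) for s in range(1, nu + 1)])

k = np.array([0.3, -1.2])      # d_key = 2
print(dpfp(k, nu=3).shape)     # (12,) = 2 * d_key * nu, i.e. d_dot
```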
@mdmishfaqahmed5523 3 years ago
paper headings are amazing these days
@LouisChiaki 3 years ago
The keys sound very similar to what physicists call an orthonormal basis. Or, a very similar idea is used in quantum mechanics.
@etiennetiennetienne 3 years ago
Cool video and cool paper! Could you provide some intuition (or a link to a previous video) for why softmax attention can hold exponentially many keys with respect to the number of dimensions? (I think you mentioned it for Hopfield networks?)
@sharannagarajan4089 8 months ago
This is amazing! I think it's underrated.
@matterhart 3 years ago
I think you can load the PDF into Python/Acrobat, 2x/3x each page size, and then save it as a PDF or image before dropping it into OneNote.
@siyn007 3 years ago
I would recommend putting the logo and YouTube link at the top left or top right, since we never try to read text there.
@piratepartyftw 3 years ago
cool paper, clear explanation
@andres_pq 3 years ago
How far back in history does the application of vector products to route information go? It seems plausible that the key to AGI may already be solved by an obscure paper from the '80s-'90s.
@Uni1Lab 3 years ago
Just to activate the algorithms.
@andres_pq 3 years ago
To be honest, I did not understand the ML Street Talk on kernels as well as the videos you make.
@MachineLearningStreetTalk 3 years ago
We will do better 👌🙌
@andres_pq 3 years ago
@@MachineLearningStreetTalk You are great guys!
@siddharthbhargava4857 3 years ago
Thank you for the explanation.
@mathematicalninja2756 3 years ago
I sometimes feel great that I decided to do a master's in maths, because I was able to understand this.
@bhargav7476 3 years ago
I recently picked up 'Mathematics for Machine Learning' by A. Aldo Faisal; I just know calculus at this point. Would I be able to understand the maths in this paper after reading that book?
@mathematicalninja2756 3 years ago
@@bhargav7476 I don't know, as I haven't read that book yet.
@bublylybub8743 3 years ago
This does not require "master's"-level math to be understood. Just saying.
@444haluk 3 years ago
Yannic be like: I'm gonna drop the transpose sign to mess with my audience and butcher the math.
@tho207 3 years ago
Awesome explanations, as always. Why don't you just change apps? Try PDF Expert.
@Erotemic 3 years ago
Yannic: It's magic, we can just add these outer products to get a database. Me: Wow, amazing, how was I never aware of this?! Yannic: Oh, by the way, the keys need to be orthogonal. Me: ...so you can only store as many elements as the dimensionality of D... OK... cool, but I'm not nearly as impressed anymore.
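A toy demonstration of that capacity limit, as a sketch (all names illustrative): with d orthonormal keys, recall from the summed outer products is exact, but a (d+1)-th pair cannot be orthogonal to the rest and introduces crosstalk.
```python
import numpy as np

rng = np.random.default_rng(0)
d = 8
keys = np.linalg.qr(rng.normal(size=(d, d)))[0].T      # d orthonormal keys (rows)
values = rng.normal(size=(d, d))
W = sum(np.outer(v, k) for k, v in zip(keys, values))  # fast weight matrix
print(np.allclose(W @ keys[0], values[0]))             # True: exact recall

extra_k = rng.normal(size=d)
extra_k /= np.linalg.norm(extra_k)                     # a 9th key: overcapacity
W += np.outer(rng.normal(size=d), extra_k)
print(np.allclose(W @ keys[0], values[0]))             # False: crosstalk appears
```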
@freemind.d2714 3 years ago
I actually think the autoregressive self-attention mechanism V softmax(K^T q) is intuitively easier to understand than the fast weight memory system (V K^T) q, even though they are basically the same concept with a different formula.
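A quick numerical check of that equivalence for the softmax-free case, as a sketch (shapes and names are illustrative): dropping softmax makes the computation associative, so attending with weights K^T q equals querying the summed fast weight matrix V K^T.
```python
import numpy as np

rng = np.random.default_rng(1)
d, n = 4, 6
K = rng.normal(size=(d, n))     # n keys as columns
V = rng.normal(size=(d, n))     # n values as columns
q = rng.normal(size=d)

attn = V @ (K.T @ q)            # attention view: weigh values by K^T q
fast = (V @ K.T) @ q            # fast weight view: query the matrix V K^T
print(np.allclose(attn, fast))  # True, by associativity
```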
@dr.mikeybee 3 years ago
LOL! Hi, Lex.
@corgirun7892 4 months ago
oh my god
@dr.mikeybee 3 years ago
I just watched this vlog on writing out the fast network as long-term memory: kzbin.info/www/bejne/jnPKqKRqfsqDgtk
@integralogic 3 years ago
Dense Title.
@dimonenka 3 years ago
these fast weight memory systems look like glorified hypernetworks
@benibachmann9274 3 years ago
Yannic "ridiculous speed" Kilcher in top form!