You have no idea how relevant this is for me right now; I'm currently working on an NLP problem using MAML, thanks!
@amitkumarsingh406 2 years ago
Interesting. What is it about?
@nDrizza 4 years ago
Awesome explanation! I really like that you took enough time to explain the idea clearly instead of trying to shrink the explanation down to something of 30 minutes that might not have been understandable.
@aasimbaig01 4 years ago
I learn new things every day from your videos!!
@herp_derpingson 4 years ago
34:50 Mind blown. Great paper. Keep it coming!
39:40 What happens if the matrix is not invertible? Do we just discard that and try again?
41:50 This is kinda like the N-body problem, but with SGD instead of gravity.
@YannicKilcher 4 years ago
I don't think that matrix is ever non-invertible in practice, because of the identity add. But if so, just take a pseudo-inverse or something.
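To make the identity add concrete, here is a minimal NumPy sketch; the lambda value and the toy PSD "Hessian" are made up for illustration:

```python
import numpy as np

lam = 10.0                      # regularization strength (made-up value)
H = np.random.randn(5, 5)
H = H @ H.T                     # toy symmetric PSD stand-in for the Hessian

A = np.eye(5) + H / lam         # the "identity add" keeps all eigenvalues >= 1
try:
    A_inv = np.linalg.inv(A)
except np.linalg.LinAlgError:
    A_inv = np.linalg.pinv(A)   # the pseudo-inverse fallback mentioned above
```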
@JackSPk 4 years ago
Really enjoyed this one. Pretty good companion and intuitions for reading the paper (especially the "shacka da bomb" part).
@zikunchen6303 4 years ago
Daily uploads are amazing; I watch your videos instead of random memes now.
@SCIISano 3 years ago
Thanks for explaining the implicit Jacobian. This was exactly what I was looking for.
@arkasaha4412 4 years ago
This is one of your best videos! :)
@YIsTheEarthRound 4 years ago
I'm new to MAML, so maybe this is a naive question, but I'm not sure I understand the motivation for MAML (versus standard multi-task learning). Why is it a good idea? More specifically, it seems that MAML is doing a two-level optimisation (one at the level of training data with \phi and one at the level of validation data with \theta), but why does this help with generalisation? Is there any intuition/theoretical work?
@YannicKilcher 4 years ago
The generalization would be across tasks. I.e. if a new (but similar) task comes along, you have good initial starting weights for fine-tuning that task.
@YIsTheEarthRound 4 years ago
@@YannicKilcher But why does it do better than 'standard' multi-task ML, in which you keep the task-agnostic part of the network (from training these other tasks) and retrain the task-specific part for the new task? It seems like there are two parts to why MAML does so well: (1) having learned representations from previous tasks (which the standard multi-task setting also leverages), and (2) using a validation set to learn this task-agnostic part. I was just wondering what role the second played and whether there was some intuition for why it makes sense.
@user-xy7tg7xc1d 4 years ago
Sanket Shah You can check out the new meta-learning course by Chelsea Finn: kzbin.info/aero/PLoROMvodv4rMC6zfYmnD7UG3LVvwaITY5
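For intuition on the two levels discussed in this thread, a minimal MAML-style sketch in PyTorch; the toy model, inner learning rate, and the random stand-in tasks are all assumptions for illustration:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(4, 1)                 # toy model; theta = its weights
meta_opt = torch.optim.SGD(model.parameters(), lr=1e-3)
inner_lr = 0.01

# Random stand-in for (x_train, y_train, x_val, y_val) per task
tasks = [(torch.randn(8, 4), torch.randn(8, 1),
          torch.randn(8, 4), torch.randn(8, 1)) for _ in range(20)]

for x_train, y_train, x_val, y_val in tasks:
    # Inner level: adapt theta -> phi on the task's training data
    train_loss = F.mse_loss(model(x_train), y_train)
    grads = torch.autograd.grad(train_loss, model.parameters(), create_graph=True)
    phi = [p - inner_lr * g for p, g in zip(model.parameters(), grads)]

    # Outer level: evaluate phi on the task's validation data, update theta
    val_loss = F.mse_loss(F.linear(x_val, phi[0], phi[1]), y_val)
    meta_opt.zero_grad()
    val_loss.backward()                       # flows back through the inner step
    meta_opt.step()
```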
@ernestkirstein6233 4 years ago
The last step that he wasn't explicit about at 39:13: dphi/dtheta + (1/lambda) * Hessian * dphi/dtheta = I, so (I + (1/lambda) * Hessian) * dphi/dtheta = I, which means I + (1/lambda) * Hessian is the inverse of dphi/dtheta.
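That identity can be checked numerically on a toy quadratic inner loss, where the adapted phi has a closed form; all values below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 5.0
A = rng.standard_normal((3, 3))
A = A @ A.T                 # Hessian of the toy inner loss L(phi) = 0.5 * phi^T A phi

# Stationarity of L(phi) + lam/2 * ||phi - theta||^2:  A phi + lam (phi - theta) = 0,
# so phi(theta) = lam (A + lam I)^{-1} theta and dphi/dtheta = lam (A + lam I)^{-1}.
dphi_dtheta = lam * np.linalg.inv(A + lam * np.eye(3))

# The identity from the comment: (I + (1/lam) * Hessian) * dphi/dtheta = I
print(np.allclose((np.eye(3) + A / lam) @ dphi_dtheta, np.eye(3)))   # True
```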
@ernestkirstein6233 4 years ago
Another great video Yannic!
@alexanderchebykin6448 4 years ago
You've mentioned that first-order MAML doesn't work well. AFAIK that's not true: in the original MAML paper they achieve the same (or better) results with it compared to normal MAML (see Table 1, bottom). This also holds for all the independent reproductions on GitHub (or at least the ones I looked at).
@shijizhou5334 4 years ago
Thanks for correcting that; I was also confused about this.
@jonathanballoch 3 years ago
If anything, the plots show that FOMAML *does* work well, just more slowly.
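For reference, the first-order variant being debated simply drops the second-order terms by not building a graph through the inner gradient; a sketch with made-up task data:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(4, 1)
inner_lr = 0.01
x_train, y_train = torch.randn(8, 4), torch.randn(8, 1)   # made-up task data

train_loss = F.mse_loss(model(x_train), y_train)
# Full MAML would pass create_graph=True here; first-order MAML omits it,
# so the inner gradients are treated as constants in the outer backward pass.
grads = torch.autograd.grad(train_loss, model.parameters())
phi = [p - inner_lr * g for p, g in zip(model.parameters(), grads)]

x_val, y_val = torch.randn(8, 4), torch.randn(8, 1)
val_loss = F.mse_loss(F.linear(x_val, phi[0], phi[1]), y_val)
val_loss.backward()   # gradient w.r.t. theta equals gradient at phi: no Hessian terms
```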
@leondawn3593 4 years ago
Very clearly explained! Many thanks!
@JTMoustache 3 years ago
I'd missed this one before. This just highlights how useful it is to really master (convex) optimization when you want to be original in ML. Too bad I did not go to nerd school.
@anthonyrepetto3474 4 years ago
Thank you for the details!
@tusharprakash6235 3 months ago
In the inner loop, for more than one step, the gradients should be computed w.r.t. the initial parameters, right?
@AshishMittal61 4 years ago
Great video! Really helped with the intuition.
@arindamsikdar5961 3 years ago
At 36:15 in your video, differentiate the whole equation (both sides) w.r.t. \phi, not \theta, to get equation 6 in the paper.
@S0ULTrinker 2 years ago
How do you backpropagate the gradient through previous gradient steps, when you need multiple forward passes to get theta for each of the K steps? 13:11
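One way this works in practice, as a hedged PyTorch sketch: keeping the graph across all K inner steps lets a single backward pass flow through every intermediate set of weights (toy model and data assumed):

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)   # made-up task data
inner_lr, K = 0.01, 3

phi = list(model.parameters())                # phi_0 = theta
for _ in range(K):
    loss = F.mse_loss(F.linear(x, phi[0], phi[1]), y)
    # Each step's gradient is taken w.r.t. the *current* phi, not the initial
    # theta; create_graph=True keeps the step differentiable, so phi_K remains
    # a function of theta through all K updates.
    grads = torch.autograd.grad(loss, phi, create_graph=True)
    phi = [p - inner_lr * g for p, g in zip(phi, grads)]

val_loss = F.mse_loss(F.linear(x, phi[0], phi[1]), y)
val_loss.backward()   # one backward pass through all K forward/gradient steps
```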
@nbrpwng 4 years ago
Nice video; it reminds me of the E-MAML paper I think you reviewed some time ago. Have you by chance considered making something like a channel Discord server? Maybe it would be a nice thing for viewers to discuss papers or other topics in ML, although these comment sections are good too from what I've seen.
@YannicKilcher 4 years ago
Yes, my worry is that there aren't enough people to sustain that sort of thing.
@nbrpwng 4 years ago
Yannic Kilcher I'm not entirely sure how many others would join, but I think maybe enough to keep it fairly active, at least enough to be a nice place to talk about papers or whatever sometimes. I'm in a few servers with just a few dozen active members, and that seems to be enough for good daily interaction.
@ekjotnanda6832 3 years ago
Really good explanation 👍🏻
@tianyuez 4 years ago
Great video!
@andreasv9472 4 years ago
Hi, interesting video! What is this parameter theta? Is it the weights of the neural nets? Or how many neurons there are? Or is it something like the learning rate or step size?
@YannicKilcher 4 years ago
Yes, theta is the weights of the neural nets in this case.
@marouanemaachou7875 4 years ago
Keep up the good work!!
@brojo9152 3 years ago
Which software do you use to write things along with the paper?
@hiyamghannam1939 4 years ago
Hello, thank you so much!! Have you explained the original MAML paper?
@YannicKilcher 4 years ago
Not yet, unfortunately
@go00o87 4 years ago
Hm... isn't grad_phi(phi) = dim(phi)? Provided phi is a multidimensional vector, it shouldn't be 1. Granted, it doesn't matter, as it just rescales lambda, and that parameter is arbitrary anyway.
@herp_derpingson 4 years ago
I think you are confusing grad with hessian. The grad operation on a tensor doesn't change its dimensions. For example, if we take the identity function f(x) = x, then grad_x [x] is the identity matrix (which is [1] in one dimension).
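A quick check of that point in PyTorch, using the identity map as the toy function:

```python
import torch

phi = torch.randn(3)
J = torch.autograd.functional.jacobian(lambda x: x, phi)
print(J)   # 3x3 identity matrix: the gradient of phi w.r.t. itself is I, not dim(phi)
```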
@freemind.d2714 3 years ago
Regularization is like turning Maximum Likelihood Estimation (MLE) into Maximum A Posteriori (MAP) estimation.
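In symbols, this is the standard identity with an isotropic Gaussian prior standing in for L2 regularization (not specific to this paper):

```latex
\hat{\theta}_{\mathrm{MAP}}
  = \arg\max_{\theta}\, \log p(\mathcal{D}\mid\theta) + \log p(\theta)
  = \arg\max_{\theta}\, \log p(\mathcal{D}\mid\theta) - \tfrac{\lambda}{2}\lVert\theta\rVert^2
  \quad\text{for } p(\theta)=\mathcal{N}\!\left(0,\lambda^{-1}I\right),
```

and dropping the prior term recovers the MLE.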
@benwaful 4 years ago
Isn't this just Reptile, but instead of using the minimum of ||phi' - theta|| as the update, you use it as a regularizer?
@YannicKilcher 4 years ago
Sounds plausible, but I've never heard of Reptile.
@benwaful 4 years ago
@@YannicKilcher arxiv.org/abs/1803.02999
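For comparison, Reptile's outer update from that paper just nudges theta toward the task-adapted weights; a minimal sketch where the toy model, data, and step sizes are all made up:

```python
import torch
import torch.nn.functional as F

def inner_sgd(theta, x, y, lr=0.01, k=5):
    """Run k plain SGD steps on one task, starting from a copy of theta."""
    phi = [t.clone().requires_grad_(True) for t in theta]
    for _ in range(k):
        loss = F.mse_loss(F.linear(x, phi[0], phi[1]), y)
        grads = torch.autograd.grad(loss, phi)
        phi = [(p - lr * g).detach().requires_grad_(True) for p, g in zip(phi, grads)]
    return phi

theta = [torch.randn(1, 4), torch.randn(1)]        # toy weight and bias
epsilon = 0.1                                      # Reptile outer step size
for _ in range(100):                               # tasks: random stand-in data
    x, y = torch.randn(8, 4), torch.randn(8, 1)
    phi = inner_sgd(theta, x, y)
    theta = [t + epsilon * (p.detach() - t) for t, p in zip(theta, phi)]
```

Note that Reptile never differentiates through the inner loop at all, which is part of what the comparison above is asking about.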
@spaceisawesome1 4 years ago
You're working so hard. Please get some sleep or rest! haha
@UncoveredTruths 4 years ago
From experience, transfer learning doesn't work nearly as well as people make it out to for medical imagery.
@wizardOfRobots 3 years ago
Please also upload in 1080p.
@marat61 4 years ago
38:36 It seems that there are mistakes in the expression you derived.
@YannicKilcher 4 years ago
Share them with us :)
@innosolo 27 days ago
@YannicKilcher Thank you very much for this great video. I have learnt so much from it. Please, could I have your email address for some enquiries? Thanks.