You have no idea how relevant this is for me right now; I'm currently working on an NLP problem using MAML, thanks!
@amitkumarsingh406 2 years ago
Interesting. What is it about?
@nDrizza 4 years ago
Awesome explanation! I really like that you took enough time to explain the idea clearly instead of trying to shrink the explanation down to something of 30 minutes that might not have been understandable.
@aasimbaig01 4 years ago
I learn new things every day from your videos!!
@herp_derpingson 4 years ago
34:50 Mind blown. Great paper. Keep it coming!
39:40 What happens if the matrix is not invertible? Do we just discard that and try again?
41:50 This is kinda like the N-body problem, but with SGD instead of gravity.
@YannicKilcher 4 years ago
I don't think that matrix is ever non-invertible in practice, because of the identity add. But if so, just take a pseudo-inverse or something.
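To make the identity add concrete, here is a minimal NumPy sketch; the lambda value and the toy PSD "Hessian" are made up for illustration:

```python
import numpy as np

lam = 10.0                      # regularization strength (made-up value)
H = np.random.randn(5, 5)
H = H @ H.T                     # toy symmetric PSD stand-in for the Hessian

A = np.eye(5) + H / lam         # the "identity add" keeps all eigenvalues >= 1
try:
    A_inv = np.linalg.inv(A)
except np.linalg.LinAlgError:
    A_inv = np.linalg.pinv(A)   # the pseudo-inverse fallback mentioned above
```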
@JackSPk 4 years ago
Really enjoyed this one. Pretty good companion and intuitions for reading the paper (especially the "shacka da bomb" part).
@zikunchen6303 4 years ago
Daily uploads are amazing; I watch your videos instead of random memes now.
@SCIISano 3 years ago
Thanks for explaining the implicit Jacobian. This was exactly what I was looking for.
@arkasaha4412 4 years ago
This is one of your best videos! :)
@YIsTheEarthRound 4 years ago
I'm new to MAML, so maybe this is a naive question, but I'm not sure I understand the motivation for MAML (versus standard multi-task learning). Why is it a good idea? More specifically, it seems that MAML is doing a two-level optimisation (one at the level of training data with \phi and one at the level of validation data with \theta), but why does this help with generalisation? Is there any intuition/theoretical work?
@YannicKilcher 4 years ago
The generalization would be across tasks. I.e. if a new (but similar) task comes along, you have good initial starting weights for fine-tuning that task.
@YIsTheEarthRound 4 years ago
@@YannicKilcher But why does it do better than 'standard' multi-task ML, in which you keep the task-agnostic part of the network (from training these other tasks) and retrain the task-specific part for the new task? It seems like there are two parts to why MAML does so well: (1) having learned representations from previous tasks (which the standard multi-task setting also leverages), and (2) using a validation set to learn this task-agnostic part. I was just wondering what role the second played and whether there was some intuition for why it makes sense.
@user-xy7tg7xc1d 4 years ago
Sanket Shah You can check out the new meta-learning course by Chelsea Finn: kzbin.info/aero/PLoROMvodv4rMC6zfYmnD7UG3LVvwaITY5
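For intuition on the two levels discussed in this thread, a minimal MAML-style sketch in PyTorch; the toy model, inner learning rate, and the random stand-in tasks are all assumptions for illustration:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(4, 1)                 # toy model; theta = its weights
meta_opt = torch.optim.SGD(model.parameters(), lr=1e-3)
inner_lr = 0.01

# Random stand-in for (x_train, y_train, x_val, y_val) per task
tasks = [(torch.randn(8, 4), torch.randn(8, 1),
          torch.randn(8, 4), torch.randn(8, 1)) for _ in range(20)]

for x_train, y_train, x_val, y_val in tasks:
    # Inner level: adapt theta -> phi on the task's training data
    train_loss = F.mse_loss(model(x_train), y_train)
    grads = torch.autograd.grad(train_loss, model.parameters(), create_graph=True)
    phi = [p - inner_lr * g for p, g in zip(model.parameters(), grads)]

    # Outer level: evaluate phi on the task's validation data, update theta
    val_loss = F.mse_loss(F.linear(x_val, phi[0], phi[1]), y_val)
    meta_opt.zero_grad()
    val_loss.backward()                       # flows back through the inner step
    meta_opt.step()
```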
@ernestkirstein6233 4 years ago
The last step that he wasn't explicit about at 39:13: dphi/dtheta + (1/lambda) * Hessian * dphi/dtheta = I, so (I + (1/lambda) * Hessian) * dphi/dtheta = I, which means I + (1/lambda) * Hessian is the inverse of dphi/dtheta.
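That identity can be checked numerically on a toy quadratic inner loss, where the adapted phi has a closed form; all values below are made up:

```python
import numpy as np

rng = np.random.default_rng(0)
lam = 5.0
A = rng.standard_normal((3, 3))
A = A @ A.T                 # Hessian of the toy inner loss L(phi) = 0.5 * phi^T A phi

# Stationarity of L(phi) + lam/2 * ||phi - theta||^2:  A phi + lam (phi - theta) = 0,
# so phi(theta) = lam (A + lam I)^{-1} theta and dphi/dtheta = lam (A + lam I)^{-1}.
dphi_dtheta = lam * np.linalg.inv(A + lam * np.eye(3))

# The identity from the comment: (I + (1/lam) * Hessian) * dphi/dtheta = I
print(np.allclose((np.eye(3) + A / lam) @ dphi_dtheta, np.eye(3)))   # True
```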
@ernestkirstein6233 4 years ago
Another great video Yannic!
@alexanderchebykin6448 4 years ago
You've mentioned that first-order MAML doesn't work well. AFAIK that's not true: in the original MAML paper they achieve the same (or better) results with it compared to normal MAML (see Table 1, bottom). This also holds for all the independent reproductions on GitHub (or at least the ones I looked at).
@shijizhou5334 4 years ago
Thanks for correcting that; I was also confused about this.
@jonathanballoch 3 years ago
If anything, the plots show that FOMAML *does* work well, just more slowly.
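For reference, the first-order variant being debated simply drops the second-order terms by not building a graph through the inner gradient; a sketch with made-up task data:

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(4, 1)
inner_lr = 0.01
x_train, y_train = torch.randn(8, 4), torch.randn(8, 1)   # made-up task data

train_loss = F.mse_loss(model(x_train), y_train)
# Full MAML would pass create_graph=True here; first-order MAML omits it,
# so the inner gradients are treated as constants in the outer backward pass.
grads = torch.autograd.grad(train_loss, model.parameters())
phi = [p - inner_lr * g for p, g in zip(model.parameters(), grads)]

x_val, y_val = torch.randn(8, 4), torch.randn(8, 1)
val_loss = F.mse_loss(F.linear(x_val, phi[0], phi[1]), y_val)
val_loss.backward()   # gradient w.r.t. theta equals gradient at phi: no Hessian terms
```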
@leondawn3593 4 years ago
Very clearly explained! Many thanks!
@JTMoustache 3 years ago
I'd missed this one before. This just highlights how useful it is to really master (convex) optimization when you want to be original in ML. Too bad I did not go to nerd school.
@anthonyrepetto3474 4 years ago
Thank you for the details!
@tusharprakash6235 3 months ago
In the inner loop, for more than one step, the gradients should be computed w.r.t. the initial parameters, right?
@AshishMittal61 4 years ago
Great video! Really helped with the intuition.
@arindamsikdar5961 3 years ago
At 36:15 in your video, differentiate the whole equation (both sides) w.r.t. \phi, not \theta, to get equation 6 in the paper.
@S0ULTrinker 2 years ago
How do you backpropagate the gradient through previous gradient steps, when you need multiple forward passes to get theta for each of the K steps? 13:11
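One way this works in practice, as a hedged PyTorch sketch: keeping the graph across all K inner steps lets a single backward pass flow through every intermediate set of weights (toy model and data assumed):

```python
import torch
import torch.nn.functional as F

model = torch.nn.Linear(4, 1)
x, y = torch.randn(8, 4), torch.randn(8, 1)   # made-up task data
inner_lr, K = 0.01, 3

phi = list(model.parameters())                # phi_0 = theta
for _ in range(K):
    loss = F.mse_loss(F.linear(x, phi[0], phi[1]), y)
    # Each step's gradient is taken w.r.t. the *current* phi, not the initial
    # theta; create_graph=True keeps the step differentiable, so phi_K remains
    # a function of theta through all K updates.
    grads = torch.autograd.grad(loss, phi, create_graph=True)
    phi = [p - inner_lr * g for p, g in zip(phi, grads)]

val_loss = F.mse_loss(F.linear(x, phi[0], phi[1]), y)
val_loss.backward()   # one backward pass through all K forward/gradient steps
```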
@nbrpwng 4 years ago
Nice video; it reminds me of the E-MAML paper I think you reviewed some time ago. Have you by chance considered making something like a channel Discord server? Maybe it would be a nice thing for viewers to discuss papers or other topics in ML, although these comment sections are good too from what I've seen.
@YannicKilcher 4 years ago
Yes, my worry is that there aren't enough people to sustain that sort of thing.
@nbrpwng 4 years ago
Yannic Kilcher I'm not entirely sure how many others would join, but I think maybe enough to keep it fairly active, at least enough to be a nice place to talk about papers or whatever sometimes. I'm in a few servers with just a few dozen active members, and that seems to be enough for good daily interaction.
@ekjotnanda6832 3 years ago
Really good explanation 👍🏻
@tianyuez 4 years ago
Great video!
@andreasv9472 4 years ago
Hi, interesting video! What is this parameter theta? Is it the weights of the neural nets? Or how many neurons there are? Or is it something like the learning rate or step size?
@YannicKilcher 4 years ago
Yes, theta is the weights of the neural nets in this case.
@marouanemaachou7875 4 years ago
Keep up the good work!!
@brojo9152 3 years ago
Which software do you use to write things along with the paper?
@hiyamghannam1939 4 years ago
Hello, thank you so much!! Have you explained the original MAML paper?
@YannicKilcher 4 years ago
Not yet, unfortunately
@go00o87 4 years ago
Hm... isn't grad_phi(phi) = dim(phi)? Provided phi is a multidimensional vector, it shouldn't be 1. Granted, it doesn't matter, as it just rescales lambda, and that parameter is arbitrary anyway.
@herp_derpingson 4 years ago
I think you are confusing grad with hessian. The grad operation on a tensor doesn't change its dimensions. For example, if we take the identity function f(x) = x, then grad_x [x] is the identity matrix (which is [1] in one dimension).
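A quick check of that point in PyTorch, using the identity map as the toy function:

```python
import torch

phi = torch.randn(3)
J = torch.autograd.functional.jacobian(lambda x: x, phi)
print(J)   # 3x3 identity matrix: the gradient of phi w.r.t. itself is I, not dim(phi)
```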
@freemind.d2714 3 years ago
Regularization is like turning Maximum Likelihood Estimation (MLE) into Maximum A Posteriori (MAP) estimation.
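In symbols, this is the standard identity with an isotropic Gaussian prior standing in for L2 regularization (not specific to this paper):

```latex
\hat{\theta}_{\mathrm{MAP}}
  = \arg\max_{\theta}\, \log p(\mathcal{D}\mid\theta) + \log p(\theta)
  = \arg\max_{\theta}\, \log p(\mathcal{D}\mid\theta) - \tfrac{\lambda}{2}\lVert\theta\rVert^2
  \quad\text{for } p(\theta)=\mathcal{N}\!\left(0,\lambda^{-1}I\right),
```

and dropping the prior term recovers the MLE.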
@benwaful 4 years ago
Isn't this just Reptile, but instead of using the minimum of ||phi' - theta|| as the update, you use it as a regularizer?
@YannicKilcher 4 years ago
Sounds plausible, but I've never heard of Reptile.
@benwaful 4 years ago
@@YannicKilcher arxiv.org/abs/1803.02999
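For comparison, Reptile's outer update from that paper just nudges theta toward the task-adapted weights; a minimal sketch where the toy model, data, and step sizes are all made up:

```python
import torch
import torch.nn.functional as F

def inner_sgd(theta, x, y, lr=0.01, k=5):
    """Run k plain SGD steps on one task, starting from a copy of theta."""
    phi = [t.clone().requires_grad_(True) for t in theta]
    for _ in range(k):
        loss = F.mse_loss(F.linear(x, phi[0], phi[1]), y)
        grads = torch.autograd.grad(loss, phi)
        phi = [(p - lr * g).detach().requires_grad_(True) for p, g in zip(phi, grads)]
    return phi

theta = [torch.randn(1, 4), torch.randn(1)]        # toy weight and bias
epsilon = 0.1                                      # Reptile outer step size
for _ in range(100):                               # tasks: random stand-in data
    x, y = torch.randn(8, 4), torch.randn(8, 1)
    phi = inner_sgd(theta, x, y)
    theta = [t + epsilon * (p.detach() - t) for t, p in zip(theta, phi)]
```

Note that Reptile never differentiates through the inner loop at all, which is part of what the comparison above is asking about.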
@spaceisawesome1 4 years ago
You're working so hard. Please get some sleep or rest! haha
@UncoveredTruths 4 years ago
From experience, transfer learning doesn't work nearly as well as people make it out to for medical imagery.
@wizardOfRobots 3 years ago
Please also upload in 1080p.
@marat61 4 years ago
38:36 It seems that there are mistakes in the expression you derived.
@YannicKilcher 4 years ago
Share them with us :)
@innosolo 27 days ago
@YannicKilcher Thank you very much for this great video. I have learnt so much from it. Please, could I have your email address for some enquiries? Thanks.