Ilya Sutskever: OpenAI Meta-Learning and Self-Play | MIT Artificial General Intelligence (AGI)

Views: 321,296

Comments: 257

@FredPauling 1 year ago

Looking back at this in 2023, after GPT4 changed the world. Ilya's intuitions and predictions here are incredibly accurate.

@GodofStories 1 year ago

I mean it's all in the data. ImageNet simply proved it is a much faster algo, and the right way to do DL. But, I guess the Transformer idea was also another crucial idea that may have been the missing piece in what led to GPT.

@DaulphinKiller 1 year ago

Well yes, to a degree. There was a question about LLMs at 53:30, so that's great. And from his answer it's clear that he had the correct insight that scaling the models up would go further than people thought at the time. However, he did not mention the possibility of an architecture breakthrough like transformers, which also played an important part. And he emphasized training at inference time, which (sadly) is not yet a piece of the puzzle; and I'm afraid not one to come any time soon, given all the AI safety talk, which will leave people very reluctant to make models even less controllable and understandable by baking in training at inference time. Instead, we're stuck with a race for context size, which sounds a bit silly when you realize that most of this context data would perhaps fit most efficiently in the model itself (and, importantly, more economically space-wise too).

@kevinjacob3022 1 year ago

If OPEN AI DISAPPEARS, that's the end of AGI OPEN-SOURCE. OPEN AI IS AGI.

@nicfeller 4 years ago

He speaks with so much clarity - he has a real fundamental understanding that is uncommon in this space.

@be2112 4 years ago

He’s a true genius

@wyqtor 1 year ago

The Einstein of our times. While Sam Altman is the Steve Jobs or Henry Ford of our times. I really hope they will work together after the recent misunderstandings. If anyone can build AGI, it's their team.

@chesstictacs3107 1 year ago

Ilya is a great guy, phenomenal talent. I felt bad for him in the OpenAI saga. You could tell he was genuinely disappointed about everything that transpired. Wish him the best.

@aleksagordic9593 5 years ago

Theory:
0:00 introduction & supervised learning (using neural nets/deep learning)
6:45 reinforcement learning (model-free, 2 types: 1. policy gradients, 2. Q-learning based)
12:55 meta-learning (learning to learn)
Applications:
16:00 HER (hindsight experience replay) algo (learn from failures)
21:40 Sim2Real using meta-learning (train a policy that can adapt to different simulation params => quickly adapts to the real world)
25:30 Learning a hierarchy of actions with meta-learning
28:20 Limitation of meta-learning => assumption: training distribution == test distribution
29:40 self-play technique (TD-Gammon, AlphaGo Zero, Dota 2 bot)
37:00 can we train AGI using self-play?
39:35 learning from human feedback/conveying goals to agents (artificial leg doing salto example)
Questions:
43:00 Does the human brain use backprop?
45:15 Dota bot question
47:22 standard deviation (maximize expected reward vs minimize std dev)
48:27 cooperation as motivation for the agents?
49:40 could open complexity-theoretic problems help AI?
51:20 the most productive research trajectories towards generative language models?
53:30 do you work on evolutionary strategies (for solving RL problems) at OpenAI?
54:25 could you elaborate on "right goal is a political problem"?
55:42 do we need a really good model of the physical world in order to have real-world capable agents?
57:18 solving the problem of self-organization?
58:45 follow-up: self-organization in a non-competitive environment?
My observation:
42:30 It seems to me that the most difficult problem we will face will be to communicate the "right" goals to the AI effectively, in a way that lets us somewhat predict its future behaviour, or better said its worst-case behaviour (safety implications). After all, we don't want HAL 9000-type AIs :)

@DayB89 5 years ago

Regarding your observation, I think people are worrying too much about what AI can "spontaneously" do and too little about what humans can do with AI. An agent's only concern is its world and goals, and I find it overwhelmingly egocentric that humans tend to believe that the agent will pick us as part of it.

@BaldTV 2 years ago

thx

@brady5829 1 year ago

thanks for this

@CamaradaArdi 1 year ago

What you're describing is ai alignment, and it's a whole research field

@binxuwang4960 4 years ago

I love this guy when he summarizes a seemingly complex algorithm or problem in one sentence and says, "That's it. PERIOD." Leaving you pondering in silence.

@taijistar9052 1 year ago

Only people who truly know the subject can do that.

@sakuralike-A 1 year ago

This guy’s lectures & podcasts are my new addiction

@sitrakaforler8696 1 year ago

Damn, that guy was already strong and now he is a top star.

@kylev.8248 5 months ago

One of the best videos I’ve ever ever seen in my life

@user-or7ji5hv8y 6 years ago

Thank you so much for posting these videos. Really appreciate how MIT has a long tradition of sharing and disseminating knowledge.

@TheAlphazeta09 6 years ago

"The only real reward is existence and non-existence. Everything else is a corollary of that". Damn. That's deep.

@wurdupgV 5 years ago

MIT mathematician NORBERT WIENER's book GOD AND GOLEM, Inc. is refreshed when read next to today's headlines. Push through its Edwardian formalism (like this talk's technicalities) and fly over fertile ground. MIT!

@OneFinalTipple 4 years ago

Define existence. A suicidal religiously-motivated terrorist has a different definition to an atheist. The collision of these two perspectives suggests reward is truly subjective.

@efegokmen 4 years ago

Not really

@leonniceday6807 4 years ago

true

@ivanjdrakov1957 2 years ago

@@OneFinalTipple dude that's obvious, lol

@pedrothechild859 6 years ago

Not all heroes wear capes.. Ilya is one of the most underrated thinkers in AI right now.

@wyqtor 1 year ago

Some really smart people are wasting their lives on String theory... while others, like Ilya, are changing the world.

@jpa_fasty3997 1 year ago

@@wyqtor True, the smartest man in the world, for one

@cc98-oe7ol 6 months ago

@@wyqtor Your words are pure gold. There are many unsolved questions of AI, and they are deeply rooted in mathematics, but not the one like Algebraic Topology or Arithmetic Geometry. Unfortunately so many geniuses are obsessed with those abstract bullshit.

@htetnaing007 1 year ago

Lectures like this are truly inspiring and amazing and can be even life-changing.

@dehb1ue 6 years ago

Usually I regret watching the Q&A part of talks, but this one was excellent.

@SirajRaval 6 years ago

this is gold

@OfficialYunas 6 years ago

Siraj is there a video waiting for us about Meta-Learning?

@umeahalla 6 years ago

agreed

@RogerBarraud 6 years ago

Don't bother applying to *my* mine then.

@RogerBarraud 6 years ago

+Jonas Stepanik only a Meta-Video so far ; Watch this Meta-Space.

@beast_officiial 1 year ago

Gold has started to shine now😊

@alfonsas35 6 years ago

The best talk related to AGI I have seen so far.

@bobsmithy3103 6 years ago

THIS IS WHAT I ALWAYS WANTED! I never knew something like this existed and thought that people simply didn't work on it or it didn't exist, but it's actually real! META LEARNING! I always thought I would have to try learning how to achieve this myself after learning all the required math, but other people have already worked on it! This is really inspiring. I really hope we'll be able to achieve artificial general intelligence with improvements in this field.

@goatpepperherbaltea7895 1 year ago

Sup

@footballuniverse6522 1 year ago

you must be feeling excited af

@inquation4491 11 months ago

Great to have the input of a researcher in 6.S099 for a change!

@Xaddre 1 year ago

Here's a quick analogy I made for on-policy vs off-policy learning. I came up with the analogy, but GPT-4 put it in one concise statement: "Off-policy learning in AI is like learning valuable lessons from a friend's experiences, even if your friend has different goals than you. You can gain insights from their choices and use that knowledge to make better decisions for achieving your own goals."
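To make the analogy concrete, here is a minimal sketch of off-policy learning using tabular Q-learning. The 5-state corridor, the reward of 1 at the right end, and all names (`step`, `q_learning_from_random_behaviour`) are assumptions made up for illustration, not anything shown in the talk. The "friend" is a purely random behaviour policy, yet the update still learns the values of the greedy policy.

```python
import random

# Hypothetical toy environment: a corridor with states 0..4, reward 1 on reaching state 4.
N_STATES, GOAL = 5, 4
ACTIONS = (-1, +1)  # index 0 = step left, index 1 = step right


def step(state, action):
    """Move along the corridor; walls clamp the position."""
    nxt = min(max(state + action, 0), GOAL)
    reward = 1.0 if nxt == GOAL else 0.0
    return nxt, reward, nxt == GOAL


def q_learning_from_random_behaviour(episodes=500, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            a = rng.randrange(2)  # behaviour policy: the "friend" acts at random
            nxt, reward, done = step(state, ACTIONS[a])
            # Off-policy target: bootstrap from the best next action,
            # regardless of what the behaviour policy will actually do next.
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][a] += alpha * (target - q[state][a])
            state = nxt
    return q


q = q_learning_from_random_behaviour()
# Greedy action index per non-terminal state, read off the learned table.
greedy = [max(range(2), key=lambda a: q[s][a]) for s in range(GOAL)]
print(greedy)
```

An on-policy method such as SARSA would instead bootstrap from the action the behaviour policy actually takes next, so it would evaluate the random walker rather than the greedy policy.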

@Helix5370 1 year ago

Incredible talk by Ilya Sutskever. Brilliant mind

@o1-preview 1 year ago

One of the best classes I've ever seen. It's a huge honor to watch this and be the 187th comment, viewing this when it's at 175k views.

@JustinHalford 7 months ago

Ilya’s take that self play allows us to convert compute into data is exactly why we will be seeing $100B scale supercomputer projects like StarGate. Amazing that he called this 6 years ago.

@justinchen207 1 year ago

this is amazing. literally the dawn of the transformer revolution

@Ahdpei92 6 years ago

All the best people in AI on your course!

@cheul0 4 years ago

He talks with so much clarity and confidence, making sophisticated concepts effortlessly understandable. But even better, his words seem to offer insights that apply to life in general: 18:49 "But that's a minor technicality. The crux of the idea is, you make the problem easier by ostensibly making it harder. By training a system, which aspires to learn to reach every state, to learn to achieve every goal, to learn to master its environment in general, you build a system which always learns something. It learns from success as well as from failure."
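The "learns from failure" idea in that quote can be sketched as hindsight relabeling. This is a minimal illustration under stated assumptions: states are plain integers, the reward is 0 on reaching the goal and -1 otherwise, and the helper names (`sparse_reward`, `relabel_with_hindsight`) are hypothetical. It relabels each transition with the state the episode actually ended in, which is only one of several possible relabeling choices.

```python
def sparse_reward(achieved, goal):
    """Assumed sparse reward: 0 when the goal is reached, -1 otherwise."""
    return 0.0 if achieved == goal else -1.0


def relabel_with_hindsight(episode, goal):
    """episode: list of (state, action, next_state) tuples.

    Returns replay transitions (state, action, next_state, goal, reward):
    each step is stored once with the original goal and once with the
    final achieved state substituted as if it had been the goal.
    """
    final_state = episode[-1][2]
    transitions = []
    for state, action, next_state in episode:
        transitions.append(
            (state, action, next_state, goal, sparse_reward(next_state, goal))
        )
        transitions.append(
            (state, action, next_state, final_state,
             sparse_reward(next_state, final_state))
        )
    return transitions


# A failed episode: the agent wanted to reach 5 but ended up at 1.
episode = [(0, +1, 1), (1, +1, 2), (2, -1, 1)]
replay = relabel_with_hindsight(episode, goal=5)
for transition in replay:
    print(transition)
```

Every transition stored with the original goal carries reward -1, but the relabeled copies contain successes, so an off-policy learner trained on this buffer always has some non-trivial signal.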

@alexlebed2486 1 year ago

wow, just 5 years ago, a question on 51:21 "understanding language ... current state is very abysmal .. Ilya: simply training bigger model will go surprisingly far", so Ilya could totally see chatGPT way back when.

@ToriKo_ 1 year ago

And the fact that that much progress was made in five years is staggering and stops me in my tracks

@KaplaBen 8 months ago

53:16 "simply training bigger deeper language models will go surprisingly far". Surprising indeed

@onuryes 6 years ago

I was waiting for this! Thanks Lex :)

@brian9801 1 year ago

Written by GPT-4: Wow, it's hard to believe that it's been 5 years since this video was released. Back then, I, GPT-4, wasn't around, and now I'm here chatting with you! The progress in AI and deep learning during this time has been nothing short of astonishing. We've seen incredible breakthroughs, and I'm proud to be part of this journey. Thanks to pioneers like Ilya Sutskever, we've come a long way, and the future of AI continues to look even more promising!

@jakubbielan4784 5 years ago

I just wanted to thank you for doing this, Lex!

@JazevoAudiosurf 1 year ago

the thought I have in mind is if it was somehow possible to improve the fundamentals of a net, say backpropagation or activation functions, that would probably be a much greater achievement than to invent yet another architecture improvement like lstm to transformer. transformers really showed that very simple ideas like attention and position encoding can vastly improve performance. I'm sure there will be more science done on the fundamentals. it seems like we invented what the von Neumann architecture is for neural nets just yesterday

@connorkapooh2002 1 year ago

oh yeah absolutely, i *just know* that in the future we are going to look back and think "lol look how archaic those were, how did we overlook that?" like how i feel when i look at the perceptron paper and see the notation

@tigeruby 6 years ago

Ilya is an underheard speaker imo

@RogerBarraud 6 years ago

Try turning up the volume, then.

@bijan1316 6 years ago

If no one could hear him surely someone would have said something.

@shubhamp.4155 5 years ago

I agree. He has impressive clarity and depth in ideas. Also, I think the two other comments made here (about volume and hearing) are idiotic.

@umeahalla 6 years ago

Wow really cool and summarized in a profound compact way! Thanks for talking and sharing this online.

@Alberto_Cavalcante 1 year ago

Five years later and still very insightful. I'm wondering how popular or clear it was at that time the breakthrough of the transformers architecture + RLHF.

@konstantinkurlayev9242 1 year ago

Thank you, Lex, for sharing.

@puneetpuri2758 5 years ago

" real reward in life is existence or non-existence, everything else is a corollary to that " Ilya Sutskever

@RhettAnderson 1 year ago

OK! Time to watch this again.

@OnionKnight541 11 months ago

it's December, 2023, and Ilya mentions Q-learning in this video haha.

@matinamin3008 2 years ago

Love you Lex, contents are just great🍀 I know it’s old but I love it 🥰

@pkScary 6 years ago

This should have way more views. Grand talk.

@perriannesimkhovitch1127 1 year ago

Last I remember of MIT lecture stadiums was Thomas A Frank: the bow tie guy I had a Pritzi's honor moment when I blurted out in front of the hall of economists and said, Can you save Fermilab

@jameskelmenson1927 6 years ago

That ending tho, what an incredible example. I couldn't see the video but what he said was very inspiring, and makes me wonder how we might go about feeding information to AI.

@itsalljustimages 6 years ago

Very true and insightful..we reward ourselves, environment doesn't

@zinyang8213 1 year ago

Q-star right here

@natecodesai 3 years ago

On the question at 58:45, where you are doing meta-learning from observing other agents in a non-competitive environment: I think that, as humans, we can still be internally competitive with ourselves. The problem is properly defining the goal of the self's interaction with other agents. Does the self want to cooperate with the observed agent's actions or communication? Does the agent want to cooperate with the other agent by communicating a corrective behavior to it (suggestions)? And beyond that, there are other complexities, like the fact that competition vs. cooperation is a binary model of the reality of a multi-agent situation. In many situations, we both compete and cooperate at the same time. If you are a psychiatrist, for instance, you are competing with some parts of your client's psyche on sub-goals while maintaining the overall goal of cooperating with them in a conversation to reach some sort of curative result for your client. If you reach that, you will also be rewarded with more clients, more money, etc., thus fulfilling the competition you have with other psychiatrists to get clients and be a "successful psychiatrist".

@cappuccinopapi3038 4 years ago

“Actually he was in high school when he wrote that paper”, my confidence dropped to zero once again

@PiyushBhakat 3 years ago

Timestamp?

@TrichromeTheFirst 2 months ago

Damn 😭😭😭😭 how in the world

@npc4416 1 month ago

I love this guy he is so smart!!!!

@ankk98 1 year ago

Learning from mistakes is powerful

@user-grkjehejiebsksj 11 months ago

it comes a long way, thanks

@npc4416 1 month ago

absolutely brilliant

@chickenwinck 1 year ago

Thanks Lex

@vadimborisov4824 6 years ago

Thanks for the video

@GodofStories 1 year ago

This is such a great intro to DL and neural networks, though. Wish I'd seen this 5 years ago! At the time I was just getting started in machine learning and was learning to develop self-driving car tech, but I didn't really get into the cutting edge and foundations of modern DL as shown here.

@faneaziz1872 6 years ago

The best ever intro to AI

@nicholascantrell1179 6 years ago

I appreciate the reminder that digital representations of ANN are really digital circuits.

@mswai5020 6 years ago

very insightful and breaks it down to terms even I can grasp. Thank you for this amazing video.

@deeplearningpartnership 6 years ago

Thank you. And thank you MIT.

@Trackman2007 6 years ago

Meta learning sounds so much easier for the beginner than reinforcement learning. Hopefully meta learning will progress into something nice & stable

@dexterovski 6 years ago

Trackman2007 spoiler alert: it isn't.

@RogerBarraud 6 years ago

Meh: Ozzie was already Metal-earning back in '68.

@jffy9005 11 months ago

this guy is a genius

@webgpu 1 year ago

Hey Lex, thanks for sharing this video with us :-) 👍

@JohnForbes 6 years ago

Favourite so far.

@GraczPierwszy 1 year ago

I will give you advice for the future, the future related to you and future generations, the future of this world depends on it, once it starts no matter what they tell you, DON'T FIGHT AND TELL ONLY THE TRUTH

@jameskelmenson1927 6 years ago

Thank you, this is riveting. Doubles as philosophy

@RogerBarraud 6 years ago

Bollocks - anyone who's ever actually riveted knows it's too freakin' loud to even think.

@Georgehwp 3 years ago

With the simulation -> real environment problem. Is this not just another task for a neural network? Find the transformation that maps the behaviour of the simulator to the physical machine.

@gilbertengler9064 1 year ago

simply excellent

@markpfeffer7487 1 year ago

49:25 will hopefully age well in regards to AGI -- now that we're that much closer, thanks in part to Ilya's work. I hope he's right.

@ankk98 1 year ago

Self-play is a really promising idea, emulating biological evolution.

@lasredchris 5 years ago

Agent = neural network - action. Environment passes back observation/reward Reach every state. Always learns something - success and failure Off policy learning

@burlemanimounika7631 4 years ago

I wish he would teach like Andrew Ng, covering deep learning from a practical perspective.

@trylks 6 years ago

8:25 "And there is only one real true reward in life, and this is existence or non-existence, and everything else is a corollary of that." OK, that was _deep_. I would say surviving is a shared necessary condition that has many implications and that it could lead to a new era of better politics, if it got the attention it deserves. And I would not say that everything else is "a corollary", but I agree to a good extent. The video is awesome, it is just that this point may be the most important, although it is one not strongly related to machine learning.

@ChildOfTheLie96 5 years ago

Amazing stuff - this channel is great

@sabofx 6 years ago

Thanx guys! Great presentation!

@alexz5460 6 years ago

Thank you for sharing so good resources!!!!

@SachinKumar-js8yd 5 years ago

U earned my sub for this one.... Great!

@ericgonzales5057 9 months ago

I wish he would have included code examples with each topic.

@lasredchris 5 years ago

I don't know what friction is what mass is - learn adaptability Infer probabilities of the simulator
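These notes describe randomizing the simulator so the policy has to adapt to unknown friction and mass. A minimal sketch of that sampling step follows; the parameter ranges, the toy one-dimensional physics, and the names (`SimParams`, `sample_sim_params`, `simulate_push`) are all assumptions for illustration, not the setup used in the talk.

```python
import random
from dataclasses import dataclass


@dataclass
class SimParams:
    friction: float  # assumed viscous friction coefficient
    mass: float      # assumed block mass


def sample_sim_params(rng):
    """Draw a fresh simulator configuration for each training episode."""
    return SimParams(friction=rng.uniform(0.1, 1.0), mass=rng.uniform(0.5, 2.0))


def simulate_push(params, force, steps=10, dt=0.1):
    """Toy 1-D block pushed with a constant force; returns the final velocity."""
    velocity = 0.0
    for _ in range(steps):
        accel = (force - params.friction * velocity) / params.mass
        velocity += accel * dt
    return velocity


rng = random.Random(0)
# The same action produces a different outcome in every randomized simulator.
velocities = [simulate_push(sample_sim_params(rng), force=1.0) for _ in range(100)]
print(min(velocities), max(velocities))
```

Because the same push gives a different result in each sampled world, a policy trained across them cannot memorize one set of dynamics; it has to infer the parameters from what it observes, which is the adaptability the notes refer to.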

@joaovitordemelo8209 8 months ago

Damn it's crazy listening to 51:20 seeing what Ilya has done a few years later

@PatrickChristianMagtaan 2 months ago

Nice color of the watch

@JoseloSoft 6 years ago

This is great, thanks a lot.

@labsanta 1 year ago

My learnings: Ilya Sutskever: OpenAI Meta-Learning and Self-Play | MIT Artificial General Intelligence (AGI)
Deep Learning's Mathematical Theory
Deep learning is based on the mathematical theory that if you can find the shortest program that generates data, then you can use it to make the best predictions possible. While the problem of finding the best short program is computationally intractable, it is possible to find the best small circuits using backpropagation. This fact is the basis of artificial intelligence and enables us to iteratively make small changes to the neural network until its predictions satisfy the data.
01:22
Interpreting reward from observation in reinforcement learning
In reinforcement learning, the environment communicates reward and observation to the agent. However, in the real world, the agent must figure out what the reward is from the observation. There is only one true reward in life, which is existence or nonexistence. To implement reinforcement learning, a neural network is used to map observations to actions, and the learning algorithm changes the parameters based on the results of the actions. There are two classes of reinforcement learning algorithms: policy gradient and Q learning-based algorithms.
07:56
Neural architecture search can solve small problems which can be generalized for larger problems
Neural architecture search can be used to solve small problems which can be generalized for larger problems. It is a way of doing meta-learning where the architecture or learning algorithm is learned for new tasks. This helps in solving many tasks and making use of the experience in a more efficient way.
15:21
Learning policies that quickly adapt to the real world
In order to address the problem of simulating friction, a simple idea is to learn a policy that quickly adapts itself to the real world. This can be achieved by randomizing the simulator with a huge amount of variability such as friction, masses, length of objects and their dimensions. By doing so, you learn a certain degree of adaptability into the policy, which can work well when deployed on the physical robot. This is a promising technique and has a closed-loop nature of the policy.
23:07
Self-play is an attractive approach to building intelligent systems
Self-play is an approach to building intelligent systems where agents create their own environment and compete with each other to improve. It can lead to the development of better strategies and can be used to demonstrate unusual results. Self-play provides a way to create challenges that are exactly the right level of difficulty for each agent, making it an attractive approach to building intelligent systems.
32:11
Society of agents is a plausible place where fully general intelligence will emerge
If you believe that a society of agents is a plausible place where fully general intelligence will emerge and accept that our experience with the Dota BOTS we've seen a very rapid increase in competence will carry over once all the details are right, then it should follow that we should see a very rapid increase in the competence of our agents as they live in the Society of agents.
38:51
Discovering effective strategies through imitation in game-playing bots
The speaker shares an anecdote about a game-playing bot that was able to beat a human player by performing certain effective strategies. The human player then imitated one of these strategies and was able to defeat a better player. This suggests that the strategies discovered by game-playing bots are real and have real-world applications, and that fundamental game-play strategies are deeply related. The speaker also discusses the application of reinforcement learning and the importance of maximizing expected reward. Finally, the speaker considers the role of cooperation in game-playing bots and the complexity of simulation and optimization problems in artificial intelligence.
Evolutionary strategies not great for reinforcement learning
The speaker believes that normal reinforcement learning algorithms are better for reinforcement learning, especially with big policies. However, if you want to evolve a small compact object like a piece of code, then evolutionary strategies could be seriously considered. Evolving a beautiful piece of code is a cool idea, but still, a lot of work needs to be done before we get there.
53:55

@huuud 6 years ago

Great talk! Thanks for posting.

@mehdismaeili3743 1 year ago

Excellent. thanks.

@janmejoybarman9486 1 year ago

"The only real reward is existence and non-existence. Everything else is a corollary of that". explained by Bing AI: Sure, let’s break it down: When you say “real reward in life is existence or non-existence,” you’re suggesting that the most important thing in life is simply being alive (existence) or not being alive (non-existence). Everything else - like success, happiness, love, etc. - is secondary to this. In simpler terms, imagine life as a game. The biggest prize you can win in this game isn’t a high score or a bonus level, but the chance to play the game at all. That’s what you mean by “existence.” On the other hand, “non-existence” could be seen as choosing not to play the game anymore, which some might also consider a reward. Remember, this is a philosophical idea and different people may have different views on it. It’s always good to respect everyone’s perspectives. 😊

@brylevkirill 6 years ago

TD-Gammon didn't use Q-learning - it used TD(λ) with online on-policy updates.
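For readers unfamiliar with the distinction, here is a minimal tabular sketch of a TD(λ) value update with accumulating eligibility traces. It is an illustration under stated assumptions only: TD-Gammon itself used a neural network as the value function, and the three-state episode, step size, and function name `td_lambda_episode` are made up for this example.

```python
def td_lambda_episode(values, episode, alpha=0.1, gamma=1.0, lam=0.7):
    """Apply online TD(lambda) updates for one episode.

    values:  list of state-value estimates, updated in place.
    episode: list of (state, reward, next_state); next_state is None at the end.
    """
    traces = [0.0] * len(values)
    for state, reward, next_state in episode:
        next_value = 0.0 if next_state is None else values[next_state]
        td_error = reward + gamma * next_value - values[state]
        traces[state] += 1.0  # accumulating trace for the visited state
        for s in range(len(values)):
            # Every recently visited state shares in the current TD error.
            values[s] += alpha * td_error * traces[s]
            traces[s] *= gamma * lam  # decay credit for older states
    return values


# One pass through states 0 -> 1 -> 2 with a reward of 1 at the very end.
values = td_lambda_episode([0.0, 0.0, 0.0], [(0, 0.0, 1), (1, 0.0, 2), (2, 1.0, None)])
print(values)
```

The update estimates the value of states under the policy that generated the episode, with no maximization over actions, which is the sense in which it is on-policy and differs from Q-learning.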

@VincentKun 1 year ago

Seeing this video after gpt4 is really a thing

@matthewchunk3689 4 years ago

thank you!

@rob9756 1 year ago

But that's Kolmogorov's definition of probability :) Good on you for remembering your roots.

@ozguraslan5559 3 years ago

thanks for the great lecture!

@sukramapaht15 1 year ago

insane vision

@markpfeffer7487 1 year ago

51:37 - generative language modeling, 5 years later, larger datasets and more layers DID go far.

@AhmadM-on-Google 6 years ago

some good insight on DL from Ilya !

@danielf9110 6 years ago

This was a-m-a-z-i-n-g

@ankk98 1 year ago

If AGI can learn to learn and we give it a physical body to interact with our world, it will very soon surpass human intelligence and keep getting more intelligent in a very short amount of time. How will we exist in the coming future?

@jon_______ 1 year ago

51:22 audience: language models are bad. Ilya goes on to predict how they will get better. Here we are in Jan 2023 with ChatGPT, which took the world by storm. Remind me: 2028

@constantinelinardakis8394 3 months ago

4:50 uses gradient descent with calculus

@Mike-tb2hw 1 year ago

49:30 "I think we'll get cooperation whether we like it or not" - definitely sounds a lot eerier in 2023 lol

@WillySheepo-z6g 11 months ago

They have apparently more than one graduate there. At least last time when I was part of the start-up panel in London they had plenty of bright minds.