Deep RL Bootcamp Lecture 4A: Policy Gradients

61,483 views

AI Prism

A day ago

Comments: 43
@naeemajilforoushan5784 8 months ago
After 5 years, this lecture is still a great video, thank you a lot.
@bhargav975 6 years ago
This is the best lecture I have seen on policy gradient methods. Thanks a lot.
@auggiewilliams3565 5 years ago
I must say that in more than 6 months, this is by far the best lecture/material I have come across that was able to make me understand what the policy gradient method actually is. I really praise this work. :) Thank you.
@jony7779 4 years ago
Every time I forget how policy gradients work exactly, I just come back here and watch starting at 9:30
@andreasf.3930 4 years ago
And every time you visited this video, you forgot where to start watching. That's why you posted this comment. Smart guy!
@ericsteinberger4101 6 years ago
Amazing lecture! Love how Pieter explains the math. Super easy to understand.
@ashishj2358 3 years ago
Best lecture on Policy Gradients, hands down. It also covers some noteworthy details from many papers.
@marloncajamarca2793 6 years ago
Great Lecture!!!! Pieter's explanations are just a gem!
@johnnylima1337 6 years ago
It's such a good lecture that I keep stopping to ask myself why it was so easy to absorb such significant information with full understanding.
@Рамиль-ц5о 4 years ago
Very good lecture about the policy gradient method. I have looked through a lot of articles and understood almost everything, but your derivation explanation is really the best. It just opened my eyes and showed the whole picture. Thank you very much!!
@synthetic_paul 4 years ago
Honestly I can’t keep up without seeing what he’s pointing at. Gotta pause and search around the screen each time he says “this over here”
@akarshrastogi3682 4 years ago
Exactly. "This over here" has got to be the most uttered phrase in this lecture. So frustrating.
@sharmakartikeya 11 months ago
I might be missing a simple concept here but how are we increasing/decreasing the grad log probability of the actions using the gradient of U(theta)? I get that positive return for a trajectory will make the gradient of U positive and so theta will be increased in favour of those trajectories but how is it increasing grad log prob?
@keqiaoli4617 4 years ago
Why would a good "R" increase the probability of a path??? Please help me.
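Both questions above come down to the same mechanism. The estimator from the lecture is g ≈ R(τ) ∇_θ log P_θ(τ), and the gradient ascent step θ ← θ + α g moves θ along ∇_θ log π_θ(a_t|s_t) when the return is positive (raising the log-probability of the sampled actions) and against it when the return is negative. A minimal runnable sketch, assuming a hypothetical two-armed bandit with a softmax policy (my toy setup, not from the lecture):

```python
# REINFORCE on a hypothetical 2-armed bandit with a softmax policy.
# Shows why a positive return R increases the probability of the sampled
# action: the update moves theta along R * grad log pi(a).
import numpy as np

rng = np.random.default_rng(0)
theta = np.zeros(2)   # one logit per action
alpha = 0.1           # step size

def softmax(z):
    e = np.exp(z - z.max())
    return e / e.sum()

for step in range(500):
    probs = softmax(theta)
    a = rng.choice(2, p=probs)
    R = 1.0 if a == 0 else -1.0        # action 0 is "good", action 1 is "bad"
    grad_log_pi = -probs               # grad of log pi(a) w.r.t. theta for a
    grad_log_pi[a] += 1.0              # softmax policy is (one_hot(a) - probs)
    theta += alpha * R * grad_log_pi   # ascend: good R raises pi(a), bad R lowers it

print(softmax(theta))   # mass concentrates on action 0, the high-R "path"
```

After a few hundred steps the policy puts almost all its probability on the action with positive return, which is exactly the "good R increases the path probability" effect being asked about.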
@norabelrose198 2 years ago
The explanation of the derivation of policy gradient is really nice and understandable here
@bobsmithy3103 2 years ago
Amazing work. Super understandable, concise, and information-dense.
@ishfaqhaque1993 5 years ago
23:20- Gradient of expectation is expectation of gradient "under mild assumptions". What are those assumptions?
@joaogui1 4 years ago
math.stackexchange.com/questions/12909/will-moving-differentiation-from-inside-to-outside-an-integral-change-the-resu
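To summarize the link: the interchange at 23:20 is the Leibniz integral rule, and a standard sufficient condition is domination — P_θ(τ) is differentiable in θ and ‖∇_θ P_θ(τ) R(τ)‖ is bounded by an integrable function that does not depend on θ, so dominated convergence applies. A sketch of the step in question, in the notation of the lecture's derivation:

```latex
\nabla_\theta U(\theta)
  = \nabla_\theta \int P_\theta(\tau)\, R(\tau)\, d\tau
  \overset{(\ast)}{=} \int \nabla_\theta P_\theta(\tau)\, R(\tau)\, d\tau
  = \mathbb{E}_{\tau \sim P_\theta}\!\left[ \nabla_\theta \log P_\theta(\tau)\, R(\tau) \right]
```

where (*) is the swap that needs those assumptions, and the last equality uses ∇_θ P_θ = P_θ ∇_θ log P_θ.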
@emilterman6924 5 years ago
It would be nice to see what labs (exercises) they had.
@Procuste34iOSh 4 years ago
Don't know if you're still interested, but the labs are on the bootcamp website.
@dustinandrews89019 7 years ago
I got a lot out of this lecture in particular. Thank you.
@biggeraaron 5 years ago
Where can I buy his T-shirt?
@JyoPari 5 years ago
Instead of having a baseline, why not make your reward function be negative for undesired scenarios and positive for good ones? Great lecture!
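On the baseline question above: shifting the reward so that bad outcomes are negative is itself a baseline, just a hand-picked constant one. The reason the lecture subtracts an explicit baseline b is that any b that does not depend on the action leaves the gradient unbiased, so b can be chosen (e.g., the average return) to minimize variance instead of being tuned by hand. A sketch of the standard unbiasedness argument (my notation):

```latex
\mathbb{E}_{a \sim \pi_\theta}\!\left[ \nabla_\theta \log \pi_\theta(a)\, b \right]
  = b \int \pi_\theta(a)\, \nabla_\theta \log \pi_\theta(a)\, da
  = b \int \nabla_\theta \pi_\theta(a)\, da
  = b\, \nabla_\theta \int \pi_\theta(a)\, da
  = b\, \nabla_\theta 1
  = 0
```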
@JadtheProdigy 6 years ago
Best lecturer in the series.
@muratcan__22 4 years ago
Nice, but hard to follow without knowing what "this" refers to. I hope my guesses were right :)
@DhruvMetha 3 years ago
Wow, this is beautiful!
@richardteubner7364 7 years ago
1:11 Why are DQNs and friends dynamic programming methods? I mean, the neural network works as a function approximator to satisfy Bellman's equation, but Backprop is still the workhorse. In my opinion, DQNs are much more similar to PG methods than to Bellman updates?! And another issue with the RL Landscape slide: where the heck are the model-based RL algos?? That slide should be renamed "model-free RL landscape."
@faizanintech1909 6 years ago
Awesome instructor.
@ethanjyx 5 years ago
Wow, this is so well explained, and the last video is very entertaining.
@isupeene 4 years ago
The guy in the background at 51:30
@suertem1 5 years ago
Great lecture, thanks
@elzilcho222 6 years ago
Could you train a robot for 2 weeks in the real world, then use those trained parameters to optimize a virtual environment? You know... making the virtual environment very close to the real world?
@OfficialYunas 6 years ago
Of course you could. It's the opposite of what OpenAI does when they train a model in a virtual environment and deploy it in reality.
@soutrikband 5 years ago
The real world is very complicated, with model uncertainties, friction, wear and tear, and what have you... Simulators can come close, but we cannot expect them to fully mimic real-world phenomena.
@karthik-ex4dm 6 years ago
PG is awesome!!! It really doesn't depend on environment dynamics?? Wow. All the pain and stress just goes away when we see our algorithms working 😇😇
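For anyone puzzled by the "doesn't depend on environment dynamics" part: the trajectory probability factors into policy terms and transition terms, and the transition terms vanish from the gradient because they carry no θ. A sketch of this standard cancellation (notation as in the lecture's derivation):

```latex
P_\theta(\tau) = p(s_0) \prod_{t=0}^{H-1} \pi_\theta(a_t \mid s_t)\, p(s_{t+1} \mid s_t, a_t)
\quad\Longrightarrow\quad
\nabla_\theta \log P_\theta(\tau) = \sum_{t=0}^{H-1} \nabla_\theta \log \pi_\theta(a_t \mid s_t)
```

so the estimator only needs sampled trajectories from the environment, never a model of p(s'|s,a).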
@nathanbittner8307 7 years ago
Excellent lecture. Thank you for sharing.
@piyushjaininventor 6 years ago
Can you share the PPT?
@luxorska5143 5 years ago
You can find all the slides and the other lectures here: sites.google.com/view/deep-rl-bootcamp/lectures
@ProfessionalTycoons 6 years ago
Great talk!
@arpitgarg5172 5 years ago
If you can't explain it like Pieter Abbeel or Andrew Ng, you don't understand it well enough.
@Diablothegeek 7 years ago
Awesome!! Thanks
@shaz7163 7 years ago
Very nice :)
@MarkoTintor 4 years ago
... you can use "a", and the math will be the same. :)