Even after 5 years, this lecture is still a great video, thank you a lot
@bhargav975 6 years ago
This is the best lecture I have seen on policy gradient methods. Thanks a lot.
@auggiewilliams3565 5 years ago
I must say that in more than 6 months, this is by far the best lecture/material I have come across that was able to make me understand what the policy gradient method actually is. I really praise this work. :) Thank you.
@jony7779 4 years ago
Every time I forget how policy gradients work exactly, I just come back here and watch starting at 9:30
@andreasf.3930 4 years ago
And every time you visited this video, you forgot where to start watching. That's why you posted this comment. Smart guy!
@ericsteinberger4101 6 years ago
Amazing lecture! Love how Pieter explains the math. Super easy to understand.
@ashishj2358 3 years ago
Best lecture on Policy Gradients, hands down. It also briefly covers some noteworthy details from many papers.
@marloncajamarca2793 6 years ago
Great Lecture!!!! Pieter's explanations are just a gem!
@johnnylima1337 6 years ago
It's such a good lecture that I'm stopping to ask myself why it was so easy to absorb such significant information with full understanding.
@Рамиль-ц5о 4 years ago
Very good lecture on the policy gradient method. I have looked through a lot of articles and understood almost everything, but your derivation explanation is really the best. It just opened my eyes and showed the whole picture. Thank you very much!!
@synthetic_paul 4 years ago
Honestly I can’t keep up without seeing what he’s pointing at. Gotta pause and search around the screen each time he says “this over here”
@akarshrastogi3682 4 years ago
Exactly. "This over here" has got to be the most uttered phrase in this lecture. So frustrating.
@sharmakartikeya 11 months ago
I might be missing a simple concept here, but how are we increasing/decreasing the grad log probability of the actions using the gradient of U(theta)? I get that a positive return for a trajectory will make the gradient of U positive, and so theta will be updated in favour of those trajectories, but how is that increasing the grad log prob?
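One way to see it, assuming the slides' notation (U(θ) for expected return, P(τ;θ) for the probability of trajectory τ, R(τ) for its return, m sampled rollouts), is to write out the likelihood-ratio estimator the lecture derives; this is a sketch from memory, not a quote of the slide:

    \nabla_\theta U(\theta) \approx \hat{g} = \frac{1}{m} \sum_{i=1}^{m} \nabla_\theta \log P(\tau^{(i)}; \theta)\, R(\tau^{(i)})

A gradient-ascent step θ ← θ + α·ĝ moves θ along ∇_θ log P(τ;θ) scaled by the return, so the log probability (and hence the probability) of trajectories with positive return gets pushed up and that of negative-return trajectories gets pushed down. The "grad log prob" itself is not what grows; it is the direction the parameters move in.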
@keqiaoli4617 4 years ago
Why would a good "R" increase the probability of the path? Please help me.
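A minimal numpy sketch of this effect, assuming a toy softmax policy over three actions in a single state (theta, alpha, and the return value below are illustrative, not taken from the lecture):

    import numpy as np

    def softmax(z):
        z = z - z.max()              # subtract max for numerical stability
        e = np.exp(z)
        return e / e.sum()

    theta = np.zeros(3)              # policy parameters (logits over 3 actions)
    alpha = 0.1                      # step size (illustrative)
    action, ret = 1, 5.0             # a sampled action and the return R it received

    probs = softmax(theta)
    grad_log_pi = -probs             # gradient of log pi(action) for a softmax policy
    grad_log_pi[action] += 1.0       # equals one_hot(action) - probs

    theta += alpha * ret * grad_log_pi   # REINFORCE-style step: theta += alpha * R * grad log pi
    print(softmax(theta))            # the sampled action's probability has gone up

With a positive return the update increases the probability of the action that was actually taken; flip the sign of ret and the same update decreases it, which is the intuition behind "good R pushes the path's probability up".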
@norabelrose198 2 years ago
The explanation of the derivation of policy gradient is really nice and understandable here
@bobsmithy3103 2 years ago
Amazing work. Super understandable, concise, and information-dense.
@ishfaqhaque1993 5 years ago
23:20- Gradient of expectation is expectation of gradient "under mild assumptions". What are those assumptions?
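Roughly, the "mild assumptions" are the usual regularity conditions for swapping differentiation and integration: P(τ;θ) is differentiable in θ and the integrand is dominated by an integrable function, so the Leibniz/dominated-convergence argument applies. The swap is the first step of the likelihood-ratio derivation, sketched here in the lecture's notation:

    \nabla_\theta U(\theta) = \nabla_\theta \int P(\tau;\theta)\, R(\tau)\, d\tau
                            = \int \nabla_\theta P(\tau;\theta)\, R(\tau)\, d\tau
                            = \int P(\tau;\theta)\, \nabla_\theta \log P(\tau;\theta)\, R(\tau)\, d\tau
                            = \mathbb{E}_{\tau \sim P(\cdot;\theta)}\left[\nabla_\theta \log P(\tau;\theta)\, R(\tau)\right]

Moving ∇_θ inside the integral (the first equality to the second line) is exactly where those assumptions are used.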
It would be nice to see what labs (exercises) they had.
@Procuste34iOSh 4 years ago
Don't know if you're still interested, but the labs are on the bootcamp website.
@dustinandrews89019 7 years ago
I got a lot out of this lecture in particular. Thank you.
@biggeraaron 5 years ago
Where can I buy his T-shirt?
@JyoPari 5 years ago
Instead of having a baseline, why not make your reward function be negative for undesired scenarios and positive for good ones? Great lecture!
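Making rewards negative for undesired scenarios and positive for good ones is effectively a hand-picked constant baseline. The reason any action-independent baseline b can be subtracted is that it does not bias the gradient; a sketch in the same notation as above:

    \mathbb{E}_{\tau}\left[\nabla_\theta \log P(\tau;\theta)\, b\right]
        = b \int P(\tau;\theta)\, \nabla_\theta \log P(\tau;\theta)\, d\tau
        = b\, \nabla_\theta \int P(\tau;\theta)\, d\tau
        = b\, \nabla_\theta 1 = 0

So the baseline only changes the variance of the estimator, and a learned baseline (e.g., an estimate of the expected return) keeps adapting as the policy improves, which fixed reward signs cannot do.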
@JadtheProdigy 6 years ago
Best lecturer in the series.
@muratcan__22 4 years ago
Nice, but hard to follow without knowing what "this" refers to. I hope my guesses were right :)
@DhruvMetha 3 years ago
Wow, this is beautiful!
@richardteubner7364 7 years ago
1:11 Why are DQNs and friends dynamic programming methods? I mean, the neural network works as a function approximator to satisfy Bellman's equation, but backprop is still the workhorse. In my opinion DQNs are much more similar to PG methods than to Bellman updates. And another issue with the RL landscape slide: where the heck are the model-based RL algos? That slide should be renamed the model-free RL landscape.
@faizanintech1909 6 years ago
Awesome instructor.
@ethanjyx 5 years ago
Wow, damn, this is so well explained, and the last video is very entertaining.
@isupeene 4 years ago
The guy in the background at 51:30
@suertem1 5 years ago
Great lecture, thanks
@elzilcho222 6 years ago
Could you train a robot for 2 weeks in the real world, then use those trained parameters to optimize a virtual environment? You know, making the virtual environment very close to the real world?
@OfficialYunas 6 years ago
Of course you could. It's the opposite of what OpenAI does when they train a model in a virtual environment and deploy it in reality.
@soutrikband 5 years ago
The real world is very complicated, with model uncertainties, friction, wear and tear, and what have you... Simulators can come close, but we cannot expect them to fully mimic real-world phenomena.
@karthik-ex4dm 6 years ago
PG is awesome!!! It really doesn't depend on the environment dynamics?? Wow. All the pain and stress just goes away when we see our algorithms working 😇😇
@nathanbittner8307 7 years ago
excellent lecture. Thank you for sharing.
@piyushjaininventor 6 years ago
Can you share the PPT?
@luxorska5143 5 years ago
You can find all the slides and the other lectures here: sites.google.com/view/deep-rl-bootcamp/lectures
@ProfessionalTycoons 6 years ago
great talk!
@arpitgarg5172 5 years ago
If you can't explain it like Pieter Abbeel or Andrew Ng, then you don't understand it well enough.
@Diablothegeek 7 years ago
Awesome!! Thanks
@shaz7163 7 years ago
very nice :)
@MarkoTintor 4 years ago
... you can use "a", and the math will be the same. :)